WO2021103508A1

WO2021103508A1 - Seed user selection method, apparatus and device, and storage medium

Info

Publication number: WO2021103508A1
Application number: PCT/CN2020/097517
Authority: WO
Inventors: 陈啟柱
Original assignee: 北京三快在线科技有限公司
Priority date: 2019-11-25
Filing date: 2020-06-22
Publication date: 2021-06-03
Also published as: CN110942345B; CN110942345A

Abstract

A seed user selection method and a corresponding apparatus. The method comprises: obtaining user set data; creating m user groups according to the user set data; for the i-th user group in the m user groups, obtaining feature data of the i-th user group; determining an influence parameter of the first user with respect to the second user according to the feature data of the i-th user group; constructing an influence matrix; selecting at least one seed user from n users according to the influence matrix; and storing user information of the seed user.

Description

Seed user selection method, device, equipment and storage medium

This disclosure claims priority to the Chinese patent application filed on November 25, 2019, with application number 201911168479.9 and the invention title "Seed user selection method, device, equipment and storage medium", the entire content of which is incorporated into this disclosure by reference.

Technical field

The embodiments of the present application relate to the field of Internet technology, and in particular to a method, device, equipment, and storage medium for selecting seed users.

Background

As online social networking platforms become more and more closely integrated with daily life, the commercial value of online social networking platforms has been increasingly tapped and utilized.

In related technologies, operators of online social network platforms usually select some seed users from the users registered on their platform based on the behaviors of those users, and provide certain feedback to the selected seed users, enabling the seed users to promote the operator's products to the people around them.

Summary of the invention

The embodiments of the present application provide a method, device, equipment, and storage medium for selecting seed users. The technical solution is as follows:

On the one hand, an embodiment of the present application provides a method for selecting seed users, and the method includes:

Acquiring user collection data, where the user collection data includes n users, where n is an integer greater than 1;

Creating m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;

For the i-th user group in the m user groups, acquiring characteristic data of the i-th user group, where the i-th user group includes a first user and a second user that have an association relationship;

Determining, according to the characteristic data of the i-th user group, an influence parameter of the first user relative to the second user, where the influence parameter is used to characterize the probability that the first user successfully recommends a product to the second user;

Constructing an influence matrix, the influence matrix being a matrix of n rows and n columns, wherein the element in the u-th row and the v-th column of the influence matrix represents the influence parameter of user u relative to user v;

Selecting at least one seed user from the n users according to the influence matrix;

Storing the user information of the seed user.

On the other hand, an embodiment of the present application provides a device for selecting a seed user, and the device includes:

The collection data acquisition module is configured to acquire user collection data, where the user collection data includes n users, and the n is an integer greater than 1;

A user group creation module, configured to create m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;

The characteristic data acquisition module is configured to acquire, for the i-th user group in the m user groups, characteristic data of the i-th user group, wherein the i-th user group includes a first user and a second user having an association relationship;

The influence parameter determination module is configured to determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the influence parameter is used to characterize the probability that the first user successfully recommends the product to the second user;

The matrix construction module is configured to construct an influence matrix, which is a matrix with n rows and n columns, wherein the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;

A seed user selection module, configured to select at least one seed user from the n users according to the influence matrix;

The information storage module is used to store the user information of the seed user.

In another aspect, an embodiment of the present application provides a computer device. The computer device includes a processor and a memory, and a computer program is stored in the memory. The computer program is loaded and executed by the processor to implement the above-mentioned seed user selection method.

In another aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above method for selecting a seed user is implemented.

In yet another aspect, a computer program product is provided. When the computer program product is executed by a processor, it is used to implement the above-mentioned seed user selection method.

The technical solutions provided in the embodiments of the present application can bring the following beneficial effects:

By determining the influence parameters between users according to the characteristic data of each user group, then constructing an influence matrix according to those influence parameters, and selecting seed users according to the influence matrix, a method for selecting seed users is provided. Moreover, in the embodiments of the present application, the influence parameter is determined based on the characteristic data of user groups whose users have an association relationship, so that the association relationship between users is taken into account and the social network relationship is deeply mined. This makes the prediction of the influence parameter more comprehensive and accurate, and solves the technical problem that the related technology considers only the characteristic data of a single user, which is too limited to determine seed users accurately.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

FIG. 1 is a flowchart of a method for selecting seed users according to an embodiment of the present application;

FIG. 2 is a diagram of a social network relationship provided by an embodiment of the present application;

FIG. 3 is a flowchart of selecting seed users according to an influence matrix provided by an embodiment of the present application;

FIG. 4 is a flowchart of selecting seed users according to an influence matrix according to another embodiment of the present application;

FIG. 5 is a flowchart of a method for selecting seed users according to another embodiment of the present application;

FIG. 6 is a block diagram of a device for selecting seed users according to an embodiment of the present application;

FIG. 7 is a block diagram of a device for selecting seed users according to another embodiment of the present application;

FIG. 8 is a block diagram of a computer device provided by an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions, and advantages of the present application clearer, the implementation manners of the present application will be described in further detail below in conjunction with the accompanying drawings.

In the technical solutions provided by the embodiments of the present application, the execution subject of each step may be a computer device, such as a server with computing and storage capabilities, a terminal such as a mobile phone, tablet, multimedia playback device, or wearable device, or another computer device. Optionally, when the computer device is a server, it may be a single server, a server cluster composed of multiple servers, or a cloud computing service center. For ease of description, the following method embodiments are introduced with the server as the execution subject of each step, but this does not constitute a limitation.

Please refer to FIG. 1, which shows a flowchart of a method for selecting a seed user according to an embodiment of the present application. The method can include the following steps (110-170):

Step 110: Obtain user set data, where the user set data includes n users, where n is an integer greater than 1.

The server may obtain the user set data from its own stored data, or from other computer devices with storage functions, such as other servers or terminals, which is not limited in the embodiments of the present application. The user set data includes n users and the data information corresponding to the n users, such as user IDs and association relationships.

Step 120: Create m user groups according to the user set data, each user group includes two users having an association relationship, and m is a positive integer.

The association relationship refers to the relationship between two users who are able to send information to and receive information from each other. Optionally, the association relationship is expressed as a social relationship. In the embodiments of the present application, the association relationship includes, but is not limited to, any of the following: a friend relationship, a following and followed relationship, an ordering relationship, and so on. Optionally, the association relationship takes different forms on different online social network platforms. For example, on an online social network platform such as an instant messaging application, the association relationship is expressed as a friend relationship; on an entertainment online social network platform, the association relationship is expressed as a following and followed relationship. Optionally, on a platform that is not an online social network platform, the association relationship between users can be constructed based on user information, where user information refers to information generated when a user uses that platform. For example, on a platform that is not an online social network platform, such as a shopping platform, the association relationship between users can be constructed based on user information such as each user's address, purchase records, order data, link sharing, network interaction, and device sharing.

Optionally, in order to express a user group clearly and simply, the user group is expressed in the form of an ordered pair; for example, the user group may be expressed as (first user, second user). In the embodiments of the present application, two user groups can be created for two users having an association relationship, and the relationship characteristics of the two user groups are different. That is, when a user group is expressed as an ordered pair, swapping the positions of the two elements changes the relationship characteristics represented. For example, the user group (first user, second user) represents the relationship characteristics of the first user relative to the second user, and the user group (second user, first user) represents the relationship characteristics of the second user relative to the first user.

After the server determines the user set containing n users and the association relationship between the users, it can create m user groups, and each user group includes two users with an association relationship. In the embodiment of the present application, the value of m is determined by the association relationship between users in the user set.

In a possible implementation manner, the above step 120 includes: constructing a relationship graph corresponding to the user set data, where the relationship graph includes n nodes, the n nodes are in one-to-one correspondence with the n users, and there is an edge between the nodes corresponding to two users having an association relationship; and extracting m user groups from the relationship graph. In this way, the server can quickly construct the m user groups from the user set.

The relationship graph, also known as the social network relationship graph, is used to characterize the relationships between users. After the server determines a user set containing n users, it can construct a relationship graph based on the user set. The number of nodes in the relationship graph is the same as the number of users, and the nodes of the relationship graph are in one-to-one correspondence with the users. For example, FIG. 2 shows the relationship graph of a social network: the number of users in the social network is 6, the number of nodes 21 in the relationship graph is also 6, and the 6 nodes are in one-to-one correspondence with the 6 users. That is, the number in a node 21 corresponds to a user ID: node 1 represents user 1, node 2 represents user 2, node 3 represents user 3, and so on. Optionally, the user ID is the position of the user in the ordering of the user set. In the embodiments of the present application, after obtaining the user set, the server may sort the users in the user set randomly to obtain the user ID of each user, or may sort the users according to certain parameters, such as the number of strokes in the user name or the phonetic order of initials, which is not limited in the embodiments of the present application. As shown in FIG. 2, there is an edge 22 between the nodes corresponding to two users with an association relationship. For example, the edge 22 between node 1 and node 2 means that there is an association relationship between user 1 and user 2.

The server can extract m user groups from the relationship graph. As shown in FIG. 2, there are five edges 22 in the relationship graph. Since two user groups can be created for two users with an association relationship in the embodiments of the present application, the server can extract 10 user groups from the relationship graph shown in FIG. 2, namely (User 1, User 2), (User 2, User 1), (User 1, User 4), (User 4, User 1), (User 1, User 6), (User 6, User 1), (User 6, User 5), (User 5, User 6), (User 4, User 3), and (User 3, User 4).
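Exemplarily, the extraction of user groups from the relationship graph can be sketched as follows. This is only a minimal sketch: the function name `extract_user_groups` and the representation of the graph as a list of undirected edges are assumptions made for illustration, and the edge list is read off the ten user groups listed above.

```python
def extract_user_groups(edges):
    """Return both ordered pairs (a, b) and (b, a) for every undirected edge."""
    groups = []
    for a, b in edges:
        groups.append((a, b))  # relationship of a relative to b
        groups.append((b, a))  # relationship of b relative to a
    return groups


# The five edges implied by the user groups listed for FIG. 2.
edges = [(1, 2), (1, 4), (1, 6), (6, 5), (4, 3)]
user_groups = extract_user_groups(edges)
print(len(user_groups))  # m = 10 for this graph
```

In this sketch m is always twice the number of edges, which matches the example of five edges and 10 user groups.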

Step 130: For the i-th user group among the m user groups, obtain characteristic data of the i-th user group, where the i-th user group includes the first user and the second user having an association relationship.

The characteristic data represents the characteristics of the users in the user group and the characteristics of the relationship between those users. The characteristic data of the i-th user group includes the characteristic data of the first user, the characteristic data of the second user, and the relationship feature data between the first user and the second user. Optionally, in order to mine the association relationship more deeply and accurately represent the characteristic data of each user and user group, the two user groups formed by the same two associated users have different characteristic data. For example, suppose there is an association relationship between user 1 and user 2. The characteristic data of (User 1, User 2) includes the characteristic data of user 1, the characteristic data of user 2, and the relationship feature data of user 1 relative to user 2, while the characteristic data of (User 2, User 1) includes the characteristic data of user 1, the characteristic data of user 2, and the relationship feature data of user 2 relative to user 1. The relationship feature data of user 1 relative to user 2 is different from the relationship feature data of user 2 relative to user 1. For example, if user 1 follows user 2 but user 2 does not follow user 1, then the relationship feature data of user 1 relative to user 2 is different from the relationship feature data of user 2 relative to user 1.

The relationship feature data refers to the feature data of the relative relationship between users. Optionally, the relationship feature data may include recommendation status, attention status, message status, and so on, which is not limited in the embodiments of the present application. For example, Table 1 shows the relationship feature data among the feature data of the 10 user groups constructed according to the relationship graph in FIG. 2.

Table 1

User group	Recommendation status	Attention status	Message status
(User 1, User 2)	1	1	10
(User 2, User 1)	0	1	20
(User 1, User 4)	0	0	15
(User 4, User 1)	1	1	30
(User 1, User 6)	1	0	28
(User 6, User 1)	1	0	12
(User 6, User 5)	0	1	45
(User 5, User 6)	1	1	26
(User 4, User 3)	0	1	5
(User 3, User 4)	0	0	40

The recommendation status indicates whether a product has been successfully recommended in the past. As shown in Table 1, the recommendation status of (User 1, User 2) is 1, which means that user 1 has successfully recommended a product to user 2 in the past; the recommendation status of (User 2, User 1) is 0, which means that user 2 has not successfully recommended a product to user 1 in the past. The attention status indicates whether one user follows the other. As shown in Table 1, the attention status of (User 1, User 4) is 0, which means that user 1 does not follow user 4; the attention status of (User 4, User 1) is 1, which means that user 4 follows user 1. The message status indicates the number of messages sent in the past. As shown in Table 1, the message status of (User 1, User 2) is 10, which means that user 1 has sent 10 messages to user 2 in the past; the message status of (User 2, User 1) is 20, which means that user 2 has sent 20 messages to user 1 in the past. The relationship feature data of the other user groups can be read in the same way and will not be repeated here.

Each user's own characteristic data refers to the data generated by the user when using the application. Optionally, each user's own characteristic data may include user ID, user age, user gender, consumption level, and activity record, which is not limited in the embodiments of the present application. For example, Table 2 shows the respective characteristic data of the users included in the 10 user groups constructed according to the relationship graph in FIG. 2.

Table 2

User ID	User age	User gender	Consumption level	Activity record
1	24	Male	600	1
2	17	Female	120	0
3	35	Female	480	1
4	20	Male	240	0
5	48	Female	500	0
6	23	Male	100	1

The consumption level indicates the user's spending capacity. Optionally, the consumption level can be expressed by the user's average consumption amount, which may be a daily average, a monthly average, or the daily average during the period of participating in an activity, which is not limited in the embodiments of the present application. The activity record indicates whether the user has participated in a product recommendation activity. As shown in Table 2, the activity record of user 1 is 1, which means that user 1 has participated in a product recommendation activity; the activity record of user 2 is 0, which means that user 2 has not participated in a product recommendation activity. The characteristic data of the other users can be read in the same way and will not be repeated here.

It should be noted that the embodiments of the present application only take, as examples, relationship feature data that includes recommendation status, attention status, and message status, and per-user characteristic data that includes user ID, user age, user gender, consumption level, and activity record. After understanding the technical solutions of the embodiments of the present application, those skilled in the art will easily think of relationship feature data and per-user characteristic data that include other aspects, and these should all fall within the protection scope of the present application.
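Exemplarily, the characteristic data of one user group can be assembled from the two tables as sketched below. The dictionary layout, the key names, the function `group_features`, and the encoding of gender as 1 for male and 0 for female are all assumptions made for illustration; the user ID is used here only as a lookup key and is not placed in the vector. Only the rows for user 1 and user 2 are reproduced.

```python
# Per-user characteristic data for user 1 and user 2, taken from Table 2.
USER_FEATURES = {
    1: {"age": 24, "gender": "male", "consumption": 600, "activity": 1},
    2: {"age": 17, "gender": "female", "consumption": 120, "activity": 0},
}

# Relationship feature data from Table 1, keyed by the ordered pair.
RELATION_FEATURES = {
    (1, 2): {"recommended": 1, "follows": 1, "messages": 10},
    (2, 1): {"recommended": 0, "follows": 1, "messages": 20},
}


def user_vector(user_id):
    """Numeric vector for one user (gender encoding is an assumption)."""
    u = USER_FEATURES[user_id]
    return [u["age"], 1 if u["gender"] == "male" else 0,
            u["consumption"], u["activity"]]


def group_features(first, second):
    """First user's data, second user's data, then first-relative-to-second data."""
    r = RELATION_FEATURES[(first, second)]
    return (user_vector(first) + user_vector(second)
            + [r["recommended"], r["follows"], r["messages"]])


print(group_features(1, 2))
print(group_features(2, 1))
```

The two printed vectors differ both in the order of the per-user parts and in the relationship part, which reflects why (User 1, User 2) and (User 2, User 1) are treated as distinct user groups.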

Step 140: Determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the influence parameter is used to characterize the probability of the first user successfully recommending the product to the second user.

The influence parameter indicates the probability of successfully recommending the product. For example, when the influence parameter of the first user relative to the second user is 0.8, the probability that the first user successfully recommends the product to the second user is 0.8. The influence parameter can be expressed either as a numerical value or as a percentage, which is not limited in the embodiments of the present application. Optionally, when the influence parameter is expressed as a numerical value, its value range is [0,1]. Through this design, the calculation of the influence parameter by the server is facilitated, the processing speed of the server is improved, and the processing overhead of the server is reduced.

In a possible implementation manner, the above step 140 includes: invoking the influence calculation model, and calculating the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the characteristic data of the i-th user group includes the characteristic data of the first user, the characteristic data of the second user, and the relationship feature data between the first user and the second user. Through this design, more realistic and accurate predictions of the influence parameter can be obtained while facilitating the calculation of the influence parameter by the server.

The influence calculation model is a model trained on historical data. Optionally, the influence calculation model may be a binary classification model, such as an LR (Logistic Regression) model, a neural network model, or a GBDT (Gradient Boosting Decision Tree) model; the influence calculation model may also be a regression model, which is not limited in the embodiments of the present application.

Exemplarily, the training process of the influence calculation model is as follows: construct at least one training sample, where each training sample includes a sample user group; obtain the feature data and the label of each training sample, where the label represents whether the first sample user in the training sample has successfully recommended the product to the second sample user; and train the influence calculation model with the training samples to obtain the trained influence calculation model.

Training samples are the samples used to train the influence calculation model. Each training sample includes a sample user group. The embodiments of the present application do not limit the specific number of training samples; in practical applications, the number can be determined by weighing two factors, the processing overhead of the server and the accuracy of the influence calculation model. Optionally, after the server obtains historical data, it may determine a sample user set based on the historical data, construct a sample relationship tree based on the sample user set, and then extract training samples from the sample relationship tree.

The label indicates whether the first sample user has successfully recommended the product to the second sample user in the sample user group corresponding to the training sample. Optionally, the value of the label is 0 or 1, where 1 indicates that the product has been successfully recommended and 0 indicates that it has not. For example, if the label corresponding to the sample user group (the first sample user, the second sample user) is 1, the first sample user has successfully recommended the product to the second sample user; if the label is 0, the first sample user has not successfully recommended a product to the second sample user.

In the embodiments of the present application, the feature data of a training sample includes the feature data of each sample user and the relationship feature data between the sample users. For an explanation of both, refer to the explanation of the characteristic data of each user and of the relationship feature data between users in step 130 above, which will not be repeated here.

It should be noted that, in order for the trained influence calculation model to be usable for predicting influence parameters, in the embodiments of the present application the relationship feature data between sample users also includes a historical influence parameter, which indicates the influence between the sample users in the sample user group. For example, if the historical influence parameter corresponding to the sample user group (the first sample user, the second sample user) is 0.2, the influence parameter of the first sample user relative to the second sample user is 0.2.

After the server obtains the training samples and their corresponding feature data and labels, it selects a suitable model as the influence calculation model, such as a binary classification model or a regression model, and then trains the influence calculation model with the training samples to obtain the trained influence calculation model.
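Exemplarily, taking the LR model mentioned above as the influence calculation model, training and prediction can be sketched as follows. This is a minimal sketch using plain batch gradient descent, not a definitive implementation of the embodiment: the function names, the learning rate, the number of epochs, and the toy samples (two hypothetical features, "follows" and a scaled message count) are all assumptions, and no real historical data is used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(samples, labels, lr=0.5, epochs=1000):
    """Fit logistic regression weights and bias by batch gradient descent."""
    dim = len(samples[0])
    w, b, n = [0.0] * dim, 0.0, len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample under the current parameters.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def influence(w, b, x):
    """Predicted probability of a successful recommendation, in [0, 1]."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy samples: [follows, messages scaled to 0..1]; label 1 means
# the first sample user successfully recommended to the second sample user.
samples = [[1, 0.2], [1, 0.6], [0, 0.3], [0, 0.8]]
labels = [1, 1, 0, 0]
w, b = train_lr(samples, labels)
p_follow = influence(w, b, [1, 0.5])
p_no_follow = influence(w, b, [0, 0.5])
print(p_follow, p_no_follow)
```

Because the sigmoid output already lies in [0,1], it can be used directly as the influence parameter in numerical form.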

Step 150: Construct an influence matrix. The influence matrix is a matrix with n rows and n columns. The element in the u-th row and v-th column in the influence matrix represents the influence parameter of the user u relative to the user v.

In the embodiments of the present application, if there is an association relationship between user u and user v, the influence parameter of user u relative to user v and the influence parameter of user v relative to user u can be calculated through the influence calculation model. If there is no association relationship between user u and user v, the influence parameter of user u relative to user v and the influence parameter of user v relative to user u do not need to be calculated and can be recorded directly as 0. The influence parameter of each user relative to itself is also recorded as 0.
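Exemplarily, the construction of the influence matrix can be sketched as follows. The function name `build_influence_matrix`, the use of a nested list, and the influence values in `pair_influence` are assumptions for illustration; in practice the values would come from the influence calculation model.

```python
def build_influence_matrix(n, pair_influence):
    """Build an n x n matrix; pair_influence maps (u, v) with 1-based user IDs
    to the influence parameter of user u relative to user v."""
    # Unassociated pairs and the diagonal stay at 0.
    matrix = [[0.0] * n for _ in range(n)]
    for (u, v), p in pair_influence.items():
        if u != v:
            matrix[u - 1][v - 1] = p
    return matrix


# Hypothetical influence parameters for three ordered pairs among 4 users.
pair_influence = {(1, 2): 0.8, (2, 1): 0.3, (1, 4): 0.5}
matrix = build_influence_matrix(4, pair_influence)
for row in matrix:
    print(row)
```

The matrix is generally not symmetric, since the influence parameter of user u relative to user v may differ from that of user v relative to user u.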

Step 160: According to the influence matrix, at least one seed user is selected from n users.

From the influence matrix, the influence parameter of each of the n users relative to each other user can be obtained. This may refer to the influence parameter of each user relative to the other users that have an association relationship with that user, or to the influence parameter of each user relative to every one of the n users, which is not limited in the embodiments of the present application. Optionally, in the latter case, the influence parameters of a user relative to the users associated with that user are obtained through the influence calculation model, the influence parameters of the user relative to users not associated with that user are 0, and the influence parameter of the user relative to the user itself is also 0. After the server determines the influence matrix, it can select at least one seed user from the n users according to a certain selection method.

Step 170: Store the user information of the seed user.

After the server determines the seed user, it can store the user information of the seed user in its own memory, or in the memory of another computer device, such as another server or a terminal, which is not limited in the embodiments of the present application.

In summary, the technical solutions provided by the embodiments of the present application determine the influence parameters between users according to the characteristic data of each user group, construct the influence matrix according to those influence parameters, and select seed users according to the influence matrix, which provides an additional method for selecting seed users. Moreover, in the embodiments of the present application, the influence parameter is determined based on the characteristic data of user groups whose users have an association relationship, so that the association relationship between users is taken into account and the social network relationship is deeply mined. This makes the prediction of the influence parameter more comprehensive and accurate, and solves the technical problem that the related technology considers only the characteristic data of a single user, which is too limited to determine seed users accurately.

In addition, in the technical solutions provided by the embodiments of the present application, the influence parameter between users is determined from the characteristic data by the trained influence calculation model, so that the server can calculate the influence parameter more simply. Because the influence calculation model is trained on historical feature data, the server can predict the influence parameters more realistically and accurately through the model, which improves the accuracy of seed user selection.

In an example, as shown in FIG. 3, the above selection of at least one seed user from the n users according to the influence matrix includes the following steps (1041-1047):

Step 1041: Calculate the comprehensive influence parameter of each of the n users according to the influence matrix, where the comprehensive influence parameter of the user u is used to represent the comprehensive probability of the user u successfully recommending the product to each other user.

The comprehensive influence parameter refers to the comprehensive probability of the user successfully recommending the product to each other user. Optionally, the comprehensive influence parameter is obtained by accumulating the user's influence parameters relative to the other users. For example, the calculation formula of the comprehensive influence parameter is as follows:

$W_u = \sum_{j=1}^{n} W_{uj}$;

where Wu represents the comprehensive influence parameter of user u, Wuj represents the influence parameter of user u relative to user j, u is a positive integer less than or equal to n, and j is a positive integer less than or equal to n.
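As a minimal sketch of the accumulation described above, assuming the influence matrix is held as a list of rows and users are indexed from 0 rather than from 1, the comprehensive influence parameters are simply the row sums. The function name `comprehensive_influence` and the sample values are illustrative only and do not come from the application.

```python
def comprehensive_influence(W):
    """Return the row sums of W: entry u is the sum over j of W[u][j]."""
    return [sum(row) for row in W]


# Hypothetical 3-user influence matrix. W[u][j] is the influence parameter
# of user u relative to user j; the diagonal is set to 0 as an assumption.
W = [
    [0.0, 0.5, 0.25],
    [0.25, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
totals = comprehensive_influence(W)
```

With these sample values, user 2 has the largest comprehensive influence parameter and would be the first candidate under the "largest value" condition.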

Step 1042, build a seed user set.

In this embodiment of the application, the seed user set is initially empty.

Step 1043: From the non-seed users, select a user s whose comprehensive influence parameter meets the condition and add the user s to the seed user set, where the non-seed users refer to users among the n users who have not been added to the seed user set.

The comprehensive influence parameter meeting the condition may mean that the comprehensive influence parameter is the largest, or that it reaches a preset threshold, which is not limited in the embodiment of the present application. It should be noted that the following embodiments are described only by taking the largest comprehensive influence parameter as the condition. After understanding the technical solution of this application, those skilled in the art will readily conceive of other technical solutions, such as taking the comprehensive influence parameter reaching the preset threshold as the condition, all of which should fall within the protection scope of this application.

When no seed user has been selected yet, the number of non-seed users equals the number of users in the user set, that is, n. After the server determines the non-seed users and their corresponding comprehensive influence parameters, it selects the user with the largest comprehensive influence parameter from the non-seed users and adds that user to the seed user set. Optionally, the formula for selecting the user with the largest comprehensive influence parameter from the non-seed users is as follows:

$s = \arg\max_{u \in U \setminus S} W_u$;

where the set U represents the user set, the set S represents the seed user set, the user s represents the user with the largest comprehensive influence parameter selected from the non-seed users, and s is a positive integer less than or equal to n.
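The selection in step 1043 can be sketched as follows, assuming the "largest comprehensive influence parameter" condition and 0-based user indices. The helper name `select_next_seed` is hypothetical, and the tie-breaking rule (lowest index wins) is an assumption, since the text does not specify one.

```python
def select_next_seed(totals, seeds):
    """Return the index of the non-seed user with the largest total.

    totals[u] is the comprehensive influence parameter of user u and
    seeds is the set of users already chosen (the set S in the text).
    """
    candidates = [u for u in range(len(totals)) if u not in seeds]
    if not candidates:
        raise ValueError("every user is already a seed user")
    # max() keeps the first maximal candidate, so ties go to the lowest index.
    return max(candidates, key=lambda u: totals[u])


totals = [0.75, 0.25, 1.0]
first = select_next_seed(totals, seeds=set())
second = select_next_seed(totals, seeds={first})
```

Note that in the described method the totals are recomputed from the updated influence matrix before each selection; the second call above reuses the same totals only to show how the seed set restricts the candidates.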

Step 1044: Subtract the value of the s-th row element from the value of each row element in the influence matrix to obtain an updated influence matrix, where the s-th row element includes the influence parameter of the user s relative to each other user.

The influence parameters of the user s selected in step 1043 relative to each other user correspond to the values of the s-th row elements in the influence matrix. After the value of the s-th row element is subtracted from the value of each row element in the influence matrix, the updated influence matrix is obtained. For example, the update formula of the influence matrix is as follows:

$W_{nj} \leftarrow W_{nj} - W_{sj}$;

where Wnj represents the element in the nth row and jth column of the influence matrix, and Wsj represents the element in the sth row and jth column of the influence matrix. In the embodiment of the present application, the server takes the difference between Wnj and Wsj as the updated Wnj.

Exemplarily, in order to facilitate the calculation of the server and reduce the processing overhead of the server, the value range of the aforementioned influence parameter is [0, 1]. In this case, subtracting the value of the s-th row element from the value of each row element in the influence matrix to obtain the updated influence matrix includes: subtracting the value of the s-th row element from the values of the n rows of elements in the influence matrix, respectively, to obtain the calculated values of the n rows of elements; and, for each target element whose value is less than zero among the calculated n rows of elements, modifying the value of the target element to 0 to obtain the updated influence matrix.

After the server subtracts the value of the s-th row element from the values of the n rows of elements in the influence matrix, it compares the calculated values of the n rows of elements with 0. If any value is less than 0, that value is modified to 0, to ensure that the values of the n rows of elements in the updated influence matrix are all greater than or equal to 0. For example, the update formula of the influence matrix is as follows:

$W_{nj} \leftarrow \max(W_{nj} - W_{sj},\ 0)$.
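The clamped update can be illustrated with a short sketch, assuming the matrix is a list of rows with 0-based indices. The function name `update_influence_matrix` and the sample matrix are illustrative, not part of the application.

```python
def update_influence_matrix(W, s):
    """Subtract row s from every row of W and clamp negative results to 0.

    A new matrix is returned, so W itself is left unchanged.
    """
    row_s = W[s]
    return [[max(w - ws, 0.0) for w, ws in zip(row, row_s)] for row in W]


# Hypothetical matrix; user 2 has just been selected as a seed user.
W = [
    [0.0, 0.5, 0.25],
    [0.25, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
updated = update_influence_matrix(W, s=2)
```

In this example the seed user 2 already reaches users 0 and 1 with probability 0.5, so the remaining influence of the other users on those two columns drops to 0, while the influence of user 0 on user 2 (0.25) is untouched. The row of the selected user becomes all zeros.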

Step 1045: Determine whether the stop condition for seed user selection is satisfied.

Step 1046: If the stop condition for seed user selection is not met, execution starts again, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix.

If the stop condition for seed user selection is not met, the server repeats steps 1041, 1043, and 1044 based on the updated influence matrix. In this embodiment of the application, the stop condition may be a condition preset by the server. Optionally, the stop condition may be that the number of elements in the seed user set reaches a preset threshold, such as 10, or that the number of loop executions reaches a preset number of times, such as 5, which is not limited in the embodiment of the present application.

Step 1047: If the stop condition for seed user selection is satisfied, the users in the seed user set are determined as the seed users.

In summary, the technical solution provided by the embodiments of the present application constructs an influence matrix and a seed user set, and sets a stop condition for seed user selection. As long as the stop condition is not met, the user with the largest comprehensive influence parameter is repeatedly selected from the user set, based on the updated influence matrix, and added to the seed user set. This avoids recommending excessively to a single user and wasting seed user resources, so that seed users are selected reasonably.
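Putting steps 1041 to 1047 together, the loop can be sketched as below. This is a sketch under stated assumptions rather than the implementation of the application: the stop condition is taken to be "the seed set reaches a preset size", the condition in step 1043 is taken to be "largest comprehensive influence parameter", users are indexed from 0, and the name `select_seed_users` and the sample matrix are hypothetical.

```python
def select_seed_users(W, max_seeds):
    """Greedy seed selection following steps 1041 to 1047 (assumed variant)."""
    W = [list(row) for row in W]  # work on a copy of the influence matrix
    seeds = []                    # step 1042: the seed user set starts empty
    # Step 1045: stop once the preset size is reached or no user is left.
    while len(seeds) < max_seeds and len(seeds) < len(W):
        totals = [sum(row) for row in W]  # step 1041: row sums
        candidates = [u for u in range(len(W)) if u not in seeds]
        s = max(candidates, key=lambda u: totals[u])  # step 1043
        seeds.append(s)
        row_s = W[s]
        # Step 1044: subtract row s from every row, clamping at 0.
        W = [[max(w - ws, 0.0) for w, ws in zip(row, row_s)] for row in W]
    return seeds  # step 1047


# Hypothetical 4-user matrix. Users 0 and 1 influence largely the same
# people, while user 2 reaches user 3, whom nobody else reaches.
W = [
    [0.0, 0.5, 0.75, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.75],
    [0.0, 0.0, 0.25, 0.0],
]
seeds = select_seed_users(W, max_seeds=2)
```

Although user 1 has a larger initial row sum than user 2 (1.0 versus 0.75), the second seed chosen is user 2: after user 0 is selected, most of user 1's influence overlaps with what user 0 already covers, which is the over-recommendation the update step is meant to avoid.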

In another example, as shown in FIG. 4, the above-mentioned selecting at least one seed user from n users according to the influence matrix includes the following steps (104A-104B):

Step 104A: According to the influence matrix, calculate the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users, where the comprehensive influence parameter of user u is used to characterize the comprehensive probability of user u successfully recommending the product to each other user, and the comprehensive influenced parameter of user u is used to characterize the comprehensive probability of each other user successfully recommending the product to user u.

Exemplarily, the above step 104A includes: for the user u among the n users, obtaining the influence parameters of the user u relative to each other user; and summing the influence parameters of the user u relative to each other user to obtain the comprehensive influence parameter of the user u. For example, the calculation formula of the comprehensive influence parameter is as follows:

$W_{un} = \sum_{j=1}^{n} W_{uj}$;

where Wun represents the comprehensive influence parameter of user u, Wuj represents the influence parameter of user u relative to user j, u is a positive integer less than or equal to n, and j is a positive integer less than or equal to n.

Exemplarily, the above step 104A includes: for the user u among the n users, obtaining the influence parameters of each other user relative to the user u; and determining the maximum or average value of the influence parameters of each other user relative to the user u as the comprehensive influenced parameter of the user u. For example, the calculation formula of the comprehensive influenced parameter is as follows:

$W_{nu} = \max_{i} W_{iu}$

or

$W_{nu} = \operatorname{avg}_{i} W_{iu}$;

where Wnu represents the comprehensive influenced parameter of user u, Wiu represents the influence parameter of user i relative to user u, u is a positive integer less than or equal to n, and i is a positive integer less than or equal to n.
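The two variants (maximum or average of the incoming influence parameters) can be sketched as a column-wise reduction. The sketch assumes 0-based indices and that "each other user" excludes user u itself, so the average is taken over the other n - 1 users; the text does not state the denominator, so that choice is an assumption. The function name `comprehensive_influenced` and the `mode` argument are hypothetical.

```python
def comprehensive_influenced(W, mode="max"):
    """Reduce each column u of W over the other users i != u.

    mode="max" takes the largest incoming influence parameter,
    mode="mean" averages over the other n - 1 users (an assumption).
    """
    n = len(W)
    result = []
    for u in range(n):
        incoming = [W[i][u] for i in range(n) if i != u]
        if mode == "max":
            result.append(max(incoming))
        elif mode == "mean":
            result.append(sum(incoming) / len(incoming))
        else:
            raise ValueError("mode must be 'max' or 'mean'")
    return result


# Hypothetical 3-user influence matrix (n must be greater than 1).
W = [
    [0.0, 0.5, 0.25],
    [0.25, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
by_max = comprehensive_influenced(W, mode="max")
by_mean = comprehensive_influenced(W, mode="mean")
```

The maximum reflects the single strongest recommender of each user, whereas the average reflects how strongly the user is reached overall; which one is preferable depends on the application.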

Step 104B: Select at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.

Illustratively, the aforementioned influence matrix W is as follows:

$W = (W_{pq})_{n \times n},\quad W_{pq} \in [0, 1]$;

where Wpq is the element in the p-th row and q-th column of the influence matrix W and represents the influence parameter of user p relative to user q, p is a positive integer less than or equal to n, and q is a positive integer less than or equal to n.

Step 104B includes the following steps:

(1) Define a column vector x whose element values are 0 or 1 and whose elements sum to K:

$x_R \in \{0, 1\}$;

where a value of 1 for xR means that user R is selected as a seed user, and R is a positive integer less than or equal to n.

(3) Define the column vector e:

(4) Define the column vector r:

or

In the embodiment of the present application, each element in the column vector r represents the reasonable influenced parameter of the corresponding user.

(5) Calculate the column vector x based on the following formula:

where $\arg\max_{x}$ indicates that the variable to be solved is x and that x takes the value at which the expression following it reaches its maximum; λ is a real number greater than or equal to 0; $\|W'x - r\|$ denotes the Euclidean norm; $W'x - r$ represents the set of differences between each user's actual influenced parameter and reasonable influenced parameter; and $\|W'x - r\|^2$ represents the sum of squares of the elements in that set.

In summary, the technical solution provided by the embodiments of the present application constructs an influence matrix, calculates the comprehensive influence parameter and the comprehensive influenced parameter of each user according to the influence matrix, and then selects seed users from the user set according to both parameters. The influence of the seed users is thereby considered comprehensively, seed users are selected reasonably, and the selected seed users are prevented from over-marketing or under-marketing to a single user.

Please refer to FIG. 5, which shows a flowchart of a method for selecting a seed user according to another embodiment of the present application. The method may include the following steps (501 to 509):

Step 501, construct a training sample;

Step 502: Obtain training samples and their corresponding feature data and labels;

Step 503, train an appropriate model to obtain an influence calculation model;

Step 504, construct a user group;

Step 505: Obtain characteristic data corresponding to the user group;

Step 506, input the feature data into the influence calculation model;

Step 507: Calculate the influence parameters of each user;

Step 508: Select seed users according to the greedy algorithm, which is the selection method described in steps 1041 to 1046;

Step 509: Select seed users according to the optimization algorithm, which is the selection method described in step 104A to step 104B.
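Steps 504 to 507 can be sketched as follows, under several loudly stated assumptions: each user group is represented as a tuple `(first_user, second_user, features)` with 0-based user indices, pairs without a user group are given an influence parameter of 0, and `toy_model` is only a stand-in logistic score with made-up weights, not the trained influence calculation model of the application. All names here are hypothetical.

```python
import math


def toy_model(features):
    """Stand-in for a trained influence calculation model.

    Returns a value in (0, 1) from a logistic score with made-up weights.
    """
    weights = [1.0, -1.0]  # hypothetical weights, for illustration only
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def build_influence_matrix(n, user_groups, model):
    """Fill an n x n matrix with model outputs for each user group.

    W[first][second] is the influence parameter of the first user
    relative to the second user; all other entries stay 0 (assumption).
    """
    W = [[0.0] * n for _ in range(n)]
    for first, second, features in user_groups:
        W[first][second] = model(features)
    return W


# Two hypothetical user groups over 3 users, each with 2 feature values.
groups = [
    (0, 1, [1.0, 1.0]),
    (1, 0, [2.0, 0.0]),
]
W = build_influence_matrix(3, groups, toy_model)
```

The resulting matrix can then be passed to either selection method (step 508 or step 509). Because the groups are ordered pairs, the matrix is generally not symmetric: the influence of user 0 on user 1 need not equal that of user 1 on user 0.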

The following are device embodiments of this application, which can be used to implement the method embodiments of this application. For details that are not disclosed in the device embodiments of this application, please refer to the method embodiments of this application.

Please refer to FIG. 6, which shows a block diagram of an apparatus for selecting a seed user according to an embodiment of the present application. The device has the function of realizing the above method example, and the function can be realized by hardware, or by hardware executing corresponding software. The device can be the computer equipment introduced above, or can be set in the computer equipment. The device 700 may include: a collection data acquisition module 710, a user group creation module 720, a characteristic data acquisition module 730, an influence parameter determination module 740, a matrix construction module 750, a seed user selection module 760, and an information storage module 770.

The collection data obtaining module 710 is configured to obtain user collection data, where the user collection data includes n users, and the n is an integer greater than 1;

The user group creation module 720 is configured to create m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;

The characteristic data obtaining module 730 is configured to obtain, for the i-th user group in the m user groups, characteristic data of the i-th user group, wherein the i-th user group includes a first user and a second user having an association relationship;

The influence parameter determination module 740 is configured to determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, and the influence parameter is used to characterize the probability of the first user successfully recommending a product to the second user;

The matrix construction module 750 is configured to construct an influence matrix, which is a matrix of n rows and n columns, wherein the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;

The seed user selection module 760 is configured to select at least one seed user from the n users according to the influence matrix;

The information storage module 770 is used to store the user information of the seed user.

Optionally, as shown in FIG. 7, the seed user selection module 760 includes: a comprehensive influence parameter calculation sub-module 761, configured to calculate the comprehensive influence parameter of each of the n users according to the influence matrix, wherein the comprehensive influence parameter of the user u is used to characterize the comprehensive probability that the user u successfully recommends the product to each other user; a seed user set construction sub-module 762, configured to construct the seed user set, the seed user set being initially empty; a user selection sub-module 763, configured to select, from non-seed users, a user s whose comprehensive influence parameter meets the condition and add the user s to the seed user set, wherein the non-seed users refer to users among the n users who have not been added to the seed user set; a matrix update sub-module 764, configured to subtract the value of the s-th row element from the value of each row element in the influence matrix to obtain an updated influence matrix, wherein the s-th row element includes the influence parameters of the user s relative to each other user; a loop sub-module 765, configured to, if the stop condition for seed user selection is not met, start execution again, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix; and a seed user determination sub-module 766, configured to, if the stop condition for seed user selection is met, determine the users in the seed user set as the seed users.

Optionally, as shown in FIG. 7, the value range of the influence parameter is [0, 1]; the matrix update sub-module 764 is configured to: subtract the value of the s-th row element from the values of the n rows of elements in the influence matrix, respectively, to obtain the calculated values of the n rows of elements; and, for each target element whose value is less than zero among the calculated n rows of elements, modify the value of the target element to 0 to obtain the updated influence matrix.

Optionally, as shown in FIG. 7, the seed user selection module 760 includes: a comprehensive parameter calculation sub-module 767, configured to calculate the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users according to the influence matrix, wherein the comprehensive influence parameter of user u is used to characterize the comprehensive probability of the user u successfully recommending the product to each other user, and the comprehensive influenced parameter of user u is used to characterize the comprehensive probability of each other user successfully recommending the product to the user u; and a seed user selection sub-module 768, configured to select at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users.

Optionally, as shown in FIG. 7, the comprehensive parameter calculation sub-module 767 is configured to: for the user u among the n users, obtain the influence parameters of the user u relative to each other user; and sum the influence parameters of the user u relative to each other user to obtain the comprehensive influence parameter of the user u.

Optionally, as shown in FIG. 7, the comprehensive parameter calculation sub-module 767 is configured to: for the user u among the n users, obtain the influence parameters of each other user relative to the user u; and determine the maximum or average value of the influence parameters of each other user relative to the user u as the comprehensive influenced parameter of the user u.

Optionally, the influence matrix W is as follows:

$W = (W_{pq})_{n \times n},\quad W_{pq} \in [0, 1]$;

Wherein, Wpq is the element in the p-th row and q-th column of the influence matrix W and represents the influence parameter of the user p relative to the user q, p is a positive integer less than or equal to n, and q is a positive integer less than or equal to n;

As shown in FIG. 7, the seed user selection sub-module 768 is configured to:

Define a column vector x whose element value is 0 or 1, and the sum of all elements is K:

$x_R \in \{0, 1\}$;

Wherein, if the value of xR is 1, it means that the user R is selected as the seed user, and the R is a positive integer less than or equal to the n;

Define the column vector e:

Define the column vector r:

or

The column vector x is calculated based on the following formula:

where $\arg\max_{x}$ indicates that the variable to be solved is x and that x takes the value at which the expression following it reaches its maximum, and λ is a real number greater than or equal to 0.

Optionally, the influence parameter determination module 740 is configured to: call an influence calculation model, and calculate the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group; wherein the characteristic data of the i-th user group includes: characteristic data of the first user, characteristic data of the second user, and characteristic data of the relationship between the first user and the second user.

Optionally, the training process of the influence calculation model is as follows: at least one training sample is constructed, each of the training samples including a sample user group; the feature data and labels of the training samples are obtained, the labels being used to characterize whether the first sample user in the training sample has successfully recommended a product to the second sample user; and the training samples are used to train the influence calculation model to obtain the trained influence calculation model.

It should be noted that, when the device provided in the above embodiment implements its functions, the division into the above functional modules is only used as an example. In actual applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the device and method embodiments provided above belong to the same conception, and the specific implementation process is detailed in the method embodiments, which will not be repeated here.

Please refer to FIG. 8, which shows a structural block diagram of a computer device provided by an embodiment of the present application. The computer device can be used to implement the seed user selection method provided in the above-mentioned embodiment. For example, the computer device may be the server described above. Specifically:

The computer device 800 includes a processing unit 801 (such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array)), a system memory 804 including a RAM (Random-Access Memory) 802 and a ROM (Read-Only Memory) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The computer device 800 also includes a basic input/output system (I/O system) 806 that helps to transfer information between the various components within the computer device, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.

The basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse and a keyboard for the user to input information. Wherein, the display 808 and the input device 809 are both connected to the central processing unit 801 through the input and output controller 810 connected to the system bus 805. The basic input/output system 806 may also include an input and output controller 810 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input and output controller 810 also provides output to a display screen, a printer, or other types of output devices.

The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable medium provide non-volatile storage for the computer device 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.

Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies; CD-ROM, DVD (Digital Video Disc) or other optical storage; and tape cartridges, magnetic tape, disk storage or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above types. The aforementioned system memory 804 and mass storage device 807 may be collectively referred to as a memory.

According to the embodiment of the present application, the computer device 800 may also run while connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device 800 can be connected to the network 812 through the network interface unit 811 connected to the system bus 805; in other words, the network interface unit 811 can also be used to connect to other types of networks or remote computer systems (not shown).

The memory also includes a computer program, which is stored in the memory and configured to be executed by one or more processors, so as to implement the above-mentioned method for selecting a seed user.

In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, on which a computer program is stored, and the computer program is executed by a processor to implement the above-mentioned method for selecting a seed user.

Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory), SSD (Solid State Drive), or optical disk. Among them, the random access memory may include ReRAM (Resistance Random Access Memory) and DRAM (Dynamic Random Access Memory).

In an exemplary embodiment, a computer program product is also provided, which is used to implement the above-mentioned seed user selection method when the computer program product is executed by a processor.

It should be understood that the "plurality" mentioned herein refers to two or more. "And/or" describes the association relationship of the associated objects, indicating that three types of relationships are possible; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the numbering of the steps described herein only exemplarily shows one possible order of execution among the steps. In some other embodiments, the above steps may also be executed out of numerical order; for example, two steps with different numbers may be executed at the same time, or two steps with different numbers may be executed in the reverse of the order shown in the figures, which is not limited in the embodiments of the present application.

The above are only exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included within the protection scope of this application.

Claims

A method for selecting seed users, the method includes:

Acquiring user collection data, where the user collection data includes n users, where n is an integer greater than 1;

Creating m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;

For the i-th user group in the m user groups, acquiring characteristic data of the i-th user group, where the i-th user group includes a first user and a second user that have an association relationship;

According to the characteristic data of the i-th user group, determining the influence parameter of the first user relative to the second user, where the influence parameter is used to characterize the probability of the first user successfully recommending the product to the second user;

Constructing an influence matrix, the influence matrix being a matrix of n rows and n columns, wherein the element in the u th row and the v column in the influence matrix represents the influence parameter of the user u relative to the user v;

Selecting at least one seed user from the n users according to the influence matrix;

Store the user information of the seed user.
The method according to claim 1, wherein the selecting at least one seed user from the n users according to the influence matrix comprises:

According to the influence matrix, calculating the comprehensive influence parameter of each of the n users, where the comprehensive influence parameter of the user u is used to characterize the comprehensive probability of the user u successfully recommending the product to each other user;

Constructing a seed user set, the seed user set being initially empty;

From the non-seed users, selecting a user s whose comprehensive influence parameter meets the condition and adding the user s to the seed user set, where the non-seed users refer to users among the n users that have not been added to the seed user set;

Subtracting the value of the s-th row element from the value of each row element in the influence matrix to obtain an updated influence matrix, where the s-th row element includes the influence parameters of the user s relative to each other user;

If the stop condition for seed user selection is not satisfied, starting execution again, based on the updated influence matrix, from the step of calculating the comprehensive influence parameter of each of the n users according to the influence matrix;

If the stop condition for seed user selection is satisfied, determining the users in the seed user set as the seed users.
The method according to claim 2, wherein the value range of the influence parameter is [0, 1];

The subtracting the value of the s-th row element from the value of each row element in the influence matrix to obtain the updated influence matrix includes:

Subtracting the value of the s-th row element from the values of the n rows of elements in the influence matrix, respectively, to obtain the calculated values of the n rows of elements;

For each target element whose value is less than zero among the calculated n rows of elements, modifying the value of the target element to 0 to obtain the updated influence matrix.
The method according to claim 1, wherein the selecting at least one seed user from the n users according to the influence matrix comprises:

According to the influence matrix, calculating the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users, where the comprehensive influence parameter of user u is used to characterize the comprehensive probability of the user u successfully recommending the product to each other user, and the comprehensive influenced parameter of the user u is used to characterize the comprehensive probability of each other user successfully recommending the product to the user u;

According to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users, selecting at least one seed user from the n users.
The method according to claim 4, wherein the calculating the comprehensive influence parameter of each of the n users includes:

For the user u among the n users, acquiring the influence parameter of the user u relative to each other user;

Summing the influence parameters of the user u relative to each other user to obtain the comprehensive influence parameter of the user u.
The method according to claim 4, wherein the calculating the comprehensive influenced parameter of each of the n users includes:

For the user u among the n users, acquiring the influence parameters of each other user relative to the user u;

Determining the maximum or average value of the influence parameters of each other user relative to the user u as the comprehensive influenced parameter of the user u.
The method according to claim 4, wherein the influence matrix W is as follows:

$W = (W_{pq})_{n \times n},\quad W_{pq} \in [0, 1]$;

Wherein, Wpq is the element in the p-th row and q-th column of the influence matrix W and represents the influence parameter of the user p relative to the user q, p is a positive integer less than or equal to n, and q is a positive integer less than or equal to n;

The selecting at least one seed user from the n users according to the comprehensive influence parameter and the comprehensive influenced parameter of each of the n users includes:

Define a column vector x whose element value is 0 or 1, and the sum of all elements is K:

Wherein, if the value of xR is 1, it means that the user R is selected as the seed user, and the R is a positive integer less than or equal to the n;

Define the column vector e:

Define the column vector r:

or

The column vector x is calculated based on the following formula:

where $\arg\max_{x}$ indicates that the variable to be solved is x and that x takes the value at which the expression following it reaches its maximum, and λ is a real number greater than or equal to 0.
The method according to any one of claims 1 to 7, wherein the determining of the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group comprises:

Calling an influence calculation model to calculate the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group;

Wherein the characteristic data of the i-th user group includes: characteristic data of the first user, characteristic data of the second user, and characteristic data of the relationship between the first user and the second user.
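A minimal Python sketch of such a model call follows. It assumes the three kinds of characteristic data are numeric lists that are concatenated into a single input, and it uses a logistic function with made-up weights as a stand-in model; the disclosure does not specify the model type here, so `make_logistic_model`, `influence_parameter`, and all numeric values are hypothetical.

```python
import math


def make_logistic_model(weights, bias):
    """Build a stand-in model mapping a feature list to a value in (0, 1)."""

    def model(features):
        if len(features) != len(weights):
            raise ValueError("feature length does not match the model")
        z = bias + sum(w * f for w, f in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    return model


def influence_parameter(model, first_user, second_user, relationship):
    """Concatenate the three feature groups and call the model once."""
    features = list(first_user) + list(second_user) + list(relationship)
    return model(features)


# Hypothetical weights and features (illustrative values only).
model = make_logistic_model([0.8, -0.2, 0.5, 1.1], bias=-1.0)
p = influence_parameter(
    model,
    first_user=[1.0, 0.3],   # e.g. activity level, account age (assumed)
    second_user=[0.5],       # e.g. purchase frequency (assumed)
    relationship=[0.7],      # e.g. interaction strength (assumed)
)
```

Because the output lies between 0 and 1, it can be read as a probability-like score for the first user relative to the second user.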
The method according to claim 8, wherein the training process of the influence calculation model is as follows:

Constructing at least one training sample, each training sample including a sample user group;

Acquiring feature data and a label of each training sample, where the label characterizes whether the first sample user in the training sample has successfully recommended a product to the second sample user; and

Training the influence calculation model with the training samples to obtain the trained influence calculation model.
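The training steps above can be sketched with a small logistic regression fitted by stochastic gradient descent. This is only one possible choice of model, assumed here for illustration; the function names, the learning rate, the epoch count, and the toy samples are all hypothetical. A label of 1 means the first sample user successfully recommended a product to the second sample user, and 0 means the recommendation did not succeed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_influence_model(samples, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights with per-sample gradient steps."""
    dim = len(samples[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for feats, label in zip(samples, labels):
            pred = sigmoid(bias + sum(w * f for w, f in zip(weights, feats)))
            err = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, feats):
    """Return the predicted probability of a successful recommendation."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, feats)))


# Toy feature data for four sample user groups (illustrative values only).
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
weights, bias = train_influence_model(samples, labels)
high = predict(weights, bias, [0.9, 0.9])
low = predict(weights, bias, [0.1, 0.1])
```

After training, `predict` returns a larger value for feature data resembling the successful samples than for feature data resembling the unsuccessful ones.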
A device for selecting seed users, the device comprising:

The collection data acquisition module is configured to acquire user collection data, where the user collection data includes n users, and the n is an integer greater than 1;

A user group creation module, configured to create m user groups according to the user collection data, each of the user groups includes two users having an association relationship, and the m is a positive integer;

The characteristic data acquisition module is configured to acquire, for the i-th user group in the m user groups, characteristic data of the i-th user group, wherein the i-th user group includes a first user and a second user having an association relationship;

The influence parameter determination module is configured to determine the influence parameter of the first user relative to the second user according to the characteristic data of the i-th user group, where the influence parameter characterizes the probability that the first user successfully recommends a product to the second user;

The matrix construction module is configured to construct an influence matrix, which is a matrix with n rows and n columns, wherein the element in the u-th row and v-th column of the influence matrix represents the influence parameter of user u relative to user v;

A seed user selection module, configured to select at least one seed user from the n users according to the influence matrix;

The information storage module is used to store the user information of the seed user.
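The flow through the matrix construction, seed user selection, and information storage modules can be sketched as follows. The sketch assumes that influence parameters have already been computed for each ordered pair of associated users, that pairs without an association relationship are given the value 0, and that seed users are chosen by the largest row sums; these are simplifying assumptions for illustration, and every name and value below is hypothetical.

```python
def build_influence_matrix(n, pair_scores):
    """Build an n-by-n matrix where entry [u][v] is u's influence on v.

    pair_scores maps ordered pairs (u, v) to an influence parameter.
    Pairs that are absent are assumed to have influence 0.
    """
    W = [[0.0] * n for _ in range(n)]
    for (u, v), score in pair_scores.items():
        if u == v:
            raise ValueError("a user group must contain two distinct users")
        W[u][v] = score
    return W


def select_top_k(W, k):
    """Simplified selection: pick the k users with the largest row sums."""
    totals = [sum(row) for row in W]
    ranked = sorted(range(len(W)), key=lambda u: totals[u], reverse=True)
    return sorted(ranked[:k])


# Hypothetical influence parameters for ordered user pairs.
pair_scores = {(0, 1): 0.5, (1, 0): 0.1, (0, 2): 0.2, (2, 0): 0.3, (1, 2): 0.4}
W = build_influence_matrix(3, pair_scores)
seeds = select_top_k(W, 1)
# Stand-in for the information storage step: keep seed user records in a dict.
seed_store = {u: {"user_id": u} for u in seeds}
```

In a deployed system the storage step would write to a database or similar store rather than an in-memory dictionary.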
A computer device, comprising a processor and a memory, wherein a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the seed user selection method according to any one of claims 1 to 9.
A non-transitory computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the seed user selection method according to any one of claims 1 to 9 is implemented.