CN108647990B - Method and device for determining target user and electronic equipment

Info

Publication number: CN108647990B
Application number: CN201810297028.4A
Authority: CN
Inventors: 吴健君; 张鹏飞
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2018-04-04
Filing date: 2018-04-04
Publication date: 2022-06-03
Anticipated expiration: 2038-04-04
Also published as: CN108647990A

Abstract

Embodiments of the invention provide a method, an apparatus, and an electronic device for determining a target user. The method includes: for each feature in a first feature set, calculating a first proportion of seed user samples having the feature and a second proportion of non-seed user samples having the feature; generating a negative sample set from a plurality of non-seed user samples according to the magnitude relation between the first proportion and the second proportion of each feature; and finally, in descending order of sample value, selecting as many first non-seed user samples as the number of target users and taking the corresponding non-seed users as the target users. In this way, suitable target users can be determined even from a small number of seed users provided by the advertiser.

Description

Method and device for determining target user and electronic equipment

Technical Field

The present invention relates to the technical field of advertisement delivery, and in particular, to a method, an apparatus, and an electronic device for determining a target user.

Background

At present, placing advertisements on websites is a business model used by all major internet companies, each of which has its own advertisement delivery platform. An advertiser submits its advertisement demand through such a platform; the platform then finds target users according to the advertisement demand and delivers the advertisement to those target users.

Specifically, when an advertiser issues an advertisement demand to an advertisement platform, the advertiser provides a seed user, and the advertisement platform searches for a target user meeting the advertisement demand through the seed user, and then puts an advertisement corresponding to the advertisement demand to the target user.

However, the inventor finds that the prior art has at least the following problems in the process of implementing the invention: when the number of seed users provided by the advertiser is small, the proper target users cannot be determined by the prior art.

Disclosure of Invention

The embodiment of the invention aims to provide a method, a device and electronic equipment for determining a target user, so as to determine a proper target user according to fewer seed users provided by an advertiser. The specific technical scheme is as follows:

In one aspect of the embodiments of the present invention, a method for determining a target user is provided, where the method includes:

acquiring a first feature set, a plurality of seed user samples and a plurality of non-seed user samples;

for each feature in the first set of features, calculating a first fraction of seed user samples having the feature in the plurality of seed user samples and a second fraction of non-seed user samples having the feature in the plurality of non-seed user samples;

generating a second feature set or a third feature set according to the magnitude relation of the first proportion and the second proportion of each feature; selecting a first non-seed user sample from a plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set;

obtaining a plurality of seed user samples, taking the plurality of seed user samples as a positive sample set, obtaining a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set, and training a preset logistic regression model to obtain a trained logistic regression model;

acquiring a third feature vector of each non-seed user sample in the plurality of non-seed user samples, and calculating the sample value of each non-seed user sample in the plurality of non-seed user samples according to the third feature vector and the trained logistic regression model;

acquiring the number of target users, selecting, from the plurality of non-seed user samples in descending order of sample value, first non-seed user samples matching the number of target users, and taking the non-seed users corresponding to the first non-seed user samples as the target users.

Optionally, before obtaining the first feature set, the plurality of seed user samples, and the plurality of non-seed user samples, a method for determining a target user according to an embodiment of the present invention further includes:

acquiring first features of the plurality of seed user samples and second features of the plurality of non-seed user samples, and establishing the first feature set from the first features and the second features, wherein no feature in the first feature set is repeated.

Optionally, after obtaining the first feature set, the plurality of seed user samples, and the plurality of non-seed user samples, the method for determining the target user according to the embodiment of the present invention further includes:

encoding each feature in the first feature set to obtain an encoded first feature set;

correspondingly, for each feature in the first feature set, calculating a first fraction of seed user samples having the feature in the plurality of seed user samples and a second fraction of non-seed user samples having the feature in the plurality of non-seed user samples, includes:

for each feature in the encoded first set of features, a first fraction of seed user samples having the feature in the plurality of seed user samples and a second fraction of non-seed user samples having the feature in the plurality of non-seed user samples are calculated.

Optionally, a second feature set or a third feature set is generated according to a magnitude relationship between the first proportion and the second proportion of each feature; and according to the second feature set or the third feature set, selecting a first non-seed user sample from a plurality of non-seed user samples to generate a negative sample set, comprising:

for each feature in the first feature set, when the first proportion of the feature is smaller than the second proportion, adding the feature into the second feature set to obtain a second feature set added with a plurality of features;

and acquiring a plurality of features of the plurality of non-seed user samples, selecting from the plurality of features a third feature that exists in the second feature set, and selecting the non-seed user samples corresponding to the third feature from the plurality of non-seed user samples to generate the negative sample set.

for each feature in the first feature set, when the first proportion is larger than the second proportion, adding the feature into a third feature set to obtain a third feature set added with a plurality of features;

and obtaining a plurality of characteristics of a plurality of non-seed user samples, selecting a fourth characteristic which does not exist in the third characteristic set from the plurality of characteristics, selecting a non-seed user sample corresponding to the fourth characteristic from the plurality of non-seed user samples, and generating a negative sample set.

In another aspect of the embodiments of the present invention, an apparatus for determining a target user is further provided, where the apparatus includes:

an acquisition module, configured to acquire a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples;

a proportion calculation module, configured to calculate, for each feature in the first feature set, a first proportion of seed user samples having the feature among the plurality of seed user samples and a second proportion of non-seed user samples having the feature among the plurality of non-seed user samples;

a negative sample set generating module, configured to generate a second feature set or a third feature set according to the magnitude relation between the first proportion and the second proportion of each feature, and to select first non-seed user samples from the plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set;

a training module, configured to take the plurality of seed user samples as a positive sample set, obtain a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set, and a second feature vector of each non-seed user sample in the negative sample set, and train a preset logistic regression model to obtain a trained logistic regression model;

a sample value calculation module, configured to acquire a third feature vector of each non-seed user sample in the plurality of non-seed user samples and calculate the sample value of each non-seed user sample according to the third feature vector and the trained logistic regression model;

and a target user selection module, configured to acquire the number of target users, select, from the plurality of non-seed user samples in descending order of sample value, first non-seed user samples matching the number of target users, and take the non-seed users corresponding to the first non-seed user samples as the target users.

Optionally, an apparatus for determining a target user according to an embodiment of the present invention further includes:

a first feature set establishing module, configured to acquire first features of the plurality of seed user samples and second features of the plurality of non-seed user samples, and establish the first feature set from the first features and the second features, wherein no feature in the first feature set is repeated;

an encoding module, configured to encode each feature in the first feature set to obtain an encoded first feature set;

Correspondingly, the proportion calculation module is specifically configured to: calculate, for each feature in the encoded first feature set, a first proportion of seed user samples having the feature among the plurality of seed user samples and a second proportion of non-seed user samples having the feature among the plurality of non-seed user samples.

Optionally, the negative sample set generating module includes:

the second feature set generation submodule is used for adding each feature in the first feature set to the second feature set when the first proportion of the feature is smaller than the second proportion, so that a second feature set added with a plurality of features is obtained;

and the first negative sample set generation submodule is used for acquiring a plurality of features of a plurality of non-seed user samples, selecting a third feature existing in the second feature set from the plurality of features, and selecting a non-seed user sample corresponding to the third feature from the plurality of non-seed user samples to generate a negative sample set.

Optionally, the negative sample set generating module further includes:

a third feature set generation submodule, configured to, for each feature in the first feature set, add the feature to a third feature set when the first proportion is greater than the second proportion, to obtain a third feature set to which multiple features are added;

and a second negative sample set generation submodule, configured to acquire a plurality of features of the plurality of non-seed user samples, select from the plurality of features a fourth feature that does not exist in the third feature set, and select the non-seed user samples corresponding to the fourth feature from the plurality of non-seed user samples to generate a negative sample set.

In another aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing any of the above methods for determining a target user when executing the program stored in the memory.

In yet another aspect of the present invention, the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute any one of the above-mentioned methods for determining a target user.

In yet another aspect of the present invention, the present invention also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any one of the above-mentioned methods for determining a target user.

According to the embodiments of the present invention, after a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples are acquired, a first proportion of seed user samples having the feature among the plurality of seed user samples and a second proportion of non-seed user samples having the feature among the plurality of non-seed user samples are calculated for each feature in the first feature set. A second feature set or a third feature set is then generated according to the magnitude relation between the first proportion and the second proportion of each feature, and a negative sample set is generated from it. A preset logistic regression model can thus be trained with the negative sample set and the positive sample set. Once the trained logistic regression model is obtained, the sample value of each of the plurality of non-seed user samples is calculated from its third feature vector and the trained logistic regression model. The larger the sample value, the more likely the corresponding user is to become a target user. Therefore, first non-seed user samples matching the number of target users can be selected from the plurality of non-seed user samples in descending order of sample value, and the corresponding non-seed users can be taken as the target users, so that suitable target users are determined from a small number of seed users provided by the advertiser. Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below.

Fig. 1 is a flowchart of a first implementation of a method for determining a target user according to an embodiment of the present invention;

Fig. 2 is a flowchart of a second implementation of a method for determining a target user according to an embodiment of the present invention;

Fig. 3 is a flowchart of a third implementation of a method for determining a target user according to an embodiment of the present invention;

Fig. 4 is a flowchart of a fourth implementation of a method for determining a target user according to an embodiment of the present invention;

Fig. 5 is a flowchart of a fifth implementation of a method for determining a target user according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an apparatus for determining a target user according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

In order to solve the problems in the prior art, embodiments of the present invention provide a method, an apparatus, and an electronic device for determining a target user, so that suitable target users can be determined from a small number of seed users provided by an advertiser. The advertisement corresponding to the advertisement demand can then be delivered to the target users, which improves the advertisement delivery effect.

A method for determining a target user according to an embodiment of the present invention is described first. Fig. 1 is a flowchart of a first implementation of the method; as shown in Fig. 1, the method may include:

S110, acquiring a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples.

The first feature set may be a pre-established feature set storing a plurality of features, where the features may include: user age, user gender, city to which the user belongs, user viewing preferences, etc. The pre-established feature set may be a feature set established by features obtained by feature analysis of a historical user watching a movie.

In some examples, when an advertiser sends an advertisement demand to an advertisement delivery platform, a seed user sample may be sent to the advertisement delivery platform at the same time, and after receiving the seed user sample sent by the advertiser, the advertisement delivery platform may trigger a target user determination device that applies the method for determining a target user according to the embodiment of the present invention, and the target user determination device may obtain a plurality of seed user samples from the advertisement delivery platform. Each seed user sample may include identification information of the seed user, characteristics of the seed user, a feature vector of the seed user, and the like.

In some examples, the advertisement delivery platform may pre-establish a historical user database, in which identification information of historical users, feature information of historical users, and the like may be stored. The target user determination device may obtain a plurality of non-seed user samples from the historical user database.

In some examples, the advertisement delivery platform may determine whether the number of seed users in the seed user information is smaller than a preset seed user threshold value after receiving the seed user information provided by the advertiser, and trigger the target user determination device when determining that the number of seed users in the seed user information is smaller than the preset seed user threshold value.

In some examples, when an advertiser sends an advertisement demand to the advertisement delivery platform, the advertiser may also send identification information of the seed users at the same time. After receiving the identification information, the advertisement delivery platform queries a pre-established historical user database for the historical users corresponding to the identification information and takes them as seed user samples; the remaining historical users in the database are taken as non-seed user samples. The platform sends these samples to the target user determination device, which thereby obtains a plurality of seed user samples and a plurality of non-seed user samples.

In a possible implementation manner, the target user determination device may be disposed inside the advertisement delivery platform, or may be disposed separately from the advertisement delivery platform.
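The acquisition flow described for S110 can be sketched in Python as follows. This is only a minimal illustration: the dictionary layout of the historical user database, the function names `split_samples` and `should_trigger`, and the threshold value are assumptions made for the example and are not specified by the embodiment.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: user identification -> list of feature values.
UserDB = Dict[str, List[str]]


def split_samples(history: UserDB, seed_ids: List[str]) -> Tuple[UserDB, UserDB]:
    """Split the historical user database into seed and non-seed samples."""
    seed_set = set(seed_ids)
    seeds = {uid: feats for uid, feats in history.items() if uid in seed_set}
    non_seeds = {uid: feats for uid, feats in history.items() if uid not in seed_set}
    return seeds, non_seeds


def should_trigger(seeds: UserDB, seed_threshold: int) -> bool:
    """Trigger the target user determination device only for small seed sets."""
    return len(seeds) < seed_threshold


# Toy database; identifiers and feature values are illustrative only.
history = {
    "u1": ["38", "male"],
    "u2": ["40", "female"],
    "u3": ["50", "male"],
}
seeds, non_seeds = split_samples(history, ["u1", "u2"])
print(sorted(seeds), sorted(non_seeds))
print(should_trigger(seeds, seed_threshold=100))
```

Identifiers supplied by the advertiser that do not appear in the database are simply ignored in this sketch; a real platform would have to decide how to handle them.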

S120, for each feature in the first feature set, calculating a first proportion of seed user samples having the feature among the plurality of seed user samples and a second proportion of non-seed user samples having the feature among the plurality of non-seed user samples.

Specifically, after obtaining the first feature set, the plurality of seed user samples, and the plurality of non-seed user samples, the target user determination apparatus may calculate, for each feature in the first feature set, a first percentage of the seed user sample having the feature in the plurality of seed user samples and a second percentage of the non-seed user sample having the feature in the plurality of non-seed user samples.

For example, assume that the first feature set obtained by the target user determination device is: {age: 38, 40, 45, 47, 50; gender: male, female; city: Beijing, Guangzhou, Shanghai, Tianjin}. The obtained seed user samples are user sample 1, user sample 2, user sample 3, and user sample 4, and the obtained non-seed user samples are user sample 5, user sample 6, user sample 7, user sample 8, user sample 9, and user sample 10.

The features of user sample 1 are 38 and male; the features of user sample 2 are 40, female, and Guangzhou; the features of user sample 3 are 45, male, and Shanghai; the features of user sample 4 are 47, female, and Tianjin; the features of user sample 5 are 50, male, and Beijing; the features of user sample 6 are 47, male, and Guangzhou; the features of user sample 7 are 40, female, and Tianjin; the features of user sample 8 are 50, male, and Shanghai; the features of user sample 9 are 38, female, and Beijing; and the features of user sample 10 are 45, male, and Tianjin.

The target user determination device may then find that the seed user sample having the feature "38" is user sample 1 and that the non-seed user sample having the feature "38" is user sample 9. Thus, the first proportion of the feature "38" among the 4 seed user samples is 1/4 = 25%, and the second proportion of the feature "38" among the 6 non-seed user samples is 1/6 ≈ 16.7%. By analogy, the first proportion of the feature "male" among the 4 seed user samples is 2/4 = 50%, and the second proportion of the feature "male" among the 6 non-seed user samples is 4/6 ≈ 66.7%.

Through this step, the target user determination device may calculate, for each feature in the first feature set, its first proportion among the plurality of seed user samples and its second proportion among the plurality of non-seed user samples. The negative sample set can then be screened out through the subsequent steps.
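The proportion calculation of S120 can be reproduced with a short Python sketch. The data below takes only the age and gender features from the worked example; the city feature is left out, and the gender of user sample 1 is taken as male because that is the value implied by the 50% proportion stated for "male". The function names and data layout are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Samples = Dict[int, Set[str]]

# Age and gender features of the user samples in the worked example.
SEED_SAMPLES: Samples = {
    1: {"38", "male"},  # gender inferred from the stated 50% proportion
    2: {"40", "female"},
    3: {"45", "male"},
    4: {"47", "female"},
}
NON_SEED_SAMPLES: Samples = {
    5: {"50", "male"},
    6: {"47", "male"},
    7: {"40", "female"},
    8: {"50", "male"},
    9: {"38", "female"},
    10: {"45", "male"},
}
FIRST_FEATURE_SET: List[str] = ["38", "40", "45", "47", "50", "male", "female"]


def proportion(feature: str, samples: Samples) -> float:
    """Fraction of the given samples that have the feature."""
    return sum(feature in feats for feats in samples.values()) / len(samples)


def feature_proportions(
    features: List[str], seeds: Samples, non_seeds: Samples
) -> Dict[str, Tuple[float, float]]:
    """Map each feature to (first proportion, second proportion)."""
    return {f: (proportion(f, seeds), proportion(f, non_seeds)) for f in features}


props = feature_proportions(FIRST_FEATURE_SET, SEED_SAMPLES, NON_SEED_SAMPLES)
for name, (first, second) in props.items():
    print(f"{name}: first={first:.1%}, second={second:.1%}")
```

For the features "38" and "male" this gives 25% versus about 16.7% and 50% versus about 66.7%, matching the figures in the example.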

S130, generating a second feature set or a third feature set according to the magnitude relation of the first proportion and the second proportion of each feature; and selecting a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set.

In some examples, in order to train the preset logistic regression model with a negative sample set, in this step the second feature set or the third feature set may be generated according to the magnitude relation between the first proportion and the second proportion of each feature in the first feature set, and the first non-seed user samples may then be selected from the plurality of non-seed user samples according to the second feature set or the third feature set to generate the negative sample set.

Specifically, when the first proportion of the feature is smaller than the second proportion, the feature is added to the second feature set, and when the first proportion of the feature is larger than the second proportion, the feature is added to the third feature set.

For example, assuming that the target user determination device calculates a first proportion of 25% and a second proportion of 16.7% for the feature "38", a third feature set including the feature "38" may be generated; and since the calculated first proportion of the feature "male" is 50% and its second proportion is 66.7%, a second feature set including the feature "male" may be generated. Further, first non-seed user samples may be selected from the plurality of non-seed user samples according to the third feature set or the second feature set to generate a negative sample set.

In some examples, the target user determination device described above may generate only the second set of features, or only the third set of features. When only the second feature set is generated, the target user determination apparatus may select the first non-seed user sample among the plurality of non-seed user samples according to the second feature set, and generate the negative sample set.

When only the third feature set is generated, the target user determination apparatus may select the first non-seed user sample from the plurality of non-seed user samples according to the third feature set, and generate the negative sample set.
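The following Python sketch illustrates one possible reading of S130. The embodiment does not state how many matching features a non-seed user sample needs in order to be selected, so the sketch assumes that a single feature is enough: one feature inside the second feature set, or one feature outside the third feature set. Features whose two proportions are equal are placed in neither set, which is also an assumption. The data again uses only the age and gender features of the worked example, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Samples = Dict[int, Set[str]]

SEEDS: Samples = {
    1: {"38", "male"}, 2: {"40", "female"},
    3: {"45", "male"}, 4: {"47", "female"},
}
NON_SEEDS: Samples = {
    5: {"50", "male"}, 6: {"47", "male"}, 7: {"40", "female"},
    8: {"50", "male"}, 9: {"38", "female"}, 10: {"45", "male"},
}
FEATURES: List[str] = ["38", "40", "45", "47", "50", "male", "female"]


def proportion(feature: str, samples: Samples) -> float:
    """Fraction of the given samples that have the feature."""
    return sum(feature in feats for feats in samples.values()) / len(samples)


def build_feature_sets(
    features: List[str], seeds: Samples, non_seeds: Samples
) -> Tuple[Set[str], Set[str]]:
    """Return (second feature set, third feature set)."""
    second: Set[str] = set()
    third: Set[str] = set()
    for f in features:
        first_p, second_p = proportion(f, seeds), proportion(f, non_seeds)
        if first_p < second_p:
            second.add(f)   # rarer among seed users
        elif first_p > second_p:
            third.add(f)    # more common among seed users
        # Equal proportions: left out of both sets (assumption).
    return second, third


def negatives_from_second(non_seeds: Samples, second: Set[str]) -> List[int]:
    """Select samples having at least one feature in the second feature set."""
    return sorted(uid for uid, feats in non_seeds.items() if feats & second)


def negatives_from_third(non_seeds: Samples, third: Set[str]) -> List[int]:
    """Select samples having at least one feature outside the third feature set."""
    return sorted(uid for uid, feats in non_seeds.items() if feats - third)


second_set, third_set = build_feature_sets(FEATURES, SEEDS, NON_SEEDS)
print(sorted(second_set), sorted(third_set))
print(negatives_from_second(NON_SEEDS, second_set))
print(negatives_from_third(NON_SEEDS, third_set))
```

With these inputs the second feature set is {"50", "male"}, and both selection rules pick user samples 5, 6, 8, and 10. A stricter matching rule, or the inclusion of the city feature, could give a different negative sample set.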

According to the method for determining the target user, the negative sample set is generated, the preset logistic regression model can be trained by using the negative sample set in the subsequent steps, and the target user can be searched by using the trained logistic regression model.

S140, a plurality of seed user samples are obtained, the plurality of seed user samples are used as a positive sample set, a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set are obtained, and a preset logistic regression model is trained to obtain a trained logistic regression model.

In some examples, the target user determination device may set in advance a first sample label for the positive sample set and a second sample label for the negative sample set. For example, the first sample label may be 1, and the second sample label may be 0 or -1.

In some examples, the plurality of seed user samples described above may include a feature vector for each seed user sample, and the plurality of non-seed user samples may include a feature vector for each non-seed user sample. Therefore, the target user determination device may obtain the first feature vector of each seed user sample in the positive sample set and the second feature vector of each non-seed user sample in the negative sample set.

In some examples, the preset logistic regression model may be the formula shown in formula (1):

$$P(y_k = 1 \mid x_i) = \frac{1}{1 + e^{-g(x_i)}} \tag{1}$$

where $g(x_i) = w_0 + w_1 x_{i1} + \dots + w_j x_{ij} + \dots + w_n x_{in}$, and $x_{ij}$ denotes the $j$-th component of the feature vector of the $i$-th user sample, with $i \ge 1$, $n \ge j \ge 1$, and $n \ge 1$. $P(y_k = 1 \mid x_i)$ denotes the probability that the sample label of the $i$-th user sample is 1, where $k$ is 1 or 0. The probability that the sample label of the $i$-th user sample is 0 is given by formula (2):

$$P(y_k = 0 \mid x_i) = 1 - P(y_k = 1 \mid x_i) = \frac{1}{1 + e^{g(x_i)}} \tag{2}$$

Assume that the total number of seed user samples in the positive sample set and non-seed user samples in the negative sample set obtained by the target user determination device is $m$. Since the $m$ user samples are independent of one another, the joint distribution of all user samples is the product of the marginal distributions of the individual user samples, that is, formula (3):

$$L(w) = \prod_{i=1}^{m} \bigl[P(y_k = 1 \mid x_i)\bigr]^{y_i} \bigl[P(y_k = 0 \mid x_i)\bigr]^{1 - y_i} \tag{3}$$

where $y_i$ is the sample label (1 or 0) of the $i$-th user sample.

The target user determination device can then use a prior-art maximum likelihood estimation method to calculate the parameters $w_0, w_1, \dots, w_j, \dots, w_n$ of $g(x_i)$ such that $L(w)$ attains its maximum value. The method of calculating the maximum likelihood estimate is not described in detail here.

In some examples, the target user determination device may substitute the first feature vector, the first sample label, the second sample label, and the second feature vector into formula (3) to calculate the parameters $w_0, w_1, \dots, w_j, \dots, w_n$ that fit the user samples, thereby obtaining a trained logistic regression model.

For example, assuming that the user sample 1, the user sample 2, the user sample 3, and the user sample 4 are user samples in a positive sample set, and the user sample 5, the user sample 6, the user sample 8, and the user sample 10 are user samples in a negative sample set, the target user determining apparatus may obtain a feature vector of the user sample 1, a feature vector of the user sample 2, a feature vector of the user sample 3, and a feature vector of the user sample 4, and obtain a feature vector of the user sample 5, a feature vector of the user sample 6, a feature vector of the user sample 8, and a feature vector of the user sample 10, respectively.

Then, the preset logistic regression model is trained with the feature vectors of user sample 1, user sample 2, user sample 3, and user sample 4 in the positive sample set and the feature vectors of user sample 5, user sample 6, user sample 8, and user sample 10 in the negative sample set; that is, the parameters $w_0, w_1, \dots, w_j, \dots, w_n$ that fit these 8 user samples are calculated through formula (3). A trained logistic regression model is thereby obtained.

According to the embodiment of the invention, the seed user samples are used as the positive sample set, and the first non-seed user samples selected from the plurality of non-seed user samples according to the second feature set or the third feature set are used as the negative sample set. The preset logistic regression model is then trained with the positive sample set and the negative sample set, so that the trained logistic regression model can distinguish positive samples from negative samples. This improves the accuracy of selecting the target users in the subsequent steps.
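The training step can be sketched in Python as below. The embodiment only refers to a prior-art maximum likelihood estimation method; this sketch uses batch gradient ascent on the log-likelihood, which is one common choice and not necessarily the one intended. The one-hot encoding of the feature vector, the restriction to age and gender features, the learning rate, the number of steps, and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Set

# One-hot layout of the feature vector; the encoding itself is an assumption.
FEATURES = ["38", "40", "45", "47", "50", "male", "female"]


def encode(sample: Set[str]) -> List[float]:
    """One-hot encode a sample's features in FEATURES order."""
    return [1.0 if f in sample else 0.0 for f in FEATURES]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that the sample label is 1; w[0] plays the role of w_0."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def log_likelihood(w: Sequence[float], xs, ys) -> float:
    """Logarithm of the likelihood over all training samples."""
    eps = 1e-12  # guards against log(0) when a probability saturates
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict_proba(w, x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def train(xs, ys, lr: float = 0.1, steps: int = 2000) -> List[float]:
    """Maximize the log-likelihood by batch gradient ascent from zero weights."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = y - predict_proba(w, x)  # gradient of the log-likelihood w.r.t. g(x)
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


# Positive set: user samples 1-4; negative set: user samples 5, 6, 8, 10.
positives = [{"38", "male"}, {"40", "female"}, {"45", "male"}, {"47", "female"}]
negatives = [{"50", "male"}, {"47", "male"}, {"50", "male"}, {"45", "male"}]
X = [encode(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)

weights = train(X, y)
print("log-likelihood before:", log_likelihood([0.0] * len(weights), X, y))
print("log-likelihood after: ", log_likelihood(weights, X, y))
```

Labels are 1 and 0 here; the alternative label -1 mentioned above would require a different form of the likelihood. Note also that user samples 3 and 10 share the same age and gender, so with this reduced encoding the model cannot separate them; including the city feature would.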

S150, obtaining a third feature vector of each non-seed user sample in the non-seed user samples, and calculating the sample value of each non-seed user sample in the non-seed user samples according to the third feature vector and the trained logistic regression model.

In some examples, in the trained logistic regression model, $P(y_k = 1 \mid x_i)$ represents the probability that the sample label of the $i$-th user sample is 1; the greater this probability, the more suitable the corresponding user is as a target user.

In order to find the target users among the plurality of non-seed users, the target user determination device may, after obtaining the trained logistic regression model, obtain a third feature vector of each of the plurality of non-seed user samples. It then calculates, from the third feature vector and the trained logistic regression model, the probability that the sample label of each non-seed user sample is 1, and uses that probability as the sample value of the non-seed user sample, thereby obtaining the sample values of all the non-seed user samples.

For example, after obtaining the trained logistic regression model, the target user determining device may calculate the probability that the sample label is 1 for each of user sample 5, user sample 6, user sample 7, user sample 8, user sample 9 and user sample 10, based on the feature vectors of user sample 5, user sample 6, user sample 7, user sample 8, user sample 9 and user sample 10, respectively. The probability that each sample is labeled 1 is then taken as the corresponding sample value. Thus, the sample values of user sample 5, user sample 6, user sample 7, user sample 8, user sample 9 and user sample 10 can be obtained.
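The sample value calculation can be sketched as below, assuming the usual logistic form P(label = 1 | x) = 1 / (1 + exp(-(w_0 + Σ w_j x_j))). The weights and the three-dimensional feature vectors are hypothetical placeholders, not values from the embodiment.

```python
import math

def sample_value(weights, feature_vector):
    """Return the probability that the sample label is 1 under a logistic model.

    weights[0] is the bias w_0; weights[1:] pair up with the feature vector.
    """
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained parameters w_0 .. w_3.
weights = [-0.5, 1.2, -0.8, 0.3]

# Hypothetical third feature vectors for two non-seed user samples.
third_feature_vectors = {
    "user sample 5": [1.0, 0.0, 1.0],
    "user sample 6": [0.0, 1.0, 0.0],
}

sample_values = {name: sample_value(weights, x)
                 for name, x in third_feature_vectors.items()}
```

Each entry of `sample_values` is a number strictly between 0 and 1, so the non-seed user samples can be ranked directly by it.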

S160, acquiring the number of target users, selecting a first non-seed user sample meeting the number of the target users from a plurality of non-seed user samples according to the sequence of sample values from large to small, and taking the non-seed user corresponding to the first non-seed user sample as the target user.

In some examples, the advertisement demand sent by the advertiser to the advertisement delivery platform may include the target user number, and therefore, the target user determination device may obtain the target user number from the advertisement delivery platform.

Specifically, after the target user determining device obtains the number of the target users, the target user determining device may select, in the plurality of non-seed user samples, a first non-seed user sample corresponding to the number of the target users according to a descending order of sample values, and then use a non-seed user corresponding to the first non-seed user sample as the target user.

For example, assume that the target user determining device calculates the sample values of user sample 5, user sample 6, user sample 7, user sample 8, user sample 9 and user sample 10 among the non-seed user samples to be 0.65, 0.3, 0.55, 0.4, 0.75 and 0.2, respectively. In this step, these sample values of the non-seed user samples may be obtained.

Assuming that the number of the target users is 3, the target user determining apparatus may select sample values 0.75, 0.65, and 0.55 from the 6 non-seed user samples in descending order of sample values, where the corresponding user samples are: user sample 9, user sample 5, and user sample 7.

Finally, the non-seed users corresponding to user sample 9, the non-seed users corresponding to user sample 5, and the non-seed users corresponding to user sample 7 may be determined to be target users.
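The selection in S160 amounts to a descending sort followed by taking the first entries. A minimal sketch, using the sample values of the example (user sample 9 at 0.75, user sample 5 at 0.65, user sample 7 at 0.55); the function name `select_target_users` is an illustrative assumption:

```python
# Sample values keyed by user sample number, as in the example.
sample_values = {5: 0.65, 6: 0.3, 7: 0.55, 8: 0.4, 9: 0.75, 10: 0.2}

def select_target_users(values, target_user_count):
    """Return the user samples with the largest sample values, largest first."""
    ranked = sorted(values, key=values.get, reverse=True)
    return ranked[:target_user_count]

targets = select_target_users(sample_values, 3)
print(targets)  # [9, 5, 7]
```

How ties between equal sample values should be broken is not stated in the text; `sorted` keeps the original insertion order in that case.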

In some examples, after determining the target user, the target user determining apparatus may send the identification information of the determined target user to the advertisement delivery platform, and the advertisement delivery platform may deliver an advertisement to the terminal device corresponding to the identification information of the target user.

According to the target user determination method provided by the embodiment of the invention, after a first feature set, a plurality of seed user samples and a plurality of non-seed user samples are obtained, for each feature in the first feature set, a first proportion of the seed user sample with the feature in the plurality of seed user samples and a second proportion of the non-seed user sample with the feature in the plurality of non-seed user samples are calculated, and then a second feature set or a third feature set used for generating a negative sample set is generated according to the magnitude relation of the first proportion and the second proportion of each feature; generating a negative sample set according to the magnitude relation between the first proportion and the second proportion, so that a preset logistic regression model can be trained by adopting the negative sample set and the positive sample set, and calculating the sample value of each non-seed user sample in a plurality of non-seed user samples through the third feature vector of each non-seed user sample in the plurality of non-seed user samples and the trained logistic regression model after the trained logistic regression model is obtained; the larger the sample value is, the more likely it is to become the target user, therefore, in the plurality of non-seed user samples, the first non-seed user samples corresponding to the number of the target users can be selected according to the sequence from the larger sample value to the smaller sample value, and the non-seed users corresponding to the first non-seed user samples can be taken as the target users, so that the determination of the appropriate target users can be realized according to the fewer seed users provided by the advertiser.

In an optional embodiment of the present invention, on the basis of the method for determining a target user shown in fig. 1, a possible implementation manner is further provided in the embodiment of the present invention, as shown in fig. 2, which is a flowchart of a second implementation manner of the method for determining a target user according to the embodiment of the present invention, and in fig. 2, before acquiring a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples in S110, the method for determining a target user according to the embodiment of the present invention may further include:

S170, acquiring first characteristics of a plurality of seed user samples and second characteristics of a plurality of non-seed user samples, and establishing a first characteristic set according to the first characteristics and the second characteristics.

Wherein the features in the first set of features are not repeated.

In some examples, the advertisement delivery platform may perform feature analysis on historical users watching movies, or perform feature analysis on historical users clicking historical advertisements, obtain features of the historical users, and then establish the first feature set.

In some examples, when the seed user sample and the non-seed user sample are both user samples in the historical user database of the advertisement delivery platform, the target user determination apparatus may first obtain a first feature of the seed user sample from the historical user database, obtain a second feature of the non-seed user sample from the historical user database, and then establish the first feature set by the first feature and the second feature.

For example, assume that the plurality of seed user samples and their corresponding features are: user sample 1: 38, male, Beijing; user sample 2: 40, female, Guangzhou; user sample 3: 45, male, Shanghai; user sample 4: 47, female, Tianjin. The plurality of non-seed user samples and their corresponding features are: user sample 5: 50, male, Beijing; user sample 6: 47, male, Guangzhou; user sample 7: 40, female, Tianjin; user sample 8: 50, male, Shanghai; user sample 9: 38, female, Beijing; user sample 10: 45, male, Tianjin.

The target user determination device may obtain the first features: 38, 40, 45, 47, male, female, Beijing, Guangzhou, Shanghai, Tianjin. The second features are: 38, 40, 45, 47, 50, male, female, Beijing, Guangzhou, Tianjin, Shanghai.

The first feature and the second feature may then be combined, and the combined first feature and second feature may be subjected to de-duplication processing, so as to obtain a first feature set.

Because the first feature set generated in this way contains no repeated features, the number of features in the first feature set is reduced, the amount of computation needed for the first proportion and the second proportion is reduced, and the time overhead of determining the target user by applying the method for determining the target user is reduced.
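The merge and de-duplication of S170 can be sketched as follows, using the ten user samples of the example. Keeping first-seen order via `dict.fromkeys` is an implementation choice made here for readability; the text only requires that features are not repeated.

```python
seed_samples = {
    1: ["38", "male", "Beijing"],
    2: ["40", "female", "Guangzhou"],
    3: ["45", "male", "Shanghai"],
    4: ["47", "female", "Tianjin"],
}
non_seed_samples = {
    5: ["50", "male", "Beijing"],
    6: ["47", "male", "Guangzhou"],
    7: ["40", "female", "Tianjin"],
    8: ["50", "male", "Shanghai"],
    9: ["38", "female", "Beijing"],
    10: ["45", "male", "Tianjin"],
}

def collect_features(samples):
    """All distinct features appearing in the given samples, in first-seen order."""
    return list(dict.fromkeys(f for feats in samples.values() for f in feats))

first_features = collect_features(seed_samples)        # features of seed users
second_features = collect_features(non_seed_samples)   # features of non-seed users

# Combine both lists and drop duplicates to obtain the first feature set.
first_feature_set = list(dict.fromkeys(first_features + second_features))
```

For this data the combined set has 11 distinct features: five ages, two sexes and four cities.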

In an optional embodiment of the present invention, on the basis of the method for determining a target user shown in fig. 2, a possible implementation manner is further provided in the embodiment of the present invention, as shown in fig. 3, which is a flowchart of a third implementation manner of the method for determining a target user according to the embodiment of the present invention, in fig. 3, after acquiring, at S110, a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples, the method for determining a target user according to the embodiment of the present invention further includes:

S180, coding each feature in the first feature set to obtain a coded first feature set.

In some examples, in order to reduce the occupation of the storage space of the hardware device by each feature in the first feature set and further reduce the time overhead of determining the target user by applying the method for determining the target user according to the embodiment of the present invention, after the target user determining device obtains the first feature set, the target user determining device may further encode each feature in the first feature set to obtain an encoded first feature set.

For example, assume that the first feature set obtained by the target user determination device is: {age: {38, 40, 45, 47, 50}, sex: {male, female}, city: {Beijing, Guangzhou, Shanghai, Tianjin}}. Each feature in the first feature set may be encoded with an Arabic numeral, so that the first feature set is converted into an encoded first feature set consisting of Arabic numerals: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}.

In some examples, the target user determination device may further encode the first feature set using lower case or upper case english letters, to obtain an encoded feature set: { a, b, c, d, e, f, g, h, i, j, k }.
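Both encodings can be sketched as simple lookup tables. The particular assignment of codes to features (38 → 1 or a, …, Tianjin → 11 or k) is an assumption; the text gives only the resulting code sets, not the mapping.

```python
import string

first_feature_set = ["38", "40", "45", "47", "50", "male", "female",
                     "Beijing", "Guangzhou", "Shanghai", "Tianjin"]

# Arabic-numeral encoding: 1 .. 11 in list order (assumed ordering).
number_codes = {f: i for i, f in enumerate(first_feature_set, start=1)}

# Lower-case letter encoding: a .. k in list order (assumed ordering).
letter_codes = {f: string.ascii_lowercase[i]
                for i, f in enumerate(first_feature_set)}

def encode(features, codes):
    """Encode one user sample's features with the same table as the feature set."""
    return [codes[f] for f in features]

encoded_sample_1 = encode(["38", "male", "Beijing"], number_codes)
print(encoded_sample_1)  # [1, 6, 8]
```

Using one shared table for the feature set and for every user sample is what keeps the encoded samples comparable with the encoded first feature set.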

Accordingly, after the target user determining apparatus encodes the first feature set, in step S120, for each feature in the first feature set, calculating a first percentage of the seed user sample having the feature in the plurality of seed user samples and a second percentage of the non-seed user sample having the feature in the plurality of non-seed user samples, which may include:

S121, calculating a first proportion of the seed user sample with the characteristic in the plurality of seed user samples and a second proportion of the non-seed user sample with the characteristic in the plurality of non-seed user samples for each characteristic in the encoded first characteristic set.

In some examples, after encoding the first feature set, the target user determination apparatus may further encode each feature of the plurality of seed user samples and each feature of the plurality of non-seed user samples in the same encoding manner as the first feature set.

By encoding the first feature set, the features of the plurality of seed user samples and the features of the plurality of non-seed user samples, the target user determination device can calculate the first proportion and the second proportion by using the encoded features when calculating the first proportion and the second proportion, so that the occupation of the features on the storage space of hardware equipment can be reduced, and the time overhead of determining the target user by applying the method for determining the target user of the embodiment of the invention can be further reduced.
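The proportion calculation on encoded features can be sketched as below. The numbering 1..11 for 38, 40, 45, 47, 50, male, female, Beijing, Guangzhou, Shanghai, Tianjin is an assumed encoding, applied to the ten user samples of the earlier example.

```python
# Encoded seed user samples (user samples 1-4).
seed_samples = [{1, 6, 8}, {2, 7, 9}, {3, 6, 10}, {4, 7, 11}]
# Encoded non-seed user samples (user samples 5-10).
non_seed_samples = [{5, 6, 8}, {4, 6, 9}, {2, 7, 11},
                    {5, 6, 10}, {1, 7, 8}, {3, 6, 11}]

def proportion(feature, samples):
    """Fraction of the samples that have the given encoded feature."""
    return sum(feature in s for s in samples) / len(samples)

# For every encoded feature: (first proportion, second proportion).
proportions = {
    code: (proportion(code, seed_samples), proportion(code, non_seed_samples))
    for code in range(1, 12)
}
```

For code 6 (male) this gives a first proportion of 2/4 = 50% and a second proportion of 4/6 ≈ 66.7%.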

In an optional embodiment of the present invention, on the basis of the method for determining a target user shown in fig. 1, a possible implementation manner is further provided in the embodiment of the present invention, as shown in fig. 4, which is a flowchart of a fourth implementation manner of the method for determining a target user according to the embodiment of the present invention, in fig. 4, S130 generates a second feature set or a third feature set according to a size relationship between a first proportion and a second proportion of each feature; and selecting a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set, and generating a negative sample set, which may include:

S131, for each feature in the first feature set, when the first proportion of the feature is smaller than the second proportion, adding the feature to the second feature set to obtain a second feature set added with a plurality of features.

In some examples, when the target user determination apparatus generates the second feature set or the third feature set according to the magnitude relationship between the first proportion and the second proportion of each feature, the embodiment of the present invention provides two possible implementations, and in one possible implementation, when the first proportion of any feature in the first feature set is smaller than the second proportion, any feature may be added to the second feature set, so that the second feature set to which multiple features are added may be obtained.

For example, assume that the target user determining device calculates the first proportion of the feature "male" as 50% and the second proportion as 66.7%; the feature "male" may then be added to the second feature set. If the first proportion of the feature "38" is calculated as 40% and the second proportion as 45%, the feature "38" may be added to the second feature set. If the first proportion of the feature "Beijing" is calculated as 66% and the second proportion as 73%, the feature "Beijing" may be added to the second feature set, and so on. In this way, the second feature set to which the features "38", "male" and "Beijing" are added can be obtained.

S132, obtaining a plurality of features of a plurality of non-seed user samples, selecting a third feature in the second feature set from the plurality of features, and selecting a non-seed user sample corresponding to the third feature from the plurality of non-seed user samples to generate a negative sample set.

After obtaining the second feature set added with the plurality of features, in order to train the preset logistic regression model, the target user determination apparatus may use the second feature set to screen a plurality of non-seed user samples to obtain a screening result, and then generate a negative sample set according to the screening result.

Specifically, the target user determination device may obtain a feature of each non-seed user sample, determine whether the feature exists in the second feature set, and if so, obtain a non-seed user sample corresponding to the feature. A plurality of non-seed user samples with features in the second feature set may thus be obtained, and then a negative sample set may be generated using the plurality of non-seed user samples with features in the second feature set.

For example, assume that the second feature set is: "38", "male" and "Beijing", and that the features of user sample 5, user sample 7, user sample 8 and user sample 10 exist in the second feature set. The target user determination apparatus may then acquire user sample 5, user sample 7, user sample 8 and user sample 10, and generate a negative sample set including user sample 5, user sample 7, user sample 8 and user sample 10.

By comparing the first proportion and the second proportion of one feature, when the first proportion is smaller than the second proportion, the feature is more prone to negative samples, so that non-seed user samples corresponding to the feature can be used as user samples in a negative sample set, and a generated negative sample set can be used for training a preset logistic regression model to find target users.
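Steps S131 and S132 can be sketched as follows. The proportions are the illustrative figures quoted above; the non-seed users "A", "B" and "C" are hypothetical and are not the numbered user samples of the example. Treating a user as selected when at least one of its features is in the second feature set is an assumed reading of the step.

```python
# (first proportion, second proportion) per feature, illustrative values.
proportions = {
    "male": (0.50, 0.667),
    "38": (0.40, 0.45),
    "Beijing": (0.66, 0.73),
    "female": (0.61, 0.60),
}

# S131: keep the features whose first proportion is smaller than the second.
second_feature_set = {f for f, (first, second) in proportions.items()
                      if first < second}

# Hypothetical non-seed users and their features.
non_seed_users = {
    "A": {"50", "male", "Shanghai"},
    "B": {"40", "female", "Tianjin"},
    "C": {"38", "female", "Beijing"},
}

# S132: a user joins the negative sample set if it shares a feature with the set.
negative_sample_set = [user for user, feats in non_seed_users.items()
                       if feats & second_feature_set]
print(negative_sample_set)  # ['A', 'C']
```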

In some examples, when the target user determining apparatus obtains the second feature set through step S131, the second feature set may contain features that also belong to the seed user samples. If a negative sample set is generated by using such a second feature set and the preset logistic regression model is trained on that negative sample set, the accuracy of the trained logistic regression model is reduced, and the accuracy of searching for the target user is further reduced.

In order to improve the accuracy of finding a target user by using a trained logistic regression model, on the basis of the method for determining a target user shown in fig. 1, another possible implementation manner is provided in the embodiments of the present invention to implement that the generated negative sample set only includes features of non-seed user samples.

As shown in fig. 5, which is a flowchart of a fifth implementation manner of a method for determining a target user according to an embodiment of the present invention, in fig. 5, S130 generates a second feature set or a third feature set according to a magnitude relationship between a first proportion and a second proportion of each feature; and selecting a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set, and generating a negative sample set, which may include:

S133, for each feature in the first feature set, when the first percentage is greater than the second percentage, adding the feature to the third feature set to obtain a third feature set to which a plurality of features are added.

In another possible implementation manner of the embodiment of the present invention, when the first percentage of any feature in the first feature set is greater than the second percentage, this indicates that the feature is more characteristic of positive samples, and the feature may be added to the third feature set, so that the third feature set to which a plurality of features are added may be obtained.

For example, assume that the target user determining device calculates the first proportion of the feature "female" as 61% and the second proportion as 60%; the feature "female" may then be added to the third feature set. If the first proportion of the feature "45" is 75% and the second proportion is 47%, the feature "45" may be added to the third feature set. If the first proportion of the feature "Guangzhou" is 68% and the second proportion is 59%, the feature "Guangzhou" may be added to the third feature set, and so on. In this way, the third feature set to which the features "45", "female" and "Guangzhou" are added can be obtained.

S134, obtaining a plurality of features of a plurality of non-seed user samples, selecting a fourth feature that does not exist in the third feature set from the plurality of features, and selecting a non-seed user sample corresponding to the fourth feature from the plurality of non-seed user samples to generate a negative sample set.

After obtaining the third feature set to which a plurality of features are added, in order to avoid a situation that there may be features belonging to the seed user sample in the second feature set, the target user determination apparatus may use the third feature set to screen a plurality of non-seed user samples to obtain a screening result, and then generate a negative sample set according to the screening result.

Specifically, the target user determining apparatus may obtain a feature of each non-seed user sample, determine whether the feature exists in the third feature set, and if not, obtain a non-seed user sample corresponding to the feature, so as to obtain a plurality of non-seed user samples whose features do not exist in the third feature set, and then generate the negative sample set by using the plurality of non-seed user samples whose features do not exist in the third feature set.

For example, assume that the third feature set is: "45", "female" and "Guangzhou", and that the features of user sample 5 and the features of user sample 8 do not exist in the third feature set. The target user determination apparatus may then acquire user sample 5 and user sample 8 and generate a negative sample set including user sample 5 and user sample 8.

By comparing the first proportion and the second proportion of one feature, when the first proportion is larger than the second proportion, the feature is more prone to positive samples, so that the feature can be used as a third feature set to screen a plurality of non-seed user samples, and the features of the screened non-seed user samples do not exist in the third feature set. After the preset logistic regression model is trained by using the negative sample set generated according to the screened result, the accuracy of the trained logistic regression model can be improved, and the accuracy of searching the target user by applying the method for determining the target user provided by the embodiment of the invention can be further improved.
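Steps S133 and S134 can be sketched in the same way, here with the six non-seed user samples of the earlier example and the illustrative proportions quoted above. Requiring that none of a user's features appear in the third feature set is the reading assumed here.

```python
# (first proportion, second proportion) per feature, illustrative values.
proportions = {
    "female": (0.61, 0.60),
    "45": (0.75, 0.47),
    "Guangzhou": (0.68, 0.59),
    "male": (0.50, 0.667),
}

# S133: keep the features whose first proportion is greater than the second.
third_feature_set = {f for f, (first, second) in proportions.items()
                     if first > second}

# Non-seed user samples 5-10 and their features.
non_seed_samples = {
    5: {"50", "male", "Beijing"},
    6: {"47", "male", "Guangzhou"},
    7: {"40", "female", "Tianjin"},
    8: {"50", "male", "Shanghai"},
    9: {"38", "female", "Beijing"},
    10: {"45", "male", "Tianjin"},
}

# S134: keep the samples that have no feature in the third feature set.
negative_sample_set = [sample for sample, feats in non_seed_samples.items()
                       if not (feats & third_feature_set)]
print(negative_sample_set)  # [5, 8]
```

Only user samples 5 and 8 survive the filter, because every other sample carries "female", "45" or "Guangzhou".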

Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a device for determining a target user, as shown in fig. 6, which is a schematic structural diagram of the device for determining a target user according to the embodiment of the present invention, and in fig. 6, the device for determining a target user according to the embodiment of the present invention may include:

an obtaining module 610, configured to obtain a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples;

a proportion calculation module 620, configured to calculate, for each feature in the first feature set, a first proportion of the seed user sample having the feature in the plurality of seed user samples and a second proportion of the non-seed user sample having the feature in the plurality of non-seed user samples;

A negative sample set generating module 630, configured to generate a second feature set or a third feature set according to a magnitude relationship between the first proportion and the second proportion of each feature; selecting a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set;

the training module 640 is configured to obtain a plurality of seed user samples, use the plurality of seed user samples as a positive sample set, obtain a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set, and a second feature vector of each non-seed user sample in the negative sample set, and train a preset logistic regression model to obtain a trained logistic regression model;

the sample value calculation module 650 is configured to obtain a third feature vector of each non-seed user sample in the plurality of non-seed user samples, and calculate a sample value of each non-seed user sample in the plurality of non-seed user samples according to the third feature vector and the trained logistic regression model;

the target user selection module 660 is configured to obtain the number of target users, select, in the multiple non-seed user samples, a first non-seed user sample that meets the number of the target users according to a descending order of sample values, and use a non-seed user corresponding to the first non-seed user sample as the target user.

By the device for determining the target user, after a first feature set, a plurality of seed user samples and a plurality of non-seed user samples are obtained, for each feature in the first feature set, a first proportion of the seed user sample with the feature in the plurality of seed user samples and a second proportion of the non-seed user sample with the feature in the plurality of non-seed user samples are calculated, and then a second feature set or a third feature set for generating a negative sample set is generated according to a magnitude relation between the first proportion and the second proportion of each feature and a negative sample set is generated; generating a negative sample set according to the magnitude relation between the first proportion and the second proportion, so that a preset logistic regression model can be trained by adopting the negative sample set and the positive sample set, and calculating the sample value of each non-seed user sample in a plurality of non-seed user samples through the third feature vector of each non-seed user sample in the plurality of non-seed user samples and the trained logistic regression model after the trained logistic regression model is obtained; the larger the sample value is, the more likely it is that the target user is, therefore, in the multiple non-seed user samples, according to the sequence from large sample value to small sample value, the first non-seed user sample corresponding to the target user number may be selected, and the non-seed user corresponding to the first non-seed user sample may be taken as the target user, so that the determination of the appropriate target user may be implemented according to fewer seed users provided by advertisers.

Specifically, the apparatus for determining a target user according to the embodiment of the present invention may further include:

Specifically, the negative sample set generating module 630 includes:

Specifically, the negative sample set generating module 630 may further include:

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, which includes a processor 710, a communication interface 720, a memory 730, and a communication bus 740, where the processor 710, the communication interface 720, and the memory 730 complete communication with each other through the communication bus 740,

A memory 730 for storing a computer program;

the processor 710, when executing the program stored in the memory 730, implements the following steps:

the method comprises the steps of obtaining a plurality of seed user samples, taking the plurality of seed user samples as a positive sample set, obtaining a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set, and training a preset logistic regression model to obtain a trained logistic regression model;

Obtaining a third feature vector of each non-seed user sample in the plurality of non-seed user samples, and calculating the sample value of each non-seed user sample in the plurality of non-seed user samples according to the third feature vector and the trained logistic regression model;

By means of the electronic device, after a first feature set, a plurality of seed user samples and a plurality of non-seed user samples are obtained, for each feature in the first feature set, a first proportion of the seed user sample with the feature in the plurality of seed user samples and a second proportion of the non-seed user sample with the feature in the plurality of non-seed user samples are calculated, and then a second feature set or a third feature set used for generating a negative sample set is generated according to the magnitude relation of the first proportion and the second proportion of each feature and a negative sample set is generated; generating a negative sample set according to the magnitude relation between the first proportion and the second proportion, so that a preset logistic regression model can be trained by adopting the negative sample set and the positive sample set, and after the trained logistic regression model is obtained, calculating the sample value of each non-seed user sample in a plurality of non-seed user samples through the third feature vector of each non-seed user sample in the plurality of non-seed user samples and the trained logistic regression model; the larger the sample value is, the more likely it is to become the target user, therefore, in the plurality of non-seed user samples, the first non-seed user samples corresponding to the number of the target users can be selected according to the sequence from the larger sample value to the smaller sample value, and the non-seed users corresponding to the first non-seed user samples can be taken as the target users, so that the determination of the appropriate target users can be realized according to the fewer seed users provided by the advertiser.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, a computer-readable storage medium is provided, which has instructions stored therein, which when run on a computer, cause the computer to perform a method of determining a target user as described in any of the above embodiments.

By means of a computer-readable storage medium of an embodiment of the present invention, after a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples are acquired, for each feature in the first feature set, a first proportion of the seed user sample having the feature in the plurality of seed user samples and a second proportion of the non-seed user sample having the feature in the plurality of non-seed user samples are calculated, and then a second feature set or a third feature set for generating a negative sample set is generated according to a magnitude relationship between the first proportion and the second proportion of each feature, and a negative sample set is generated; generating a negative sample set according to the magnitude relation between the first proportion and the second proportion, so that a preset logistic regression model can be trained by adopting the negative sample set and the positive sample set, and calculating the sample value of each non-seed user sample in a plurality of non-seed user samples through the third feature vector of each non-seed user sample in the plurality of non-seed user samples and the trained logistic regression model after the trained logistic regression model is obtained; the larger the sample value is, the more likely it is to become the target user, therefore, in the plurality of non-seed user samples, the first non-seed user samples corresponding to the number of the target users can be selected according to the sequence from the larger sample value to the smaller sample value, and the non-seed users corresponding to the first non-seed user samples can be taken as the target users, so that the determination of the appropriate target users can be realized according to the fewer seed users provided by the advertiser.

In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform a method of determining a target user as described in any of the above embodiments.

With the computer program product containing instructions of this embodiment of the present invention, after a first feature set, a plurality of seed user samples, and a plurality of non-seed user samples are obtained, a first proportion and a second proportion are calculated for each feature in the first feature set: the first proportion is the share of seed user samples having the feature among the plurality of seed user samples, and the second proportion is the share of non-seed user samples having the feature among the plurality of non-seed user samples. A second feature set or a third feature set is then generated according to the magnitude relationship between the first proportion and the second proportion of each feature, and a negative sample set is generated from that feature set. A preset logistic regression model can then be trained with the negative sample set and the positive sample set. After the trained logistic regression model is obtained, the sample value of each non-seed user sample in the plurality of non-seed user samples is calculated from the third feature vector of that sample and the trained logistic regression model. The larger the sample value, the more likely the corresponding non-seed user is to become a target user. Therefore, the plurality of non-seed user samples are ordered by sample value from largest to smallest, the first non-seed user samples meeting the number of target users are selected, and the non-seed users corresponding to those first non-seed user samples are taken as the target users. In this way, appropriate target users can be determined from the relatively small number of seed users provided by the advertiser.
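The training and ranking steps described above can likewise be sketched in Python. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the logistic regression is fitted with plain stochastic gradient descent, the "sample value" is assumed to be the predicted probability of the positive class, and all function names, hyperparameters, and the binary feature vectors (ordered as sports, drama, news, male, female) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs: Sequence[Sequence[float]], ys: Sequence[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def sample_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Assumed sample value: predicted probability of being a positive sample."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def select_target_users(w: Sequence[float], b: float,
                        non_seed_vectors: Sequence[Sequence[float]],
                        n_targets: int) -> List[int]:
    """Rank non-seed samples by sample value, largest first, and keep n_targets."""
    order = sorted(range(len(non_seed_vectors)),
                   key=lambda i: sample_value(w, b, non_seed_vectors[i]),
                   reverse=True)
    return order[:n_targets]


# Hypothetical data. Positive set = seed samples (label 1);
# negative set = selected non-seed samples (label 0).
positive_vectors = [[1, 0, 0, 1, 0], [1, 0, 0, 0, 1], [1, 0, 0, 1, 0]]
negative_vectors = [[0, 1, 0, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 1]]
all_non_seed_vectors = [[0, 1, 0, 0, 1], [1, 0, 0, 1, 0],
                        [0, 1, 0, 1, 0], [0, 0, 1, 0, 1]]

weights, bias = train_logistic_regression(
    positive_vectors + negative_vectors,
    [1] * len(positive_vectors) + [0] * len(negative_vectors),
)
print(select_target_users(weights, bias, all_non_seed_vectors, 1))  # [1]
```

Here the only non-seed sample that resembles the seed users (index 1) receives the largest sample value and is returned as the single target user.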

The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in this specification are described in a related manner. For parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant points, reference may be made to the corresponding description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of determining a target user, the method comprising:

for each feature in the first feature set, calculating a first proportion of seed user samples having that feature in the plurality of seed user samples and a second proportion of non-seed user samples having that feature in the plurality of non-seed user samples;

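generating a second feature set or a third feature set according to the magnitude relation between the first proportion and the second proportion of each feature; selecting a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set;

obtaining the plurality of seed user samples, taking the plurality of seed user samples as a positive sample set, obtaining a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set, and training a preset logistic regression model to obtain a trained logistic regression model;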

obtaining a third feature vector of each non-seed user sample in the plurality of non-seed user samples, and calculating a sample value of each non-seed user sample in the plurality of non-seed user samples according to the third feature vector and the trained logistic regression model;

acquiring the number of target users, selecting a first non-seed user sample meeting the number of the target users from the plurality of non-seed user samples according to the sequence of the sample values from large to small, and taking a non-seed user corresponding to the first non-seed user sample as a target user;

wherein the generating a second feature set or a third feature set according to the magnitude relation between the first proportion and the second proportion of each feature, and the selecting a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set, comprise:

for each feature in the first feature set, when the first proportion of the feature is smaller than the second proportion, adding the feature to a second feature set to obtain a second feature set containing a plurality of features; obtaining a plurality of features of the plurality of non-seed user samples, selecting a third feature existing in the second feature set from the plurality of features, and selecting a non-seed user sample corresponding to the third feature from the plurality of non-seed user samples to generate a negative sample set;

or

for each feature in the first feature set, when the first proportion of the feature is larger than the second proportion, adding the feature to a third feature set to obtain a third feature set containing a plurality of features; and acquiring a plurality of features of the plurality of non-seed user samples, selecting a fourth feature which does not exist in the third feature set from the plurality of features, and selecting a non-seed user sample corresponding to the fourth feature from the plurality of non-seed user samples to generate a negative sample set.

2. The method of claim 1, wherein prior to said obtaining the first feature set, the plurality of seed user samples, and the plurality of non-seed user samples, the method further comprises:

obtaining first features of the plurality of seed user samples and second features of the plurality of non-seed user samples, and establishing the first feature set according to the first features and the second features, wherein each feature in the first feature set is not repeated.

3. The method of claim 1, wherein after the obtaining the first feature set, the plurality of seed user samples, and the plurality of non-seed user samples, the method further comprises:

encoding each feature in the first feature set to obtain an encoded first feature set;

correspondingly, the calculating, for each feature in the first feature set, a first proportion of seed user samples having the feature in the plurality of seed user samples and a second proportion of non-seed user samples having the feature in the plurality of non-seed user samples comprises:

for each feature in the encoded first feature set, calculating a first proportion of seed user samples having the feature in the plurality of seed user samples and a second proportion of non-seed user samples having the feature in the plurality of non-seed user samples.

4. An apparatus for determining a target user, the apparatus comprising:

a proportion calculation module for calculating, for each feature in the first feature set, a first proportion of seed user samples having the feature in the plurality of seed user samples and a second proportion of non-seed user samples having the feature in the plurality of non-seed user samples;

a negative sample set generation module, configured to generate a second feature set or a third feature set according to the magnitude relation between the first proportion and the second proportion of each feature, and to select a first non-seed user sample from the plurality of non-seed user samples according to the second feature set or the third feature set to generate a negative sample set;

a training module, configured to obtain the plurality of seed user samples, take the plurality of seed user samples as a positive sample set, obtain a first sample label of the positive sample set, a first feature vector of each seed user sample in the positive sample set, a second sample label of the negative sample set and a second feature vector of each non-seed user sample in the negative sample set, and train a preset logistic regression model to obtain a trained logistic regression model;

a sample value calculation module, configured to acquire a third feature vector of each non-seed user sample in the plurality of non-seed user samples and calculate the sample value of each non-seed user sample in the plurality of non-seed user samples according to the third feature vector and the trained logistic regression model;

a target user selection module, configured to acquire the number of target users, select a first non-seed user sample meeting the number of the target users from the plurality of non-seed user samples according to the sequence of the sample values from large to small, and take a non-seed user corresponding to the first non-seed user sample as a target user;

wherein the negative sample set generation module comprises:

a second feature set generation submodule, configured to, for each feature in the first feature set, add the feature to a second feature set when the first proportion of the feature is smaller than the second proportion, to obtain a second feature set containing a plurality of features; and a first negative sample set generation submodule, configured to obtain a plurality of features of the plurality of non-seed user samples, select, among the plurality of features, a third feature existing in the second feature set, and select, among the plurality of non-seed user samples, a non-seed user sample corresponding to the third feature, so as to generate a negative sample set;

or

the negative sample set generation module comprises:

a third feature set generation submodule, configured to, for each feature in the first feature set, add the feature to a third feature set when the first proportion of the feature is greater than the second proportion, to obtain a third feature set containing a plurality of features; and a second negative sample set generation submodule, configured to acquire a plurality of features of the plurality of non-seed user samples, select, among the plurality of features, a fourth feature which does not exist in the third feature set, and select, among the plurality of non-seed user samples, a non-seed user sample corresponding to the fourth feature, so as to generate a negative sample set.

5. The apparatus of claim 4, further comprising:

a first feature set establishing module, configured to obtain first features of the multiple seed user samples and second features of the multiple non-seed user samples, and establish the first feature set according to the first features and the second features, where each feature in the first feature set is not repeated.

6. The apparatus of claim 4, further comprising:

7. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any of claims 1 to 3 when executing the program stored in the memory.