CN105447730B

CN105447730B - Target user orientation method and device

Info

Publication number: CN105447730B
Application number: CN201510996108.5A
Authority: CN
Inventors: 王莉峰
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2015-12-25
Filing date: 2015-12-25
Publication date: 2020-11-06
Anticipated expiration: 2035-12-25
Also published as: CN105447730A

Abstract

The embodiment of the invention discloses a target user orientation method and a target user orientation device, wherein the method comprises the following steps: generating a virtual user based on the user characteristics of the seed user; the seed user and the virtual user are taken together as a positive sample user; determining a negative sample user; respectively extracting user characteristics of the positive sample user and the negative sample user; training based on the user characteristics of the positive sample user and the negative sample user to determine an orientation parameter; and based on the orientation parameters, positioning a target user meeting a preset similarity condition with the seed user.

Description

Target user orientation method and device

Technical Field

The invention relates to the field of information technology, and in particular to a target user orientation method and device.

Background

With the development of information technology, selecting target users for information push so as to improve push efficiency has long been a problem addressed in the prior art. Information push includes pushing advertisements as well as recommending videos, audio, image-text information and the like to users. To reduce users' aversion to pushed information and to improve user satisfaction and the delivery and conversion efficiency of pushed information, it is generally necessary to select the users who are likely to be interested in the pushed information, that is, to locate target users. The prior art provides the following methods for locating target users:

the first method: users who converted within a period of time are taken as seed users and positive sample users, and users who did not convert are taken as negative sample users; user characteristics such as query behavior, web browsing behavior and social data are then extracted, and a conversion model is trained on the extracted characteristics; finally, the conversion model is applied to all users to predict whether each user belongs to the audience of the advertisement.

the second method: users with conversion behavior are taken as positive sample users and all other users as negative sample users to obtain a primary selection model, which focuses mainly on a single feature, namely whether the user has visited the web page targeted by the advertisement. A fine selection model is then used, which considers not only whether the user has visited the corresponding website but also more of the user's attribute tags. The attribute tags may include category tags describing characteristics, related to the advertised website, in which the user is interested. The primary selection model and the fine selection model are combined to locate target users.

the third method: first, user characteristics are extracted from data such as user registration, behavior and social data, and an inverted index is built in which the index terms are features and the index values are users; the index values are updated periodically. Then, after the advertiser provides a seed user ID package, a highly relevant feature subset is selected from the extracted user features by feature selection methods such as mutual information. Finally, this feature subset is used to search for similar users, in a process analogous to a search engine retrieving similar documents. The advertiser performs profile analysis on the expanded set of similar users to decide whether to use the function. Besides online retrieval and expansion of similar users, an offline mining model is also provided: likewise based on the highly relevant features of a given seed user ID package, users are clustered by locality-sensitive hashing, and similar users are mined by computing Hamming distances.
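The offline mining step of the third method (locality-sensitive hashing plus Hamming distance) can be illustrated with a minimal sketch. It assumes users are dense numeric feature vectors and uses random-hyperplane (SimHash-style) hashing, which is one common LSH family; the function names, the 16-bit signature length and the distance cutoff are hypothetical choices, not details of the prior art.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: set if the vector lies on its non-negative side."""
    bits = 0
    for i, plane in enumerate(hyperplanes):
        if sum(p * v for p, v in zip(plane, vector)) >= 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")

def similar_users(seed_vectors, users, hyperplanes, max_distance):
    """Ids of users whose signature is within max_distance of some seed signature."""
    seed_sigs = [signature(v, hyperplanes) for v in seed_vectors]
    result = []
    for user_id, vector in users.items():
        sig = signature(vector, hyperplanes)
        if min(hamming(sig, s) for s in seed_sigs) <= max_distance:
            result.append(user_id)
    return result
```

Vectors pointing in nearly the same direction tend to share most signature bits, so a small Hamming distance approximates a small angle between user vectors.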

Although these methods can locate target users, their accuracy remains low: non-target users are wrongly identified as target users, or real target users are missed.

Disclosure of Invention

In view of this, embodiments of the present invention are intended to provide a method and an apparatus for positioning a target user, which at least partially solve the problem of inaccurate positioning of the target user.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

a first aspect of an embodiment of the present invention provides a target user targeting method, where the method includes:

generating a virtual user based on the user characteristics of the seed user;

the seed user and the virtual user are taken together as a positive sample user;

determining a negative sample user;

respectively extracting user characteristics of the positive sample user and the negative sample user;

training based on the user characteristics of the positive sample user and the negative sample user to determine an orientation parameter;

and based on the orientation parameters, positioning a target user meeting a preset similarity condition with the seed user.

Based on the above scheme, the generating a virtual user based on the user characteristics of the seed user includes:

determining a first position of the seed user in a feature space;

determining a second location based on at least two of the first locations;

determining a user characteristic of the virtual user based on the second location and generating a virtual user.

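Based on the above scheme, the generating a virtual user based on the user characteristics of the seed user may include:

extracting numerical user features from the user features of the seed users;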

determining a selectable range of numerical user characteristics based on the numerical user characteristics;

selecting a value from the selectable range as a numerical user characteristic of the virtual user;

extracting non-numerical user features in the user features of the seed user, and giving probability values to the non-numerical user features;

determining non-numerical user characteristics of the virtual user based on the probability values.

Based on the above scheme, the determining a negative sample user includes:

calculating the similarity between each candidate sample user and the seed users based on the user characteristics of the seed users;

and determining, based on the similarity, the candidate sample users that meet a negative sample user condition as the negative sample users.

Based on the above scheme, the training based on the user characteristics of the positive sample user and the negative sample user to determine the orientation parameters includes:

and performing model training by using the user characteristics of the positive sample users and the negative sample users to determine a classification model for selecting target users.

Based on the above scheme, the performing model training by using the user characteristics of the sample users to determine the classification model for selecting target users includes:

dividing the positive sample users and the negative sample users into a training set, a validation set and a test set;

performing model training on the training set with each of several different training algorithms;

using the validation set to determine whether the model training needs to continue;

after the model training stops, evaluating, on the test set, the effect of the candidate model obtained by each training algorithm;

and selecting, based on the effect evaluation, one of the candidate models as the classification model.
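The split, train, verify and select procedure above can be sketched as follows. This is a minimal illustration, assuming each sample is a (feature vector, label) pair with label 1 for positive and 0 for negative sample users; the 60/20/20 split, the two stand-in training algorithms (a perceptron whose training is stopped by the verification set, and a nearest-centroid classifier), the `patience` value and accuracy as the evaluation metric are all hypothetical choices.

```python
import random

def split(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle, then cut into training, verification (validation) and test sets."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    n_train = round(len(data) * ratios[0])
    n_val = round(len(data) * ratios[1])
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

def accuracy(predict, samples):
    return sum(predict(x) == y for x, y in samples) / len(samples)

def linear(w, b):
    """Predictor for a fixed weight vector and bias."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(train, val, max_epochs=50, patience=3):
    """Perceptron; stops once validation accuracy fails to improve for `patience` epochs."""
    w, b = [0.0] * len(train[0][0]), 0.0
    best, best_acc, stale = linear(w, b), -1.0, 0
    for _ in range(max_epochs):
        for x, y in train:
            if linear(w, b)(x) != y:
                step = 1 if y == 1 else -1
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
        val_acc = accuracy(linear(w, b), val)
        if val_acc > best_acc:
            best, best_acc, stale = linear(w, b), val_acc, 0
        else:
            stale += 1
            if stale >= patience:  # the validation set says training need not continue
                break
    return best

def train_nearest_centroid(train, val):
    """Assign the label whose class centroid is closer (val unused: no iterations)."""
    def centroid(label):
        rows = [x for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_pos, c_neg = centroid(1), centroid(0)
    def dist2(a, c):
        return sum((ai - ci) ** 2 for ai, ci in zip(a, c))
    return lambda x: 1 if dist2(x, c_pos) < dist2(x, c_neg) else 0

def select_model(samples, algorithms):
    """Train every candidate algorithm, score each on the test set, keep the best."""
    train, val, test = split(samples)
    models = {name: algo(train, val) for name, algo in algorithms.items()}
    scores = {name: accuracy(model, test) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, models[best], scores
```

Keeping the test set out of both training and the stopping decision means the final comparison between candidate models is made on data none of them has seen.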

A second aspect of the embodiments of the present invention provides a target user orientation apparatus, including:

the generating unit is used for generating a virtual user based on the user characteristics of the seed user;

a first determining unit, configured to use the seed user and the virtual user together as a positive sample user;

a second determining unit, configured to determine a negative sample user;

the training unit is used for respectively extracting the user characteristics of the positive sample user and the negative sample user, and for training based on those user characteristics to determine an orientation parameter;

and the positioning unit is used for positioning the target user meeting the preset similarity condition with the seed user based on the orientation parameter.

Based on the above scheme, the generating unit is configured to determine a first position of the seed user in a feature space; determining a second location based on at least two of the first locations; determining a user characteristic of the virtual user based on the second location and generating a virtual user.

Based on the above scheme, the generating unit is specifically configured to extract numerical user features from the user features of the seed user; determining a selectable range of numerical user characteristics based on the numerical user characteristics; selecting a value from the selectable range as a numerical user characteristic of the virtual user; extracting non-numerical user features in the user features of the seed user, and giving probability values to the non-numerical user features; determining non-numerical user characteristics of the virtual user based on the probability values.

Based on the above scheme, the second determining unit is specifically configured to calculate the similarity between each candidate sample user and the seed users based on the user characteristics of the seed users, and to determine, based on the similarity, the candidate sample users that meet a negative sample user condition as the negative sample users.

Based on the above scheme, the training unit is specifically configured to perform model training by using the user characteristics of the positive sample users and the negative sample users, and to determine a classification model for selecting target users.

Based on the above scheme, the training unit is specifically configured to divide the positive sample users and the negative sample users into a training set, a validation set and a test set; perform model training on the training set with different training algorithms; use the validation set to determine whether model training needs to continue; after model training stops, evaluate on the test set the candidate model obtained by each training algorithm; and select, based on the effect evaluation, one of the candidate models as the classification model.

In the target user orientation method and device provided by the embodiments of the present invention, virtual users are first generated based on the user characteristics of the seed users, and the seed users and the virtual users together serve as positive sample users. This significantly increases the number of positive sample users and alleviates the training data imbalance caused by having few positive samples and many negative samples. In turn, it reduces the risk that the orientation parameters obtained by training lack discriminative power because of imbalanced data, and that target users are located with low accuracy.

Drawings

Fig. 1 is a schematic flowchart of a first target user targeting method according to an embodiment of the present invention;

fig. 2 is a schematic flowchart of determining a positive sample user according to an embodiment of the present invention;

fig. 3 is a schematic diagram illustrating generation of a first virtual user according to an embodiment of the present invention;

fig. 4 is a schematic diagram illustrating generation of a second virtual user according to an embodiment of the present invention;

fig. 5A is a schematic flowchart of a first method for determining negative sample users according to an embodiment of the present invention;

fig. 5B is a schematic flowchart of a second method for determining negative sample users according to an embodiment of the present invention;

fig. 6 is a schematic flow chart of a method for determining orientation parameters according to an embodiment of the present invention;

fig. 7 is a flowchart illustrating a second target user targeting method according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of a target user orientation apparatus according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of another target user orientation apparatus according to an embodiment of the present invention.

Detailed Description

The technical solution of the present invention is further described in detail with reference to the drawings and the specific embodiments of the specification.

Example one:

as shown in fig. 1, the present embodiment provides a target user targeting method, where the method includes:

step S110: generating a virtual user based on the user characteristics of the seed user;

step S120: the seed user and the virtual user are taken together as a positive sample user;

step S130: determining a negative sample user;

step S140: respectively extracting user characteristics of the positive sample user and the negative sample user, and training based on those user characteristics to determine an orientation parameter;

step S150: and based on the orientation parameters, positioning a target user meeting a preset similarity condition with the seed user.

The target user targeting method described in this embodiment may be used in an application scenario of target user targeting of an advertisement or target user targeting of recommendation of multimedia information such as video and audio.

The seed user is a user who performs a target activity. For example, in the advertising field, seed users may include users who click on the advertisement, purchase the advertised commodity or download the advertised application, that is, users who perform the conversion behavior the advertisement is intended to produce, or users evaluated as very likely to perform the target activity.

The seed users in this embodiment may be received from an external device, or may be determined by the device itself through data processing. For example, suppose advertisement A needs to be published. If advertisement A is being published for the first time, historical advertisement data of an advertisement B that is very similar to advertisement A may be obtained, and users who performed the corresponding conversion behavior on advertisement B may be extracted as seed users of advertisement A. Alternatively, after advertisement A has been delivered on a trial basis for a period of time, its trial delivery data may be extracted, and users who clicked advertisement A or performed its desired conversion behavior may be taken as seed users.

In step S110 of this embodiment, virtual users are generated based on the user characteristics of the seed users, and in step S120 the seed users and the virtual users are used together as positive sample users, which significantly increases the number of positive sample users. For example, if there are currently M seed users and step S110 generates N virtual users from their user characteristics, the total number of positive sample users is M + N, which is greater than M. Compared with the prior art, in which only seed users serve as positive sample users, this expands the positive sample data and reduces the risk that, because of too little positive sample data, the resulting orientation parameters lack discriminative power and target users are located inaccurately.

In step S110, the virtual users generated from the seed users are all highly similar to the seed users; for example, the similarity between each virtual user and the seed users is greater than a similarity threshold. For instance, if the seed users are between 20 and 25 years old, the virtual users may all be given ages between 20 and 25, so the similarity between the virtual users and the seed users is clearly high.

Seed users who view the same advertisement or perform its conversion behavior usually share many user characteristics, and the virtual users generated in this embodiment are very similar to the seed users. A virtual user may correspond to a real user who was missed by the database statistics, or may be purely virtual with no corresponding real user. Either way, because it is highly similar to the seed users, it is about as likely as a seed user to view the same advertisement or perform its conversion behavior, so it can serve as a positive sample user and as a data source for training the orientation parameters used to locate target users.

As shown in fig. 2, the seed users represented by the solid lines expand the virtual users represented by the dotted lines through user expansion. The number of users in the positive sample user set composed of the seed users and the virtual users is the sum of the number of the seed users and the number of the virtual users, and is obviously greater than the number of the seed users.

In step S130, any method may be used to determine the negative sample user, for example, any user other than the seed user may be selected as the negative sample user. It should be noted that, in this embodiment, there is no certain precedence relationship between the step S110 and the step S130. The step S110 may be performed before the step S130, or may be performed after the step S130.
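The simplest option mentioned above, drawing negative sample users at random from the non-seed users, can be sketched as below. The function name, the use of string user IDs and the fixed random seed are hypothetical illustration choices; the sketch does not depend on whether step S110 has already run.

```python
import random

def random_negatives(all_user_ids, seed_user_ids, count, rng_seed=0):
    """Uniformly sample up to `count` users that are not seed users."""
    seeds = set(seed_user_ids)
    candidates = [u for u in all_user_ids if u not in seeds]
    return random.Random(rng_seed).sample(candidates, min(count, len(candidates)))
```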

In this embodiment, because the positive sample users are very similar to the seed users, they are very likely to be target users. For example, when the target user orientation method of this embodiment is applied to advertising, the target users are the users to whom an advertisement is pushed, and the positive sample users are users who are very likely to view or click the advertisement and/or purchase the goods or services it promotes, that is, users with a higher probability of performing the conversion behavior the advertisement expects. The conversion behavior here may include clicking the advertisement to enter its landing page, purchasing the advertised goods or services, downloading the advertised application, or participating in a campaign of the advertisement, such as giving feedback to the advertised brand.

After the positive sample users and the negative sample users are determined, training is performed on their respective user characteristics in step S140 to obtain the orientation model. The training may use various training tools, such as neural networks and learning machines, and finally yields orientation parameters capable of dividing users into target users and non-target users.

In a specific implementation, the user characteristics may include one or more of user attribute characteristics, user behavior characteristics and user social characteristics, for example when locating target users for advertising on a network platform such as a social network. The user attribute features may include information such as the user's age, gender, education level, educational background, occupation and hobbies. The user behavior characteristics may include content posted by the user on the social platform, official accounts followed by the user, applications (Apps) downloaded by the user, and operations the user performs with those Apps. The posted content may include both original content and content forwarded by the user. An operation performed with an App may be, for example, watching videos with a video App: analyzing the actors in those videos can show whether the user is a fan of a particular actor, and if so, recommending advertisements featuring that actor to the user is likely to achieve a high conversion probability. The user behavior characteristics can also be divided by time span into long-term behavior habits, short-term behavior preferences and the like. The user social characteristics may include the user's social relationship chain, how often the user interacts with other users, the social platforms the user habitually uses, and similar information. These characteristics reflect the user's needs and interests and can therefore serve as parameters for locating target users.
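One way to turn attribute, behavior and social characteristics into a single numeric vector for training is sketched below. It is a minimal illustration, assuming numeric fields are used as-is and categorical or set-valued fields are one-hot / multi-hot encoded against a fixed vocabulary; the `UserFeatures` fields and the vocabularies are hypothetical, not a schema defined by the method.

```python
from dataclasses import dataclass, field

@dataclass
class UserFeatures:
    age: float                               # attribute feature (numeric)
    gender: str                              # attribute feature (categorical)
    apps: set = field(default_factory=set)   # behavior feature: downloaded Apps
    friend_count: float = 0.0                # social feature (numeric)

def encode(user, gender_vocab, app_vocab):
    """Numeric fields first, then one-hot gender, then multi-hot Apps."""
    vector = [user.age, user.friend_count]
    vector += [1.0 if user.gender == g else 0.0 for g in gender_vocab]
    vector += [1.0 if app in user.apps else 0.0 for app in app_vocab]
    return vector
```

For example, a 22-year-old user with 150 friends who installed a video App encodes to `[22, 150, 1.0, 0.0, 1.0, 0.0]` with vocabularies `["f", "m"]` and `["video", "music"]`.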

Finally, in step S150, users who meet the preset similarity condition with the seed users are determined to be target users. Because they are similar to the seed users, these target users are highly likely to perform activities similar or identical to those of the seed users, and can serve as the target users of a given targeted activity, such as having advertisements pushed to them.
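Step S150 can be read as applying the trained orientation parameters to every candidate user and keeping those whose score clears a threshold. The sketch below assumes, as one possibility only, that the orientation parameters are the weights and bias of a logistic model and that the preset similarity condition is a score of at least 0.5; both are assumptions rather than details fixed by the method, and `locate_targets` is a hypothetical name.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, features):
    """Estimated probability that a user behaves like the seed users."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def locate_targets(weights, bias, users, threshold=0.5):
    """Ids of users meeting the (assumed) score threshold, highest score first."""
    scored = [(score(weights, bias, feats), uid) for uid, feats in users.items()]
    return [uid for s, uid in sorted(scored, reverse=True) if s >= threshold]
```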

In summary, this embodiment provides a target user orientation method that is not limited by the number of seed users obtained: it automatically expands the sample users based on the user characteristics of the seed users, yielding more sample users, or sample users that allow target users to be located more accurately, and thereby improves the accuracy of target user location.

Example two:

as shown in fig. 1, the present embodiment provides a target user targeting method, where the method includes: step S110: generating a virtual user based on the user characteristics of the seed user, where the seed user may be a user provided by an advertiser or a user determined from historical advertisement data of the advertisement; step S120: taking the seed user and the virtual user together as positive sample users, so that the positive sample users are expanded through the generation of virtual users; step S130: determining a negative sample user, for example by randomly selecting users other than the seed users as negative sample users, or by selecting a subset of those users as negative sample users; step S140: respectively extracting user characteristics of the positive sample user and the negative sample user; step S150: based on the orientation parameters, locating a target user meeting a preset similarity condition with the seed user.

In this embodiment, the step S110 may include:

determining first positions of M seed users in a feature space; determining a second location based on at least two of the first locations; determining a user characteristic of the virtual user based on the second location and generating a virtual user.

In this embodiment, the feature space may be a multi-dimensional vector space. Because the seed users are similar to one another, they may be aggregated in one or more dimensions of this space. An aggregation range may be determined in each dimension in which aggregation occurs, and virtual users may be constructed in blank areas within that aggregation range. In dimensions where no aggregation occurs, the user characteristics of the virtual users may be constructed randomly or based on the user characteristics of the seed users.

The two first positions in this embodiment may be the extreme positions of the aggregation range.
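Filling blank areas inside the aggregation range can be sketched as rejection sampling. Assumptions, none of them fixed by the embodiment: seed users are points in a numeric feature space; every dimension is treated as aggregated, so the aggregation range is the per-dimension box between the extreme seed positions; and a point counts as blank if it is at least `min_gap` (a hypothetical parameter) away, in Euclidean distance, from every seed user. Non-aggregated dimensions, which the text says may be filled randomly, are not modelled here.

```python
import math
import random

def aggregation_box(seeds):
    """Per-dimension (min, max) of seed positions, i.e. the extreme first positions."""
    return [(min(col), max(col)) for col in zip(*seeds)]

def blank_area_virtual_users(seeds, count, min_gap, max_tries=10000, rng_seed=0):
    """Sample points inside the box that are at least min_gap from every seed user."""
    rng = random.Random(rng_seed)
    box = aggregation_box(seeds)
    virtual = []
    for _ in range(max_tries):
        if len(virtual) == count:
            break
        point = [rng.uniform(lo, hi) for lo, hi in box]
        if all(math.dist(point, s) >= min_gap for s in seeds):
            virtual.append(point)
    return virtual
```

`max_tries` bounds the loop in case the box is too crowded for `count` blank points, in which case fewer points are returned.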

As shown in fig. 3, in a two-dimensional vector space, the position of each seed user is determined from its user features, and both the first positions and the second positions can be represented by coordinates in the vector space. As can be seen from fig. 4, the seed users gather near location A. Users in the blank area near location A are very likely to be seed users as well, but were missed when the seed users were determined, for example because of insufficient data. In this embodiment, virtual users are constructed in step S110, and these virtual users together with the seed users form the positive sample users, expanding their number. More positive sample users provide more user characteristics for determining target users, and hence more or more accurate parameters for locating them, which improves the accuracy of target user location.

Example three:

In this embodiment, the step S110 may include: extracting numerical user features in the user features of the seed users; determining a selectable range of numerical user characteristics based on the numerical user characteristics; selecting a value from the selectable range as a numerical user characteristic of the virtual user; extracting non-numerical user features in the user features of the seed user, and giving probability values to the non-numerical user features; determining non-numerical user characteristics of the virtual user based on the probability values.

In this embodiment, the numerical user characteristic is a quantifiable user characteristic, such as the age of the user, the frequency of watching a certain type of video, and the like. In this case, the numerical user characteristics of the seed user will form the data interval. For example, through statistics, it is found that the age of seed users is between 15 and 23 years. Here, 15 to 23 are the data intervals, i.e., the optional ranges mentioned above. Thus, when generating a virtual user, an age may be selected from between 15 and 23 years as the age characteristic of the virtual user. How to select specifically can be determined according to a preset selection function. For example, the preset selection function may be a random function.
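The numerical branch can be sketched in a few lines. The selectable range is taken as the seed users' [min, max], and the preset selection function is assumed to be a uniform random draw, matching the random-function example above; `pick_numeric` and the integer-versus-float rule are hypothetical illustration choices.

```python
import random

def pick_numeric(values, rng):
    """Draw a value for the virtual user from the seed users' [min, max] range."""
    lo, hi = min(values), max(values)
    if all(isinstance(v, int) for v in values):
        return rng.randint(lo, hi)   # integer-valued feature such as age
    return rng.uniform(lo, hi)       # continuous feature such as viewing frequency
```

With seed ages `[15, 18, 23, 20]`, every draw is an integer age between 15 and 23, as in the example above.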

Non-numerical user features are user features that cannot be quantified. For example, suppose each seed user has 3 hobbies ranked in order, each with a corresponding probability. Seed user A's hobbies are running, drawing and listening to music, with probabilities a1, a2 and a3; seed user B's hobbies are singing, running and drawing, with probabilities b1, b2 and b3. Based on seed users A and B, the candidate hobbies of virtual user C are running, drawing, listening to music and singing, with probabilities (a1 + b2)/2, (a2 + b3)/2, a3/2 and b1/2 respectively. The hobbies are then ranked by probability, and one or more of the top-ranked ones are selected as the hobbies of virtual user C; for example, the top 3 or top 2 hobbies are selected. In a specific implementation, if p hobbies are extracted for each seed user, the generated virtual user C has no more than p hobbies, where p is an integer not less than 1. The geographic locations where a user frequently appears can, for example, be determined in the same way as the hobbies.
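The averaging-and-ranking rule in the hobby example can be written generically: each seed user contributes a probability for each of its hobbies, a hobby a seed user lacks counts as 0, probabilities are averaged over the seed users, and the top-ranked hobbies are kept. A minimal sketch follows; the function name, the dict representation and the choice of `top_k` are assumptions, with `top_k` expected not to exceed p.

```python
from collections import defaultdict

def virtual_hobbies(seed_hobbies, top_k):
    """seed_hobbies: one {hobby: probability} dict per seed user."""
    totals = defaultdict(float)
    for hobbies in seed_hobbies:
        for hobby, prob in hobbies.items():
            totals[hobby] += prob
    # Dividing by the number of seed users treats a missing hobby as probability 0.
    averaged = {h: total / len(seed_hobbies) for h, total in totals.items()}
    ranked = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

For two seed users this reproduces the formulas above, for example (a1 + b2)/2 for running and b1/2 for singing.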

Once the numerical and non-numerical user characteristics of a virtual user have been generated, its user characteristics are complete; a user identifier is then assigned to it, which completes the construction of a virtual user that is sufficiently similar to the seed users.

In a specific implementation, a virtual user generated in step S110 may be based on the user characteristics of all seed users, or on only two or more of them. For example, if virtual user a is generated from the user characteristics of seed users A1, A2 and A3, then virtual user a is very similar to seed users A1, A2 and A3, even though the seed users may also include others, such as seed users B and C.
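Generating a virtual user from only part of the seed users can be sketched as a random convex combination of a randomly chosen subset of at least two seed users, so the result lies within the region spanned by those seeds. The convex-combination rule, the subset size and the name `virtual_from_subset` are illustrative assumptions, not the embodiment's prescribed formula.

```python
import random

def virtual_from_subset(seeds, subset_size, rng):
    """Mix subset_size (>= 2) randomly chosen seed users with random weights summing to 1."""
    if not 2 <= subset_size <= len(seeds):
        raise ValueError("subset_size must be between 2 and the number of seed users")
    chosen = rng.sample(seeds, subset_size)
    raw = [rng.random() + 1e-9 for _ in chosen]   # small offset avoids an all-zero draw
    total = sum(raw)
    weights = [r / total for r in raw]
    dims = len(chosen[0])
    return [sum(w * s[d] for w, s in zip(weights, chosen)) for d in range(dims)]
```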

In a specific implementation, the generation of virtual users is not limited to the methods of Example two and Example three; this embodiment further provides a geometric construction method, including:

taking a first seed user as the coordinate origin and connecting it to each of the other seed users to form vectors, where the first seed user may be any one of the seed users;

and taking a cut-off point on each vector and constructing a virtual user from the vector relation between the cut-off point and the first seed user. The user characteristics of the virtual user depend on the distances from the cut-off point to the two seed users. To expand the seed users as much as possible, at least half of all seed users may be polled in turn as the first seed user to construct virtual users.

As shown in fig. 4, a line segment a is formed between a first seed user and a second seed user, and a cut-off point B is taken at random on segment a; the position of B determines how similar the constructed virtual user D is to the first and second seed users. For example, if the first seed user is 20 years old, the second seed user is 30 years old and cut-off point B lies at the midpoint of segment a, the constructed virtual user may be 25 years old. In practice, only segments shorter than a specified length are generally used. Segment a is indicated by a double-headed arrow in fig. 3.
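The geometric construction can be sketched as follows. The first half of the seed list (rounded up) is polled in turn as the origin; for every other seed user within a maximum segment length, a cut-off point is drawn at a random fraction t of the connecting vector, giving features origin + t × (other − origin). The random fraction, Euclidean segment length, the hypothetical `max_len` parameter and the polling order are assumptions made for illustration.

```python
import math
import random

def cut_point(origin, other, t):
    """Point at fraction t along the vector from origin to other (t = 0.5 is the midpoint)."""
    return [o + t * (x - o) for o, x in zip(origin, other)]

def geometric_virtual_users(seeds, max_len, rng):
    virtual = []
    n_origins = math.ceil(len(seeds) / 2)          # poll at least half of the seed users
    for i in range(n_origins):
        origin = seeds[i]
        for j, other in enumerate(seeds):
            if j == i or math.dist(origin, other) > max_len:
                continue                           # skip itself and over-long segments
            virtual.append(cut_point(origin, other, rng.random()))
    return virtual
```

With a one-dimensional age feature, `cut_point([20.0], [30.0], 0.5)` returns `[25.0]`, the 25-year-old virtual user from the example above.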

Of course, many other methods of determining the user characteristics of virtual users and constructing virtual users may be used in a specific implementation; the method above has the advantage of being simple and convenient to implement.

Example four:

In this embodiment, step S130 may include: calculating the similarity between each candidate sample user and the seed users based on the user characteristics of the seed users; and determining, based on the similarity, the candidate sample users that meet a negative sample user condition as the negative sample users.

In the prior art, the negative sample users used to locate target users are all randomly selected, so the negative samples may include seed users or users highly similar to them, which clearly makes target user location inaccurate. In addition, the prior art may treat all users other than the seed users as negative sample users, so positive sample users are far fewer than negative sample users and the data are imbalanced. In this embodiment, on the one hand, the number of positive sample users is expanded by generating virtual users; on the other hand, negative sample users are screened from the candidate users based on the user characteristics of the seed users, which significantly reduces the number of negative sample users.

In this embodiment, the similarity between a candidate sample user and the seed users is calculated from the user characteristics of the seed users. Any of various similarity measures may be used; these are known in the prior art and are not described in detail here.

After the similarity is calculated, candidate sample users whose similarity to the seed users is sufficiently small are selected as negative sample users. For example, users whose similarity is smaller than a specified threshold, which is usually a small value, are selected as negative sample users. As another example, the candidates are sorted by similarity in ascending order and the first m candidate sample users are selected as negative sample users.
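A minimal sketch of the two selection rules, assuming user characteristics are numerical vectors, cosine similarity as the similarity measure, and that a candidate's similarity to the seed users is its highest similarity to any single seed user. These choices and the names `select_negative_users` and `cosine_similarity` are illustrative assumptions; the embodiment leaves the similarity measure open.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def select_negative_users(
    candidates: Sequence[Sequence[float]],
    seeds: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
    m: Optional[int] = None,
) -> List[int]:
    """Return the indices of candidates chosen as negative sample users."""
    # Similarity to the seed set = similarity to the closest seed user (assumption).
    sims = [max(cosine_similarity(c, s) for s in seeds) for c in candidates]
    if threshold is not None:
        # Rule 1: keep candidates whose similarity is below the threshold.
        return [i for i, s in enumerate(sims) if s < threshold]
    if m is not None:
        # Rule 2: sort by similarity in ascending order and keep the first m.
        return sorted(range(len(candidates)), key=lambda i: sims[i])[:m]
    raise ValueError("specify either threshold or m")


seeds = [[1.0, 0.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(select_negative_users(candidates, seeds, threshold=0.5))  # [1, 3]
print(select_negative_users(candidates, seeds, m=1))            # [3]
```

Taking the maximum over seed users means a candidate close to even one seed user is never chosen as a negative sample, which matches the goal of keeping seed-like users out of the negative set.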

Fig. 5A and fig. 5B are schematic diagrams illustrating the effect of determining negative sample users from the user characteristics of the seed users according to this embodiment. The seed users are positive sample users; in this embodiment, the negative sample users are obtained through similarity calculation, and the number N of negative sample users obtained from the user characteristics of the seed users is greater than the number M of seed users.

In this embodiment, determining the negative sample users through similarity calculation ensures that positive and negative sample users differ sufficiently, which reduces problems such as insufficient targeting accuracy or too few located target users caused by similarity between positive and negative sample users.

Example five:

As shown in fig. 1, this embodiment provides a target user targeting method comprising the following steps. Step S110: generating virtual users based on the user characteristics of seed users; a seed user may be a user provided by an advertiser or a user determined from historical delivery data of the advertisement. Step S120: taking the seed users and the virtual users together as positive sample users; generating virtual users thus expands the set of positive sample users. Step S130: determining negative sample users; for example, users other than the seed users may be randomly selected as negative sample users, or a subset of the candidate users may be selected as negative sample users. Step S140: extracting the user characteristics of the positive sample users and the negative sample users respectively and training on them to determine an orientation parameter; step S140 may include performing model training with the user characteristics of the positive and negative sample users to determine a classification model for selecting target users. Step S150: based on the orientation parameter, locating target users who meet a predetermined similarity condition with the seed users.

In this embodiment, model training is performed on the user characteristics of the sample users, and the resulting classification model separates target users from non-target users among a large number of users. The classification model obtained by training may include a positioning algorithm for locating target users and the values of the known quantities (parameters) used in that algorithm.

As shown in fig. 6, the step S140 in this embodiment may specifically include:

step S141: dividing the positive sample user and the negative sample user into a training set, a verification set and a test set;

step S142: training models on the training set with each of several different training algorithms;

step S143: using the verification set to verify whether model training needs to continue;

step S144: after model training stops, using the test set to evaluate the effect of the candidate model obtained by each training algorithm;

step S145: based on the effectiveness evaluation, one of the candidate models is selected as the classification model.
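Step S141 can be sketched as a stratified split that shuffles the positive and negative sample users separately, so that each of the three sets receives both kinds and no user appears in two sets. The 60/20/20 proportions, the fixed random seed, and the name `split_samples` are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple

# A sample user: (feature vector, label), label 1 = positive, 0 = negative.
Sample = Tuple[Tuple[float, ...], int]


def split_samples(
    samples: Sequence[Sample],
    ratios: Tuple[float, float] = (0.6, 0.2),
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split sample users into training, verification and test sets by label.

    ratios gives the training and verification fractions; the rest is test.
    """
    rng = random.Random(seed)
    train: List[Sample] = []
    verify: List[Sample] = []
    test: List[Sample] = []
    for label in (1, 0):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_verify = int(len(group) * ratios[1])
        # Consecutive slices of one shuffled list cannot share a user.
        train += group[:n_train]
        verify += group[n_train:n_train + n_verify]
        test += group[n_train + n_verify:]
    return train, verify, test


samples = [((float(i),), 1) for i in range(10)] + [((float(i),), 0) for i in range(10, 20)]
train, verify, test = split_samples(samples)
print(len(train), len(verify), len(test))
```

Splitting each label separately is what guarantees that every set contains both positive and negative sample users even when one class is small.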

In this embodiment, the sample users are divided into three sets: a training set, a verification set, and a test set. The sample users in step S141 include both positive and negative sample users, and each of the three sets may be chosen to contain both. Any two of the training set, the verification set, and the test set contain different users, which ensures that the most suitable classification model is obtained.

Each training algorithm trains a model on the training set, and the verification set is used to decide when training for that algorithm can stop. Thus, if 10 training algorithms are used, training on the training set and verification on the verification set yield 10 candidate classification models capable of classifying users. Each candidate model is then evaluated on the test set, and the candidate model with the highest accuracy is selected as the classification model, ensuring that target users can be located accurately.

For example, candidate model A is obtained by training on the training set, and the user characteristics in the verification set are input into candidate model A. Candidate model A processes the user characteristics of each user in the verification set and outputs whether that user is a positive or a negative sample user; comparing these outputs with the known labels of the sample users gives the probability that the outputs are correct. If the correct probability of candidate model A exceeds a specified probability, training of candidate model A can be stopped.
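The stopping rule in this example can be sketched with a small logistic regression trained by stochastic gradient descent. The model choice, the learning rate, the `max_epochs` cap, and all function names are assumptions; the sketch only shows how the correct probability on the verification set decides when training of one candidate model stops.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (user features, 1 = positive, 0 = negative)
Predictor = Callable[[Sequence[float]], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def correct_probability(predict: Predictor, samples: Sequence[Sample]) -> float:
    return sum(1 for x, y in samples if predict(x) == y) / len(samples)


def train_until_verified(
    train: Sequence[Sample],
    verify: Sequence[Sample],
    specified_probability: float = 0.9,
    lr: float = 0.1,
    max_epochs: int = 100,
) -> Tuple[Predictor, int]:
    dim = len(train[0][0])
    w: List[float] = [0.0] * dim
    b = 0.0

    def predict(x: Sequence[float]) -> int:
        # Reads the current w and b at call time.
        return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

    for epoch in range(1, max_epochs + 1):
        for x, y in train:
            # One gradient step on the log loss.
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(dim):
                w[i] -= lr * g * x[i]
            b -= lr * g
        # Stop once the verification-set correct probability exceeds the target.
        if correct_probability(predict, verify) > specified_probability:
            break
    return predict, epoch


train = [([0.0, 1.0], 1), ([1.0, 0.0], 0), ([0.0, 2.0], 1), ([2.0, 0.0], 0)]
verify = [([0.0, 3.0], 1), ([3.0, 0.0], 0)]
model_a, epochs_used = train_until_verified(train, verify)
```

The `max_epochs` cap is a safeguard added in this sketch so that training also ends when the specified probability is never reached.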

In step S144, the user characteristics of the sample users in the test set are input into candidate model A to obtain its correct probability. If there are also candidate models B and C, their correct probabilities are obtained in step S144 in the same way, the three candidate models are compared in step S145, and the model with the highest correct probability is selected as the final classification model.
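Steps S144 and S145 reduce to scoring every candidate on the same test set and keeping the best one. A minimal sketch, assuming each candidate model is exposed as a function that maps user features to a 0/1 label; the model names and functions below are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Predictor = Callable[[Sequence[float]], int]


def correct_rate(predict: Predictor, test_set: Sequence[Sample]) -> float:
    return sum(1 for x, y in test_set if predict(x) == y) / len(test_set)


def choose_classification_model(
    candidates: Dict[str, Predictor], test_set: Sequence[Sample]
) -> Tuple[str, float]:
    """Score every candidate on the test set (S144) and keep the best (S145)."""
    scores = {name: correct_rate(p, test_set) for name, p in candidates.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


test_set = [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 0)]
candidates = {
    "A": lambda x: 1 if x[0] > 0 else 0,    # correct on 4 of 4
    "B": lambda x: 1,                        # correct on 2 of 4
    "C": lambda x: 1 if x[0] > 1.5 else 0,  # correct on 3 of 4
}
print(choose_classification_model(candidates, test_set))  # ('A', 1.0)
```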

Therefore, the model training in this embodiment determines not only the known quantities used to locate target users but also a positioning algorithm suited to the task, enabling accurate location of target users.

For example, in this embodiment, various learners with learning capability, such as an ad hoc network, may use the LR-FTRL algorithm and a manual rule-based (Rule-Based) algorithm: the LR-FTRL algorithm yields an LR model, and the Rule-Based algorithm yields a RuleSet model.

In the LR-FTRL algorithm, LR stands for Logistic Regression, a generalized linear regression model commonly used in data mining, automatic disease diagnosis, economic forecasting, and similar fields. FTRL stands for Follow the Regularized Leader. In short, LR-FTRL is an online optimization algorithm for the logistic regression model that maintains a separate learning rate for each feature dimension, and it converges well and trains quickly.
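For reference, a compact sketch of per-coordinate FTRL-Proximal for logistic regression, following the widely published update rule of McMahan et al. (Google, KDD 2013). The class name, the sparse dictionary input format, and the default hyperparameters `alpha`, `beta`, `l1`, `l2` are assumptions, and this is not presented as the implementation used in this embodiment. The per-coordinate learning rate mentioned above appears as `alpha / (beta + sqrt(n_i))`.

```python
import math
from collections import defaultdict
from typing import Dict


class FTRLProximalLR:
    """Online logistic regression trained with per-coordinate FTRL-Proximal."""

    def __init__(self, alpha: float = 0.1, beta: float = 1.0, l1: float = 0.0, l2: float = 0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[str, float] = defaultdict(float)  # accumulated adjusted gradients
        self.n: Dict[str, float] = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i: str) -> float:
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 term keeps weak features exactly at zero
        sign = 1.0 if z > 0 else -1.0
        # alpha / (beta + sqrt(n_i)) is the per-coordinate learning rate.
        denom = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict_proba(self, x: Dict[str, float]) -> float:
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x: Dict[str, float], y: int) -> float:
        weights = {i: self._weight(i) for i in x}
        p = self.predict_proba(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * weights[i]
            self.n[i] += g * g
        return p


model = FTRLProximalLR()
data = [({"bias": 1.0, "likes_games": 1.0}, 1), ({"bias": 1.0, "likes_cooking": 1.0}, 0)]
for _ in range(50):
    for features, label in data:
        model.update(features, label)
```

Each call to `update` processes one labeled user, so the model can keep learning online as new sample users arrive; only the features present in a sample are touched.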

As shown in fig. 7, the training flow based on the method of this embodiment includes four parts:

First part:

Labeled data construction. Constructing the labeled data includes determining positive and negative sample users.

The seed users are oversampled, so there are more positive sample users than seed users. The positive sample users may include virtual users determined from the user characteristics of the seed users, users added from the seed users of advertisements similar to the advertisement corresponding to the seed users, and the like.

The candidate users are undersampled to obtain the negative sample users, so the number of negative sample users is smaller than the number of candidate users in fig. 7.

Second part:

Obtaining the user characteristics of the sample users. The database maintains various characteristics of users, such as a user profile formed from age, sex, hobbies, activity locations, educational background, and the like; the user's social relations; advertising behavior toward advertisements; and App behavior from using Apps. These features are extracted and then normalized and discretized to obtain the user characteristics. Normalization here includes, for example, unifying feature names that represent the same information.
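A minimal sketch of the normalization and discretization step. The alias table, the age boundaries, and the `name=value` one-hot encoding are all hypothetical choices made for illustration; the embodiment does not specify them.

```python
import bisect
from typing import Dict, Union

# Hypothetical alias table: names used by different sources for the same information.
FEATURE_NAME_ALIASES = {"gender": "sex", "user_sex": "sex", "user_age": "age"}
# Hypothetical age bucket boundaries used for discretization.
AGE_BOUNDARIES = [18, 25, 35, 50]


def normalize_and_discretize(raw: Dict[str, Union[str, float]]) -> Dict[str, float]:
    """Turn raw profile fields into sparse user features."""
    features: Dict[str, float] = {}
    for name, value in raw.items():
        name = FEATURE_NAME_ALIASES.get(name, name)  # unify feature names
        if name == "age":
            # Discretize age into a bucket index: 0 means < 18, 4 means >= 50.
            bucket = bisect.bisect_right(AGE_BOUNDARIES, float(value))
            features[f"age_bucket={bucket}"] = 1.0
        elif isinstance(value, str):
            features[f"{name}={value}"] = 1.0  # one-hot encode categorical fields
        else:
            features[name] = float(value)  # keep other numerical fields as-is
    return features


print(normalize_and_discretize({"user_age": 28, "gender": "female", "ad_clicks_7d": 3}))
# {'age_bucket=2': 1.0, 'sex=female': 1.0, 'ad_clicks_7d': 3.0}
```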

Third part:

Determining the classification model. This includes obtaining the data of positive and negative sample users and extracting their user characteristics; dividing the sample users into a training set, a verification set, and a test set; training models with different training algorithms; using the verification set to determine when to stop training each model; and using the test set to select one of the models trained by the different algorithms as the final classification model.

Fourth part:

Online user targeting, which includes locating the target users. First, the user characteristics of all users are extracted, and the classification model is used to estimate, for each user, a parameter measuring that user's similarity to the seed users. Then the top N users by similarity are selected as target users and the remaining users are non-target users, which completes the user classification. Finally, the corresponding advertisements are delivered to the target users, improving advertisement delivery effectiveness and conversion rate.
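The online step can be sketched as scoring every user with the trained model and keeping the top N. A minimal sketch, assuming the classification model is available as a scoring function that returns a similarity-like value (for example, a predicted probability); the names `orient_users`, `score`, and `top_n` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

UserFeatures = Dict[str, float]


def orient_users(
    users: Dict[str, UserFeatures],
    score: Callable[[UserFeatures], float],
    top_n: int,
) -> Tuple[List[str], List[str]]:
    """Split users into target and non-target users by model score."""
    scores = {user_id: score(features) for user_id, features in users.items()}
    # The top N users by score become target users, highest score first.
    targets = heapq.nlargest(top_n, scores, key=scores.__getitem__)
    target_set = set(targets)
    non_targets = [u for u in users if u not in target_set]
    return targets, non_targets


users = {"u1": {"s": 0.9}, "u2": {"s": 0.2}, "u3": {"s": 0.75}, "u4": {"s": 0.4}}
print(orient_users(users, lambda f: f["s"], top_n=2))  # (['u1', 'u3'], ['u2', 'u4'])
```

`heapq.nlargest` avoids sorting the full user list when N is much smaller than the number of users.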

Example six:

as shown in fig. 8, the present embodiment provides a target user orientation apparatus, including:

a generating unit 110, configured to generate a virtual user based on the user characteristics of the seed user;

a first determining unit 120, configured to use the seed user and the virtual user together as a positive sample user;

a second determining unit 130, configured to determine negative sample users;

a training unit 140, configured to extract the user characteristics of the positive sample users and the negative sample users respectively, and to train on these user characteristics to determine an orientation parameter;

and a positioning unit 150, configured to position, based on the orientation parameter, a target user that meets a predetermined similarity condition with the seed user.

The target user targeting apparatus of this embodiment may be applied in various electronic devices capable of information processing, for example as a processor-based apparatus in a server of a social platform or an advertisement delivery platform.

The specific structure of the generating unit 110, the first determining unit 120, the training unit 140 and the positioning unit 150 may correspond to a processor or a processing circuit. The processor may comprise a central processing unit, microprocessor, digital signal processor, or programmable array, or the like. The processing circuitry may comprise an application specific integrated circuit or the like.

The generating unit 110, the first determining unit 120, the training unit 140, and the positioning unit 150 may be integrated in the same processor or processing circuit, or may each correspond to a different processor or processing circuit. When at least two of these units correspond to the same processor, the processor may use time-division multiplexing or concurrent threads to implement their respective functions. For example, the processor or processing circuit may implement the functions of the generating unit 110, the first determining unit 120, the training unit 140, and the positioning unit 150 by executing predetermined instructions.

As shown in fig. 9, this embodiment further provides another target user targeting apparatus, which includes a processor 220, a storage medium 240, at least one external communication interface 210, and a display screen 250; the processor 220, the storage medium 240, and the external communication interface 210 are all connected by a bus 303. The storage medium 240 stores computer-executable instructions, and the processor 220 executes them to at least: generate virtual users based on the user characteristics of the seed users; take the seed users and the virtual users together as positive sample users; determine negative sample users; extract the user characteristics of the positive and negative sample users respectively; train on those user characteristics to determine an orientation parameter; and, based on the orientation parameter, locate target users who meet a predetermined similarity condition with the seed users.

In short, the apparatus of this embodiment generates virtual users from the user characteristics of the seed users and uses the seed users and virtual users together as positive sample users. This increases the number of positive sample users and reduces the imbalance between positive and negative samples, thereby reducing inaccurate targeting caused by orientation parameters whose ability to identify target users is weakened by imbalanced training data.

Example seven:

As shown in fig. 8, this embodiment provides a target user targeting apparatus, including: a generating unit 110, configured to generate virtual users based on the user characteristics of seed users, where the generating unit 110 may specifically be configured to determine a first position of each seed user in a feature space, determine a second position based on at least two of the first positions, and determine the user characteristics of a virtual user based on the second position to generate the virtual user; a first determining unit 120, configured to use the seed users and the virtual users together as positive sample users; a second determining unit 130, configured to determine negative sample users; a training unit 140, configured to extract the user characteristics of the positive sample users and the negative sample users respectively and to train on them to determine an orientation parameter; and a positioning unit 150, configured to locate, based on the orientation parameter, target users who meet a predetermined similarity condition with the seed users.

For the hardware structure of the generating unit 110, refer to the foregoing embodiments. In this embodiment, the generating unit 110 forms a feature space and determines the first position of each seed user in it; it then determines a second position from the first positions of at least two seed users and determines the user characteristics of the virtual user from the feature values at the second position, completing the generation of the virtual user. This approach is simple in structure, easy to implement, and yields virtual users that are highly similar to the seed users.

In summary, this embodiment provides an apparatus that forms virtual users from the user characteristics of seed users and uses the virtual users and seed users together as positive sample users. The training unit 140 thus has more sample data and can obtain orientation parameters that locate target users more accurately.

Example eight:

As shown in fig. 8, this embodiment provides a target user targeting apparatus, including: a generating unit 110, configured to generate virtual users based on the user characteristics of seed users, where the generating unit is specifically configured to extract the numerical user features from the user features of the seed users, determine a selectable range for each numerical user feature, select a value from the selectable range as the numerical user feature of the virtual user, extract the non-numerical user features from the user features of the seed users and assign probability values to them, and determine the non-numerical user features of the virtual user based on the probability values; a first determining unit 120, configured to use the seed users and the virtual users together as positive sample users; a second determining unit 130, configured to determine negative sample users; a training unit 140, configured to extract the user characteristics of the positive sample users and the negative sample users respectively and to train on them to determine an orientation parameter; and a positioning unit 150, configured to locate, based on the orientation parameter, target users who meet a predetermined similarity condition with the seed users.

For the generating unit 110, refer also to the foregoing embodiments; the difference is that in this embodiment the generating unit 110 divides the user features into numerical and non-numerical user features and constructs the virtual user's features according to the respective properties of each kind, completing the generation of the virtual user.
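The numerical/non-numerical split can be sketched as below. This is a minimal illustration under two assumptions that the embodiment leaves open: the selectable range of a numerical feature is the minimum-to-maximum range over the seed users, and the probability of a non-numerical value is proportional to its frequency among the seed users. The function name `generate_virtual_user` is hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Sequence, Union

Value = Union[float, str]


def generate_virtual_user(
    seed_users: Sequence[Dict[str, Value]],
    numerical_keys: Sequence[str],
    non_numerical_keys: Sequence[str],
    rng: random.Random,
) -> Dict[str, Value]:
    virtual: Dict[str, Value] = {}
    for key in numerical_keys:
        values = [float(u[key]) for u in seed_users]
        # Selectable range assumed to be [min, max] over the seed users.
        virtual[key] = rng.uniform(min(values), max(values))
    for key in non_numerical_keys:
        counts = Counter(u[key] for u in seed_users)
        categories = list(counts)
        # Probability of each value assumed proportional to its frequency.
        weights = [counts[c] for c in categories]
        virtual[key] = rng.choices(categories, weights=weights, k=1)[0]
    return virtual


seeds = [
    {"age": 20.0, "sex": "f", "city": "Shenzhen"},
    {"age": 30.0, "sex": "f", "city": "Beijing"},
    {"age": 24.0, "sex": "m", "city": "Shenzhen"},
]
print(generate_virtual_user(seeds, ["age"], ["sex", "city"], random.Random(7)))
```

Sampling non-numerical values by frequency means a virtual user only ever takes values that real seed users have, while common values appear more often.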

Example nine:

As shown in fig. 8, this embodiment provides a target user targeting apparatus, including: a generating unit 110, configured to generate virtual users based on the user characteristics of seed users; a first determining unit 120, configured to use the seed users and the virtual users together as positive sample users; a second determining unit 130, configured to determine negative sample users, where the second determining unit 130 is specifically configured to calculate, based on the user characteristics of the seed users, the similarity between each candidate sample user and the seed users, and to determine, based on the similarity, the candidate sample users that meet a negative sample user condition as the negative sample users; a training unit 140, configured to extract the user characteristics of the positive sample users and the negative sample users respectively and to train on them to determine an orientation parameter; and a positioning unit 150, configured to locate, based on the orientation parameter, target users who meet a predetermined similarity condition with the seed users.

In this embodiment, negative sample users are selected according to the user characteristics of the seed users. A negative sample user is a candidate user with low similarity to the seed users, for example a user whose similarity is below a preset threshold, or one of the lowest-ranked candidates when candidates are ranked by similarity to the seed users. Because such a user is dissimilar to the seed users, the probability that the user would view the same advertisement or perform the same operation is low, so the user can serve as a negative sample user for training the orientation parameter. For the specific structure of the generating unit 110, refer to the foregoing embodiments; in this embodiment, the second determining unit 130 selects negative sample users based on the user characteristics of the seed users, which, compared with randomly selecting users as negative sample users, reduces the loss of accuracy caused by using users very similar to the seed users as negative sample users.

Example ten:

As shown in fig. 8, this embodiment provides a target user targeting apparatus, including: a generating unit 110, configured to generate virtual users based on the user characteristics of seed users; a first determining unit 120, configured to use the seed users and the virtual users together as positive sample users; a second determining unit 130, configured to determine negative sample users; a training unit 140, configured to extract the user characteristics of the positive sample users and the negative sample users respectively, where the training unit 140 is specifically configured to perform model training with the user characteristics of the positive and negative sample users to determine a classification model for selecting target users; and a positioning unit 150, configured to locate, based on the orientation parameter, target users who meet a predetermined similarity condition with the seed users.

In short, the orientation parameter finally obtained by the training unit 140 in this embodiment is a classification model; when target users are later selected from a large number of users, running the classification model divides the users into target and non-target users, which allows target users to be located accurately.

As a further improvement of this embodiment, the training unit 140 is specifically configured to divide the positive sample users and the negative sample users into a training set, a verification set, and a test set; performing model training by using different training algorithms by using the training set; verifying whether the model training needs to be continued by using the verification set; after stopping the model training, carrying out effect evaluation on the alternative models obtained by each training algorithm by using the test set; based on the effectiveness evaluation, one of the candidate models is selected as the classification model.

In this embodiment, the sample users are divided into three sets, a training set, a verification set, and a test set, each containing both positive and negative sample users. Candidate classification models are trained on the training set, the verification set determines whether training should continue, and the test set is used to select the best model as the final classification model, which further improves the accuracy of targeting.

In this embodiment, the training unit 140 may correspond to various structures such as a learning machine or a neural network, and the classification model is obtained by training on the data of the sample users. When the resulting classification model is used to classify other users, target users can be screened out simply, conveniently, and accurately.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may be separately used as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for targeting a target user, the method comprising:

generating a virtual user based on the user characteristics of the seed user;

determining, based on the similarity, the candidate sample users whose similarity is smaller than a similarity threshold; and

taking the candidate sample users whose similarity is smaller than the similarity threshold as negative sample users;

2. The method of claim 1,

generating a virtual user based on the user characteristics of the seed user, including:

determining a first position of the seed user in a feature space;

determining a second location based on at least two of the first locations;

3. The method of claim 1,

extracting numerical user features in the user features of the seed users;

4. The method according to any one of claims 1 to 3,

training based on the user characteristics of the positive sample user and the negative sample user to determine orientation parameters, including:

5. The method of claim 4,

the method for performing model training by using the user characteristics of the sample user to determine the training model of the selected target user comprises the following steps:

6. An apparatus for targeting a target user, the apparatus comprising:

a second determining unit, configured to calculate, based on the user characteristics of the seed user, the similarity between each candidate sample user and the seed user; determine, based on the similarity, the candidate sample users whose similarity is smaller than a similarity threshold; and take the candidate sample users whose similarity is smaller than the similarity threshold as negative sample users;

the training unit is used for respectively extracting the user characteristics of the positive sample user and the negative sample user; training based on the user characteristics of the positive sample user and the negative sample user to determine an orientation parameter;

7. The apparatus of claim 6,

the generating unit is used for determining a first position of the seed user in a feature space; determining a second location based on at least two of the first locations; determining a user characteristic of the virtual user based on the second location and generating a virtual user.

8. The apparatus of claim 6,

the generating unit is specifically configured to extract numerical user features from the user features of the seed user; determining a selectable range of numerical user characteristics based on the numerical user characteristics; selecting a value from the selectable range as a numerical user characteristic of the virtual user; extracting non-numerical user features in the user features of the seed user, and giving probability values to the non-numerical user features; determining non-numerical user characteristics of the virtual user based on the probability values.

9. The apparatus according to any one of claims 6 to 8,

the training unit is specifically configured to perform model training by using the user characteristics of the positive sample user and the negative sample user, and determine a classification model of a selected target user.

10. The apparatus of claim 9,

the training unit is specifically used for dividing the positive sample users and the negative sample users into a training set, a verification set and a test set; performing model training by using different training algorithms by using the training set; verifying whether the model training needs to be continued by using the verification set; after stopping the model training, carrying out effect evaluation on the alternative models obtained by each training algorithm by using the test set; based on the effectiveness evaluation, one of the candidate models is selected as the classification model.

11. An electronic device, characterized in that the electronic device comprises:

a memory for storing executable instructions;

a processor for implementing the target user targeting method of any one of claims 1 to 5 when executing executable instructions stored in the memory.

12. A computer-readable storage medium having stored thereon executable instructions for, when executed, implementing a target user targeting method as claimed in any one of claims 1 to 5.