CN116257758A

CN116257758A - Model training method, crowd expanding method, medium, device and computing equipment

Info

Publication number: CN116257758A
Application number: CN202310096971.XA
Authority: CN
Inventors: 肖美丽; 李锦添; 齐妙; 王佳捷; 李勇
Original assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Current assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Priority date: 2023-01-18
Filing date: 2023-01-18
Publication date: 2023-06-13

Abstract

The embodiments of the disclosure provide a model training method, a crowd expanding method, a medium, a device and a computing device, and relate to the field of computer technology. The model training method comprises: acquiring seed users corresponding to a media resource, and determining, according to the seed users and preset media resource characteristics, a significance index for each preset media resource characteristic; determining each preset media resource characteristic whose significance index is greater than or equal to an index threshold as a target characteristic; and iteratively training a crowd expansion model based on the seed users, the target characteristics, and the first target group index of the seed users for the target characteristics, to obtain a trained crowd expansion model. A more accurate crowd expansion model can thus be obtained, so that the expanded crowd is identified more accurately when the model is used for crowd expansion.

Description

Model training method, crowd expanding method, medium, device and computing equipment

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and more particularly, embodiments of the present disclosure relate to a model training method, crowd expanding method, medium, device, and computing apparatus.

Background

This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Crowd expansion is commonly used when delivering media resources such as advertisements or merchants' marketing campaigns. For example, when an advertisement is delivered, crowd expansion is performed based on the seed crowd provided by the advertiser, which can effectively improve the click conversion rate or purchase conversion rate of the advertisement.

At present, crowd expansion is generally performed with supervised learning algorithms. Specifically, specified samples (namely, the seed population) are taken as positive samples, samples randomly drawn from the remaining users are taken as negative samples, a classification model is trained, and candidate populations are screened with the trained classification model to obtain the expanded population. However, this approach cannot obtain the expanded population accurately.

Disclosure of Invention

The disclosure provides a model training method, a crowd expanding method, a medium, a device and a computing device, so as to solve the problem that expanded crowd cannot be accurately obtained in the current mode.

In a first aspect of embodiments of the present disclosure, there is provided a model training method, including:

acquiring seed users corresponding to a media resource;

determining, according to the seed users and preset media resource characteristics, a significance index for each preset media resource characteristic;

determining each preset media resource characteristic whose significance index is greater than or equal to an index threshold as a target characteristic;

iteratively training a crowd expansion model based on the seed users, the target characteristics, and a first target group index of the seed users for the target characteristics, to obtain a trained crowd expansion model, wherein the crowd expansion model is used for crowd expansion based on the seed users.

In a second aspect, an embodiment of the present disclosure provides a crowd expanding method, including:

acquiring candidate users corresponding to the media resources;

determining the target population indexes of the candidate users for the target features according to the seed users and target features corresponding to a crowd expansion model, wherein the crowd expansion model is trained using the model training method according to the first aspect of the disclosure;

inputting the candidate users and their target group indexes for the target characteristics into the crowd expansion model to obtain a predicted probability value for each candidate user, wherein the predicted probability value is used to determine whether the candidate user is an expandable user.
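The scoring step can be illustrated with a minimal sketch. It assumes that the model scores a candidate by a weighted sum of the candidate's target group indexes passed through a sigmoid, and that a fixed probability threshold decides whether the candidate is expandable. The function names, weights, bias, threshold and the 1/100 scaling of TGI inputs are hypothetical and not taken from the disclosure.

```python
import math


def predict_probability(tgi_features: list[float], weights: list[float], bias: float) -> float:
    """Hypothetical scoring: weighted sum of a candidate's target group
    indexes, squashed to (0, 1) with a sigmoid (an assumption)."""
    z = bias + sum(w * x for w, x in zip(weights, tgi_features))
    return 1.0 / (1.0 + math.exp(-z))


def expandable_users(candidates: dict[str, list[float]], weights: list[float],
                     bias: float, threshold: float = 0.5) -> list[str]:
    """Return the ids of candidates whose predicted probability reaches the threshold."""
    return [uid for uid, feats in candidates.items()
            if predict_probability(feats, weights, bias) >= threshold]


# TGI inputs scaled by 1/100 so that 1.0 means "average" (an assumption).
candidates = {"user_a": [2.0, 1.0], "user_b": [0.1, 0.1]}
print(expandable_users(candidates, weights=[1.0, 1.0], bias=-1.0))  # ['user_a']
```

Keeping the threshold as a parameter reflects that the disclosure only says the probability is used to decide expandability, without fixing a cut-off.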

In a third aspect, an embodiment of the present disclosure provides a model training apparatus, including:

an acquisition module, configured to acquire seed users corresponding to a media resource;

a first determining module, configured to determine, according to the seed users and preset media resource characteristics, a significance index for each preset media resource characteristic;

a second determining module, configured to determine each preset media resource characteristic whose significance index is greater than or equal to an index threshold as a target characteristic;

a training module, configured to iteratively train a crowd expansion model based on the seed users, the first target group index of the seed users for the target characteristics, and the target characteristics, to obtain a trained crowd expansion model, wherein the crowd expansion model is used for crowd expansion based on the seed users.

In a fourth aspect, an embodiment of the present disclosure provides a crowd expanding device, including:

an acquisition module, configured to acquire candidate users corresponding to a media resource;

a determining module, configured to determine the target population indexes of the candidate users for the target features according to the seed users and target features corresponding to a crowd expansion model, wherein the crowd expansion model is trained by the model training method according to the first aspect of the disclosure;

a processing module, configured to input the candidate users and their target group indexes for the target characteristics into the crowd expansion model to obtain a predicted probability value for each candidate user.

In a fifth aspect, embodiments of the present disclosure provide a computing device comprising: a processor, a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes the computer-executable instructions stored in the memory to implement the model training method as described in the first aspect of the present disclosure or the crowd expansion method as described in the second aspect of the present disclosure.

In a sixth aspect, an embodiment of the present disclosure provides a storage medium storing computer program instructions which, when executed, implement the model training method according to the first aspect or the crowd expansion method according to the second aspect of the present disclosure.

In a seventh aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the model training method according to the first aspect of the present disclosure or the crowd expansion method according to the second aspect.

With the model training method, the crowd expanding method, the medium, the device and the computing equipment provided by the embodiments of the disclosure, seed users corresponding to a media resource are acquired; a significance index for each preset media resource characteristic is determined according to the seed users and the preset media resource characteristics; each preset media resource characteristic whose significance index is greater than or equal to an index threshold is determined as a target characteristic; and the crowd expansion model is iteratively trained based on the seed users, the first target group index of the seed users for the target characteristics, and the target characteristics, to obtain a trained crowd expansion model. Because the target characteristics are selected by significance indexes computed from the seed users and the preset media resource characteristics, they can be determined more accurately. Training on these target characteristics, the seed users and the corresponding first target group indexes therefore yields a more accurate crowd expansion model, and using that model for crowd expansion yields a more accurate expanded crowd.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:

FIG. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure;

FIG. 2 is a flow chart of a model training method provided in an embodiment of the present disclosure;

FIG. 3 is a flow chart of a model training method provided by another embodiment of the present disclosure;

FIG. 4 is a schematic diagram of iterative training of crowd expansion models according to an embodiment of the present disclosure;

FIG. 5 is a flow chart of a crowd expansion method according to an embodiment of the disclosure;

FIG. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a crowd expanding device according to an embodiment of the disclosure;

FIG. 8 is a schematic diagram of a storage medium according to an embodiment of the disclosure;

FIG. 9 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

According to an embodiment of the disclosure, a model training method, a crowd expanding method, a medium, a device and a computing device are provided.

In this context, it is to be understood that the terms involved:

Target Group Index (TGI): TGI = (proportion of people with a given characteristic in the target group / proportion of people with the same characteristic in the whole population) × 100, where 100 is the baseline; a TGI above 100 means the characteristic is over-represented in the target group.
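The definition translates directly into code. The sketch below is illustrative only, and the function name is hypothetical.

```python
def target_group_index(share_in_target: float, share_in_population: float) -> float:
    """TGI = (share of a trait in the target group / share in the whole population) * 100."""
    if share_in_population <= 0:
        raise ValueError("share_in_population must be positive")
    return share_in_target / share_in_population * 100


# 30% of a target group vs. 20% of all users have the trait: TGI is about 150,
# i.e. the trait is over-represented in the target group.
print(round(target_group_index(0.30, 0.20), 6))
```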

Furthermore, any number of elements in the figures is for illustration and not limitation, and any naming is used for distinction only and not for any limiting sense.

In addition, the data related to the disclosure may be data authorized by the user or fully authorized by each party, and the collection, transmission, use and the like of the data all conform to the requirements of national related laws and regulations, and the embodiments of the disclosure may be mutually combined.

The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.

Summary of The Invention

The inventors found that, in the related art, crowd expansion can be performed based on a semi-supervised or a supervised learning algorithm. Taking supervised learning as an example, specified samples (namely, the seed crowd) are taken as positive samples, samples randomly drawn from the remaining users are taken as negative samples to train a classification model, candidate crowds are screened with the trained classification model to obtain the expanded crowd, and feature importance is output from the trained classification model. However, this approach cannot obtain the expanded crowd accurately, for the following reasons: (1) Taking an advertisement as the media resource, users who do not respond during delivery give no feedback on the advertisement content, so it cannot be established that they are uninterested in it; randomly drawn samples are therefore not true negative samples. Moreover, the ratio of negative samples to specified samples directly affects the training and prediction of the classification model, yet in real scenarios the appropriate ratio is always unknown; (2) Training the classification model requires tuning model parameters on a test set, which takes a large amount of computation time and manual intervention, so similarity computation over large-scale crowds, and hence crowd expansion, cannot be performed quickly.

In another related technology, taking an advertisement as the media resource, the users finally obtained from delivering a certain advertisement are collected and sampled multiple times, cluster analysis is performed, the features used by each cluster are recorded, the importance of each feature is evaluated by the number of times it is used, and the crowd is then expanded according to the important features. This related art has mainly the following disadvantages: (1) The clustering method cannot measure feature importance directly: it only records how often a feature is used, not how well a single feature discriminates within the clusters, so the importance evaluation is biased and the expanded crowd cannot be obtained accurately when expanding according to the important features; (2) The samples must be clustered multiple times, which consumes considerable resources, especially in big-data environments.

Based on the above problems, the present disclosure provides a model training method, a crowd expansion method, a medium, a device and a computing device, which determine the significance of features with an unsupervised algorithm, assign new feature values to the features, construct a crowd expansion model based on seed users, learn a weight for each feature, and obtain the predicted probability value through weighted accumulation of the features. A more accurate crowd expansion model can thus be obtained, and the expanded crowd can be obtained more accurately when the model is used for crowd expansion.

Application scene overview

An application scenario of the solution provided in the present disclosure is first illustrated with reference to FIG. 1. FIG. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure. As shown in FIG. 1, the application scenario includes a server cluster 11 and a terminal 12. The server cluster 11 includes a plurality of servers 111 and a memory 112, and the terminal 12 may be a tablet computer, a notebook computer, a desktop computer, a smart home appliance, or the like. The servers 111 are used to train the crowd expansion model; during training they read data from the memory 112 and store the generated data in the memory 112. In addition, the server cluster 11 and the terminal 12 communicate via a wireless or wired network.

In addition, the embodiment of the disclosure can be applied to crowd expansion scenes. For example, in advertising, population expansion is performed based on the seed population provided by the advertiser.

It should be noted that FIG. 1 is only a schematic diagram of an application scenario provided by an embodiment of the present disclosure; the embodiments of the present disclosure limit neither the devices included in FIG. 1 nor their positional relationship. The model training method provided by the embodiments of the disclosure can be applied to a server, which may be an independent server or a server cluster, or the like.

Exemplary method

A method for model training according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.

First, a model training method is described by way of specific embodiments.

Fig. 2 is a flowchart of a model training method according to an embodiment of the present disclosure. The method of the embodiments of the present disclosure may be applied in a computing device, which may be a server or a server cluster, or the like. As shown in fig. 2, the method of the embodiment of the present disclosure includes:

S201, acquiring a seed user corresponding to the media resource.

In the embodiment of the disclosure, the seed user corresponding to the media resource may be input by the user to the electronic device executing the embodiment of the method, or may be sent by other devices to the electronic device executing the embodiment of the method. Illustratively, the media resource is an advertisement, and after the advertiser uploads the seed user corresponding to the advertisement to the electronic device executing the embodiment of the method, the seed user corresponding to the advertisement may be obtained.

S202, according to the seed user and the preset media resource characteristics, determining a significance index corresponding to the preset media resource characteristics.

In this step, a preset media resource characteristic is, for example, the number of days on which songs were played in the last 30 days. After the seed users corresponding to the media resource are obtained, the significance index of each preset media resource characteristic can be determined according to the seed users and the preset media resource characteristics; the significance index represents how significant the preset media resource characteristic is. How the significance index is determined is described in the subsequent embodiments and is not detailed here.

S203, determining the preset media resource characteristics with the significance index being greater than or equal to the index threshold as target characteristics.

In this step, the index threshold may be set as needed, which is not limited by the present disclosure. After the significance indexes of the preset media resource characteristics are determined, the preset media resource characteristics can be sorted in descending order of significance index, and those whose significance index is greater than or equal to the index threshold are determined as target characteristics.
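The ranking and thresholding of S203 can be sketched as follows. The feature names and significance values are hypothetical, and the significance indexes are assumed to have been computed already.

```python
def select_target_features(significance: dict[str, float], threshold: float) -> list[str]:
    """Rank features by significance index (high to low) and keep those
    whose index is greater than or equal to the threshold."""
    ranked = sorted(significance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Hypothetical feature names and significance indexes.
scores = {"play_days_30d": 18.7, "age_bucket": 4.2, "fav_genre": 9.5}
print(select_target_features(scores, threshold=9.5))  # ['play_days_30d', 'fav_genre']
```

Returning the kept features in ranked order also serves the optional display-and-select variant, where the user picks from a list sorted by significance.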

Optionally, determining the target features may include: displaying the preset media resource characteristics in descending order of significance index; and determining the target features in response to a selection operation on the displayed preset media resource characteristics.

For example, after the significance index corresponding to the preset media resource feature is determined, the preset media resource feature may be displayed according to the order of the significance index from high to low, and after the preset media resource feature with the higher significance index is selected by the user, the target feature may be determined to be the preset media resource feature selected by the user.

S204, performing iterative training on the crowd expansion model based on the seed user, the first target population index of the target feature corresponding to the seed user and the target feature to obtain a trained crowd expansion model.

The crowd expansion model is used for carrying out crowd expansion based on seed users.

In this step, the first target population index of the seed users for the target features may be obtained based on the seed users and the preset media resource features, as described in the following embodiments. Taking an advertisement as the media resource: owing to the particularity of the advertising scenario, only a batch of positive-feedback user samples (namely, seed users) can be collected after each delivery. In this case, user profile analysis must be performed on the available seed users with a purely unsupervised learning algorithm, and similar people are found through the core profile features (namely, the target features) for the next delivery, which improves delivery effectiveness. In the advertising business, user profile analysis is only the first step in evaluating delivery effectiveness; more importantly, the advertising strategy is adjusted according to the current results and an accurate target crowd is then selected, that is, the known advertisement audience is expanded with a similar crowd expansion (lookalike) method. The embodiments of the disclosure obtain the target features with higher significance indexes in the seed population and build the crowd expansion model from them. The default label value of each specified sample (i.e., seed user) is 1, that is, the probability that these samples are positive samples is 100%. In constructing and iterating the crowd expansion model, the goal is to learn the weights of the target features such that the probability of a sample being positive, computed as a weighted combination of its target features, differs as little as possible from the actual probability (100%).

In this step, after the target features are determined, the crowd expansion model can be iteratively trained based on the seed users, the first target population indexes of the seed users for the target features, and the target features, to obtain the trained crowd expansion model. How the trained model is obtained is described in the following embodiments and is not repeated here.

In the model training method provided by the embodiments of the disclosure, seed users corresponding to a media resource are acquired; a significance index for each preset media resource characteristic is determined according to the seed users and the preset media resource characteristics; each preset media resource characteristic whose significance index is greater than or equal to an index threshold is determined as a target characteristic; and the crowd expansion model is iteratively trained based on the seed users, the first target group indexes of the seed users for the target characteristics, and the target characteristics, to obtain a trained crowd expansion model. Because the target characteristics are selected by significance indexes derived from the seed users and the preset media resource characteristics, they are determined more accurately; training on them together with the seed users and the corresponding first target population indexes yields a more accurate population expansion model, and using that model for population expansion yields a more accurate expanded population.

Fig. 3 is a flowchart of a model training method according to another embodiment of the present disclosure. On the basis of the above embodiments, the embodiment of the present disclosure further describes a model training method. As shown in fig. 3, a method of an embodiment of the present disclosure may include:

S301, acquiring a seed user corresponding to the media resource.

A detailed description of this step may be referred to the related description of S201 in the embodiment shown in fig. 2, and will not be repeated here.

In the embodiment of the present disclosure, step S202 in FIG. 2 may specifically include the following five steps, S302 to S306:

S302, acquiring the full set of users within a preset time range.

Illustratively, the preset time range is, for example, one month, one quarter or one week, and the full set of users within the preset time range is, for example, all monthly active users.

S303, determining, based on a preset binning rule, a first user proportion of the full set of users in each bin of the preset media resource characteristic, and a second user proportion of the seed users in each bin of the preset media resource characteristic.

Illustratively, after the full set of active users is acquired, the preset media resource characteristics are binned according to a preset binning rule, for example by binning each characteristic over the monthly active users in combination with the business scenario, or by equal-frequency binning. In principle, each bin should contain no less than 5% of users, and each preset media resource characteristic usually has no more than 20 bins, to keep the calculation results stable. After binning, the first user proportion of the full set of users in each bin of a preset media resource characteristic (denoted, e.g., ratio_all) and the second user proportion of the seed users in each bin (denoted, e.g., ratio_sample) can be determined.
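Equal-frequency binning and the two proportions can be sketched as follows. The bin edges are taken at evenly spaced ranks of the full user set and then applied to both the full set and the seed users. The helper names and this quantile convention are assumptions, and the sketch does not enforce the 5% and 20-bin guidelines above.

```python
from bisect import bisect_right


def equal_frequency_edges(values: list[float], n_bins: int) -> list[float]:
    """Interior bin edges at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    edges = {ordered[k * len(ordered) // n_bins] for k in range(1, n_bins)}
    return sorted(edges)  # duplicate edges (ties) collapse, giving fewer bins


def bin_proportions(values: list[float], edges: list[float]) -> list[float]:
    """Share of values in each bin; bin i covers [edges[i-1], edges[i])."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


# Hypothetical feature: song-playing days in the last 30 days.
all_users = list(range(1, 31))   # full user set, one value per user
seed_users = [5, 25, 26, 27]     # seed users' values
edges = equal_frequency_edges(all_users, n_bins=3)
ratio_all = bin_proportions(all_users, edges)
ratio_sample = bin_proportions(seed_users, edges)
print(edges, ratio_sample)  # [11, 21] [0.25, 0.0, 0.75]
```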

S304, determining, according to the first user proportion and the second user proportion, a second target group index for each bin of the preset media resource characteristic, and a first target group index of each seed user for the preset media resource characteristic.

Illustratively, after the first user proportion and the second user proportion are obtained, the second target population index (i.e., TGI) of each bin of the preset media resource characteristic may be determined according to the following formula one:

TGI = ratio_sample / ratio_all × 100    formula one

It can be appreciated that, after the second target population index of each bin is determined according to formula one, since each seed user falls into one of the bins, the first target population index of a seed user for the preset media resource characteristic can be determined from the second target population index of the bin to which the seed user belongs.
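Formula one and the mapping from bins to seed users can be sketched as follows. The bin assignments are hypothetical, and returning 0.0 for a bin with no users in the full set is an assumption that the disclosure does not address.

```python
def bin_tgis(ratio_sample: list[float], ratio_all: list[float]) -> list[float]:
    """Second target population index per bin: ratio_sample / ratio_all * 100."""
    return [rs / ra * 100 if ra > 0 else 0.0 for rs, ra in zip(ratio_sample, ratio_all)]


def seed_user_tgis(seed_bins: dict[str, int], tgis: list[float]) -> dict[str, float]:
    """First target population index of each seed user: the TGI of its bin."""
    return {user: tgis[b] for user, b in seed_bins.items()}


ratio_all = [0.5, 0.25, 0.25]
ratio_sample = [0.25, 0.25, 0.5]
tgis = bin_tgis(ratio_sample, ratio_all)
print(tgis)                                      # [50.0, 100.0, 200.0]
print(seed_user_tgis({"u1": 2, "u2": 0}, tgis))  # {'u1': 200.0, 'u2': 50.0}
```

The per-user values produced this way are the feature values fed to the crowd expansion model in S204, in place of the raw characteristic values.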

S305, determining, under the preset media resource characteristic, a summary value of the second user proportions of the bins whose second target population index is greater than the target population index mean.

Wherein the target population index mean is obtained from the second target population index.

In this step, the target population index mean may be obtained from the second target population indexes of all bins of all preset media resource characteristics. Optionally, the target population index mean is obtained in at least one of the following ways: taking the mean of the second target population indexes as the target population index mean; weighting each second target population index by its corresponding second user proportion to obtain the target population index mean; sorting the second target population indexes and taking their median as the target population index mean; or ranking the second target population indexes by the number of users in the full set corresponding to each, and taking the second target population index with the largest number as the target population index mean.

For example, the mean of the second target population indexes of all bins of all preset media resource characteristics may be taken as the target population index mean; alternatively, the products of each second target population index and its second user proportion may be summed and then averaged; alternatively, the second target population indexes may be sorted from high to low and their median taken as the target population index mean; or the second target population indexes may be ranked by the number of users in the full set corresponding to each, and the one with the largest number taken as the target population index mean.
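The four options for the target population index mean can be sketched as below. The inputs are the second target population indexes of the bins, their second user proportions, and the full-set user count of each bin. For the weighted option the sketch divides by the sum of the weights, which is one reading of "weighting" and an assumption; the example values are hypothetical.

```python
import statistics


def mean_tgi(tgis: list[float]) -> float:
    return statistics.fmean(tgis)


def weighted_mean_tgi(tgis: list[float], ratio_sample: list[float]) -> float:
    # Weighted by each bin's second user proportion, normalized by the total weight.
    return sum(t * r for t, r in zip(tgis, ratio_sample)) / sum(ratio_sample)


def median_tgi(tgis: list[float]) -> float:
    return statistics.median(tgis)


def most_populated_tgi(tgis: list[float], full_user_counts: list[int]) -> float:
    # TGI of the bin holding the most users from the full user set.
    best = max(range(len(tgis)), key=lambda i: full_user_counts[i])
    return tgis[best]


tgis = [50.0, 100.0, 200.0]
ratio_sample = [0.25, 0.25, 0.5]
full_user_counts = [500, 250, 250]
print(mean_tgi(tgis), weighted_mean_tgi(tgis, ratio_sample),
      median_tgi(tgis), most_populated_tgi(tgis, full_user_counts))
```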

After the target population index mean (denoted, e.g., avg_tgi) is determined, the summary value of the second user proportions of the bins whose second target population index is greater than the target population index mean under the preset media resource characteristic may be determined according to the following formula two; this summary value may also be referred to as the significant proportion:

ratio_important = Σ ratio_tgi (summed over the bins with TGI > avg_tgi)    formula two

where ratio_important denotes the significant proportion, and ratio_tgi denotes the second user proportion of a bin whose second target population index is greater than the target population index mean.
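Formula two reduces to a filtered sum. The sketch below uses hypothetical bin values and a function name that is not from the disclosure.

```python
def significant_proportion(tgis: list[float], ratio_sample: list[float],
                           avg_tgi: float) -> float:
    """Formula two: sum the second user proportions of bins whose TGI exceeds avg_tgi."""
    return sum(r for t, r in zip(tgis, ratio_sample) if t > avg_tgi)


print(significant_proportion([50.0, 100.0, 200.0], [0.25, 0.25, 0.5], avg_tgi=150.0))  # 0.5
```

The comparison is strict, so a bin whose TGI equals the mean does not contribute.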

For example, assume the preset media resource feature is the number of days on which songs were played in the last 30 days. This feature is divided into six bins: 1 day, 2 to 7 days, 7 to 14 days, 14 to 21 days, 21 to 28 days, and 30 days. After binning, the TGI of each bin (i.e., its second target population index) and the related values used to assess significance are shown in Table 1:

TABLE 1

Table 1 assumes a target population index mean of 150. Under that assumption, the bins of this feature whose second target population index exceeds 150 together hold 30% of the second user proportion, so the significant proportion is 30%.

S306, determining the significance index corresponding to the preset media resource feature according to the aggregate value and the second target population index corresponding to each bin of the preset media resource feature.

In this step, once the aggregate value is determined, the significance index of the preset media resource feature can be determined from the aggregate value and the second target population index of each bin of that feature.

Further, optionally, this determination may include two steps. First, the standard deviation of the second target population indexes under the preset media resource feature is computed from their average value and the individual second target population indexes. Second, the significance index of the preset media resource feature is taken as the product of the aggregate value and the standard deviation.

For example, the average value (e.g., avg_tgi) of the second target population indexes under the preset media resource feature may be computed from the second target population indexes of its bins. The standard deviation of these indexes may then be determined according to Formula 3:

$$std = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(tgi_k - avg\_tgi\right)^2} \qquad \text{(Formula 3)}$$

Here, tgi_k denotes the second target population index of the k-th bin under the preset media resource feature, and n is the total number of bins.

Based on the example of Table 1, the average of the second target population indexes under this preset media resource feature is 107, and Formula 3 gives std = 94.

The significance index of the preset media resource feature (e.g., denoted score_important) may then be determined as the product of the aggregate value and the standard deviation, according to Formula 4:

$$\mathrm{score\_important} = \mathrm{ratio\_important} \times std \qquad \text{(Formula 4)}$$

Based on the example of Table 1, with an aggregate value of 30% and std = 94, the significance index is score_important = 94 × 0.3 = 28.2.
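Formulas 3 and 4 combine into a short Python function. This is a minimal sketch that assumes the population standard deviation (dividing by n, the number of bins). The function name and the sample TGIs are hypothetical.

```python
import math


def significance_index(tgis, ratio_important):
    """Formulas 3 and 4: population standard deviation of one feature's bin
    TGIs, multiplied by the significant proportion ratio_important."""
    n = len(tgis)
    avg_tgi = sum(tgis) / n
    std = math.sqrt(sum((t - avg_tgi) ** 2 for t in tgis) / n)
    return ratio_important * std


# Hypothetical bin TGIs with mean 5 and population standard deviation 2.
print(significance_index([2, 4, 4, 4, 5, 5, 7, 9], 0.3))  # 0.6
```

Multiplying by the standard deviation rewards features whose bins differ strongly from one another. Multiplying by ratio_important rewards features where many seed users fall into the high-TGI bins.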

S307, determining the preset media resource features whose significance index is greater than or equal to the index threshold as target features.

For details of this step, refer to the description of S203 in the embodiment shown in Fig. 2; they are not repeated here.

In the embodiments of the present disclosure, step S204 in Fig. 2 may further include the following three steps, S308 to S310:

S308, normalizing the first target population index of the target feature corresponding to the seed users to obtain a normalized first target population index.

The normalization includes maximum-value normalization or normal-distribution normalization.

The target features must be assigned values before a suitable crowd expansion model can be built. In the embodiments of the present disclosure, an unsupervised calculation method is used to construct an assignment scheme that needs little computation and closely approximates the positive-sample probability (i.e., 100%). Formula 1 above and the definition of conditional probability lead to the derivation shown in Formula 5:

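$$P(Y=1 \mid F_j) = \frac{M_{ij}}{M_j} = \frac{M_{ij}/M_i}{M_j/M}\cdot\frac{M_i}{M} = \frac{TGI_j}{c}, \qquad c = \frac{M}{M_i} \qquad \text{(Formula 5)}$$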

The symbols are defined as follows:
- M denotes the full sample.
- M_i denotes the specified sample.
- M_ij denotes the samples in the specified sample M_i that have feature j.
- j denotes a feature.
- M_j denotes the samples in the full sample M that have feature j.
- P(Y=1 | F_j) denotes the probability that Y (i.e., the sample label) equals 1, given that the feature F is j.
- c denotes a constant.

The derivation in Formula 5 shows that, as long as the specified sample (i.e., the seed users) is fixed, the TGI of a feature can serve as an approximation, up to a constant factor, of the probability that a sample in a given bin of that feature is a positive sample.
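A few counts are enough to check this proportionality. The sketch below is hypothetical. It uses made-up counts and the unscaled TGI, meaning the ratio of the two proportions without any ×100 factor (an assumption about how Formula 1 is scaled). It confirms that M_ij/M_j equals TGI/c with c = M/M_i.

```python
import math


def unscaled_tgi(m_ij, m_i, m_j, m):
    # Share of feature j among the specified samples divided by its share
    # among all samples (Formula 1 without any x100 scaling, an assumption).
    return (m_ij / m_i) / (m_j / m)


# Hypothetical counts: full sample M, specified sample M_i, and so on.
M, M_i, M_j, M_ij = 10_000, 500, 2_000, 300
tgi = unscaled_tgi(M_ij, M_i, M_j, M)   # 3.0 up to floating-point rounding
c = M / M_i                             # 20.0; c >= 1 because M_i <= M
p_direct = M_ij / M_j                   # P(Y = 1 | F_j) = 0.15
assert math.isclose(tgi / c, p_direct)
```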

The TGI values of different features differ greatly in magnitude. Excessively large values would push the parameter estimates toward small values during subsequent iterative training of the crowd expansion model, so the TGIs are normalized first. The positive-sample probability equals the TGI divided by a constant c, the ratio of the total number of samples to the number of specified samples (c ≥ 1). The sample probability is therefore directly proportional to the TGI. In the embodiments of the present disclosure, the first target population index may be normalized globally to obtain the normalized first target population index, using either maximum-value normalization or normal-distribution normalization. The normalized first target population index is then assigned to the corresponding bins of the seed users' target features and used as the target feature values for model training and prediction.
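The two normalization options are sketched in Python below. Two interpretations are assumptions: "maximum-value normalization" is read as division by the global maximum, and "normal-distribution normalization" as standardization to zero mean and unit population standard deviation. The function names are hypothetical.

```python
import statistics


def max_normalize(values):
    # Divide by the global maximum: keeps values proportional to the TGI,
    # and hence to the positive-sample probability.
    top = max(values)
    return [v / top for v in values]


def normal_normalize(values):
    # Standardize to zero mean and unit (population) standard deviation.
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


tgis = [50, 100, 200]
print(max_normalize(tgis))  # [0.25, 0.5, 1.0]
```

Division by the maximum preserves the proportionality to probability described above. Standardized values can be negative, so they no longer read directly as probabilities.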

S309, traversing the target features in descending order of significance index and performing the following operations on each traversed target feature. Based on the normalized first target population index, the first residual obtained from the previous round of training of the crowd expansion model, and the loss function, the method determines three quantities: the loss function value of the current training round, the target weight of the target feature, and a second residual. The target weight is the weight at which the loss function value is minimized. In the first round of training of the crowd expansion model, the weight is a preset weight, and the preset initial probability is used as the first residual.

Illustratively, in the embodiments of the present disclosure, the preset initial probability is, for example, 100%. The crowd expansion model uses, for example, a boosting algorithm framework to build a regression tree model based on conditional probability. This design has three parts:
- Converting feature values into conditional probabilities yields a linear relationship between the feature values and the target variable.
- Applying a regression tree algorithm avoids the multicollinearity problem of linear models.
- The boosting framework computes the sample probability.

Based on the example of step S308, each assigned target feature value approximates the conditional probability of the sample given that target feature. The final output of the crowd expansion model, a linear combination of multiple target features, is therefore the probability that the sample is a positive sample. Each target feature is treated as a separate sub-model comprising the feature value of the target feature, the target weight of that feature, and the sub-model's predicted sample probability. Each iteration optimizes the residual left by the previous one, so that the final predicted probability approaches the true probability as closely as possible. The final predicted probability of a sample is obtained by linearly fusing the sub-models. Following the convention for binary classification models, samples with a predicted probability greater than 0.5 are labeled 1 by default, and samples with a predicted probability less than or equal to 0.5 are labeled 0. From these labels, the accuracy of the whole crowd expansion model can be determined.

The samples obtained have no fixed positive or negative labels. All specified samples (i.e., seed users) in the embodiments of the present disclosure are therefore treated as positive samples by default, with a preset initial probability of 100% (corresponding to an original label of 1). In the first round of training of the crowd expansion model, the weight is a preset weight (for example, 1), and the preset initial probability of 100% is used as the first residual. Each iteration uses only one target feature, and, following the boosting framework, the target of each round is the residual of the previous round. The iteration target in the embodiments of the present disclosure is therefore:

$$y_t = y_{t-1} - f(x_i) \times w_i, \qquad y_0 = 1$$

The symbols are defined as follows:
- f(x_i) denotes the first target population index of the seed user's corresponding target feature.
- w_i denotes the target weight corresponding to the target feature.
- i ranges from 1 to N, where N is the number of seed users.
- y denotes the iteration target, and y_t denotes the iteration target obtained in the t-th iteration.

In this step, the target features are traversed in descending order of significance index. For each traversed target feature, the loss function value of the current training round, the target weight of the feature, and a second residual are determined. These are computed from the normalized first target population index, the first residual obtained from the previous round of training of the crowd expansion model, and the loss function. The target weight is the weight at which the loss function value is minimized.

Optionally, the loss function value is determined as follows. A mean square error is computed from the seed users' first target population indexes for the target feature and the first residual. The loss function value is then determined from three terms: the mean square error; a first regularization term, the sum of the absolute values of the target weights of the target features; and a second regularization term, the sum of the squares of those target weights.

For example, assume the crowd expansion model uses a boosting framework and builds a regression tree model based on conditional probability. The initial loss function may then be the mean square error shown in Formula 6:

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(x_i)\times w_i\right)^2 \qquad \text{(Formula 6)}$$

Here, MSE denotes the initial loss function, and y_i denotes the first residual obtained from the previous round of training of the crowd expansion model.

The complexity of the crowd expansion model also grows with the number of target features. To prevent overfitting, first and second regularization terms are added as constraints, giving the final loss function in Formula 7:

$$Loss = MSE + \alpha L_1 + \beta L_2, \qquad L_1 = \sum \lvert w_i \rvert, \qquad L_2 = \sum w_i^2 \qquad \text{(Formula 7)}$$

Here, Loss denotes the final loss function; L1 denotes the first regularization term; L2 denotes the second regularization term; α denotes the weight of L1; and β denotes the weight of L2.

The loss function value is computed with the loss function given by Formula 7.

S310, when the drop ratio of the loss function value is smaller than the drop threshold, obtaining the trained crowd expansion model and the predicted probability values corresponding to the seed users.

Illustratively, the drop threshold is, for example, 0.001. Training repeats until the drop ratio of the loss function value falls below the drop threshold, at which point it stops and yields the trained crowd expansion model. The trained crowd expansion model thus fixes the target weight of each target feature.
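The training loop of S309 and S310 can be sketched as follows. Several details are assumptions: the weight fit reuses the elastic-net closed form, the first round uses the preset weight without fitting, and a round whose loss drop falls below the threshold is kept before stopping. All names are hypothetical.

```python
def fit_weight(f, y, alpha, beta):
    # Closed-form minimizer of (1/N)*sum((y_i - f_i*w)^2) + alpha*|w| + beta*w^2.
    n = len(f)
    z = 2.0 * sum(fi * yi for fi, yi in zip(f, y)) / n
    denom = 2.0 * sum(fi * fi for fi in f) / n + 2.0 * beta
    if denom == 0.0:
        return 0.0
    if z > alpha:
        return (z - alpha) / denom
    if z < -alpha:
        return (z + alpha) / denom
    return 0.0


def loss(f, y, w, alpha, beta):
    # Formula 7 for one round: MSE plus L1 and L2 penalties on the weight.
    n = len(f)
    mse = sum((yi - fi * w) ** 2 for fi, yi in zip(f, y)) / n
    return mse + alpha * abs(w) + beta * w * w


def train(features, alpha=0.0, beta=0.0, drop_threshold=0.001,
          init_prob=1.0, preset_weight=1.0):
    """features: one list of normalized TGIs per target feature (one value
    per seed user), already sorted by significance index, high to low."""
    n = len(features[0])
    residual = [init_prob] * n        # first residual = preset initial probability
    pred = [0.0] * n                  # running sum of f(x_i) * w per seed user
    weights, prev_loss = [], None
    for t, f in enumerate(features):
        w = preset_weight if t == 0 else fit_weight(f, residual, alpha, beta)
        cur_loss = loss(f, residual, w, alpha, beta)
        residual = [r - fi * w for r, fi in zip(residual, f)]  # second residual
        pred = [p + fi * w for p, fi in zip(pred, f)]
        weights.append(w)
        # Stop once the relative drop of the loss is below the threshold.
        if prev_loss and (prev_loss - cur_loss) / prev_loss < drop_threshold:
            break
        prev_loss = cur_loss
    return weights, pred


print(train([[0.5, 0.5], [0.0, 0.0], [0.5, 0.5]]))  # ([1.0, 0.0], [0.5, 0.5])
```

In this usage example, the second feature does not reduce the loss at all, so training stops before the third feature is used.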

For example, Fig. 4 is a schematic diagram of iterative training of a crowd expansion model according to an embodiment of the present disclosure. As shown in Fig. 4, the seed users serve as the specified samples, each with an original label of 1 (i.e., a preset initial probability of 1).

In iteration 1, the preset initial probability is used as the first residual (i.e., the iteration target), and the preset weight of 1 is used as the target weight of target feature 1. For the first specified sample, the first target population index for target feature 1 is 0.2. The product of this index and the target weight is 0.2, so the second residual of the first specified sample in iteration 1 (residual 1 in Fig. 4) is 1 - 0.2 = 0.8. When iteration 1 completes, its loss function value is 2.0.

In iteration 2, the iteration target of the first specified sample is the second residual 0.8 from iteration 1. The target weight of target feature 2 is 0.5, and the first specified sample's first target population index for target feature 2 is 0.6. Their product is 0.3, so the second residual of iteration 2 (residual 2 in Fig. 4) is 0.8 - 0.3 = 0.5. When iteration 2 completes, its loss function value is 1.3.
From the loss function values of 2.0 for iteration 1 and 1.3 for iteration 2, the drop ratio of the loss function value is (2.0 - 1.3) / 2.0 = 0.35. Proceeding in the same way yields, for each iteration, the iteration target and second residual of every specified sample and the loss function value of that iteration. After T iterations, the drop ratio falls below the drop threshold of 0.001, iteration stops, and the trained crowd expansion model is obtained together with the predicted probability values of the seed users.
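The arithmetic of the Fig. 4 walkthrough can be reproduced in a few lines of Python. The helper names are hypothetical. The numbers are taken from the example above, and the loss values 2.0 and 1.3 are used as given rather than recomputed.

```python
def next_residual(residual, tgi, weight):
    # Iteration target update: y_t = y_{t-1} - f(x_i) * w
    return residual - tgi * weight


def drop_ratio(prev_loss, cur_loss):
    # Relative decrease of the loss between two consecutive iterations.
    return (prev_loss - cur_loss) / prev_loss


r1 = next_residual(1.0, 0.2, 1.0)   # iteration 1: initial probability 1, preset weight 1
r2 = next_residual(r1, 0.6, 0.5)    # iteration 2: target feature 2 with weight 0.5
ratio = drop_ratio(2.0, 1.3)        # loss 2.0 -> 1.3
print(round(r1, 6), round(r2, 6), round(ratio, 6))  # 0.8 0.5 0.35
stop = ratio < 0.001                # drop threshold from S310; False here
```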

Based on the above embodiment, assume the crowd expansion model is based on conditional probability. Its final output then approximates the predicted probability of a positive sample (i.e., a seed user), and the predicted probability values output by the model may be processed as follows:

$$R_i = \begin{cases} 1, & r_i > 0.5 \\ 0, & r_i \le 0.5 \end{cases}$$

Here, r_i denotes the predicted probability value of each seed user output by the crowd expansion model, with r_i = Σ f(x_i) × w_i. In other words, each seed user's predicted probability value is the sum, over all iterations, of the products of its first target population index and the target weight.

Therefore, the prediction accuracy (such as rate) of the whole crowd expansion model can be determined as follows:

$$rate = \frac{1}{N}\sum_{i=1}^{N} R_i \times 100\%$$

The higher the prediction accuracy, the better the crowd expansion model, and the more accurately the target crowd is selected.
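The labeling rule and the accuracy rate can be sketched in Python as shown below. This assumes the 0.5 threshold stated above. The function name and the sample probabilities are hypothetical.

```python
def prediction_accuracy(pred_probs, threshold=0.5):
    """rate = sum(R_i) / N * 100%, where R_i = 1 if r_i > threshold else 0."""
    labels = [1 if r > threshold else 0 for r in pred_probs]   # R_i
    return sum(labels) / len(labels) * 100.0


print(prediction_accuracy([0.9, 0.6, 0.5, 0.2]))  # 50.0
```

Because every seed user is treated as a positive sample, this rate is the share of seed users that the model recovers as positives.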

In the model training method provided by the embodiments of the present disclosure, the significance index of each preset media resource feature is determined from the seed users and the preset media resource features under a preset binning rule. The target features are selected according to these significance indexes, which identifies them more accurately. The crowd expansion model is then trained on the target features, the seed users, and the seed users' first target population indexes for the target features. Each iteration optimizes the residual of the previous one, so that the final predicted probability approaches the true probability as closely as possible.

The structure of the crowd expansion model is clear and simple, and the target feature of each layer approximates the sample probability. Only the predicted probability value therefore needs to be recorded at each iteration, which reduces computational complexity and makes large-scale lookalike computation feasible. In addition, the first and second regularization terms added to the loss function greatly reduce the complexity of the crowd expansion model and speed up the computation of sample predicted probability values. The result is a more accurate and more efficient crowd expansion model, and crowd expansion performed with it identifies the expanded crowd more accurately and quickly.

Take advertisements as an example of the media resource. For any sample of an advertisement's audience, the model training method provided by the embodiments of the present disclosure can output, without supervision, a significance ranking of the features used by the crowd expansion model, and can quantify how much the significance differs between features. In addition, the number of samples that can be collected back in advertisement delivery is usually small, so conventional models often cannot be trained and applied well. The model training method is also applicable to supervised learning. Its algorithm framework and theory can be used to build models for supervised settings with too few samples, or with positive and negative samples that fluctuate over time, and the resulting model can then be re-verified and optimized using the labels.

Fig. 5 is a flowchart of a crowd expansion method according to an embodiment of the disclosure. The method of the embodiments of the present disclosure may be applied in a computing device, which may be a server or a server cluster, or the like. As shown in fig. 5, the method of the embodiment of the present disclosure includes:

S501, obtaining the candidate users corresponding to the media resource.

Generally, the candidate users corresponding to the media resource may be input by a user (such as an advertiser) into the electronic device executing this method embodiment, or sent to it by another device. The candidate users may be any audience users of the media resource.

S502, determining the target population index of the target features for the candidate users, according to the candidate users and to the seed users and target features corresponding to the crowd expansion model.

The crowd expansion model is trained using the model training method of any of the above method embodiments.

In this step, the crowd expansion model has already been trained with the model training method of any of the above method embodiments, so its seed users and target features are fixed. The target population index of the target features for the candidate users can therefore be determined from the candidate users and from the seed users and target features of the model.

Further, optionally, this determination may include two steps. First, based on a preset binning rule, the method determines a third user proportion (the candidate users' share in each bin of the target features) and a fourth user proportion (the seed users' share in each bin of the target features). Second, it determines the target population index of the target features for the candidate users from the third and fourth user proportions.

Illustratively, following the example of step S303 in the above embodiment, the third and fourth user proportions of each bin of the target features may be determined based on the preset binning rule. Then, with reference to Formula 1 above, the target population index of the target features for the candidate users may be determined from the third and fourth user proportions.

S503, inputting the candidate users and the target population indexes of their target features into the crowd expansion model to obtain the predicted probability values of the candidate users.

The predicted probability value is used to determine whether the candidate user is an expandable user.

In this step, once the target population indexes of the candidate users' target features are determined, the candidate users and these indexes can be input into the crowd expansion model to obtain the candidate users' predicted probability values.

Further, optionally, obtaining the predicted probability values may include: inputting the candidate users and the target population indexes of their target features into the crowd expansion model, and summing the products of each target population index and the target weight of the corresponding target feature to obtain each candidate user's predicted probability value.

The candidate users and the target population indexes of their target features are input into the crowd expansion model, whose target weight for each target feature has been determined through training. The product of each target population index and the corresponding target weight can then be computed, and summing all these products gives the candidate user's predicted probability value.

After obtaining the predicted probability value corresponding to the candidate user, whether the candidate user is an expandable user or not can be determined according to the predicted probability value. For example, candidate users having a predicted probability value greater than a probability threshold may be determined as expandable users.
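Candidate scoring and selection can be sketched in Python as follows. The code assumes three things: candidate TGIs are normalized in the same way as during training, they are listed in the same feature order as the trained weights, and the probability threshold is chosen by the operator. The function name, user IDs, and numbers are hypothetical.

```python
def score_candidates(candidate_tgis, weights, prob_threshold):
    """candidate_tgis: {user_id: [normalized TGI for each target feature]},
    with features in the same order as `weights`."""
    # Predicted probability: sum of TGI x target weight over target features.
    scores = {
        uid: sum(t * w for t, w in zip(tgis, weights))
        for uid, tgis in candidate_tgis.items()
    }
    # Candidates above the probability threshold are treated as expandable.
    expandable = [uid for uid, s in scores.items() if s > prob_threshold]
    return scores, expandable


scores, expandable = score_candidates(
    {"u1": [0.2, 0.6], "u2": [0.8, 0.4]}, weights=[1.0, 0.5], prob_threshold=0.6
)
print(expandable)  # ['u2']
```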

The crowd expansion method provided by the embodiments of the present disclosure works in three steps. It obtains the candidate users corresponding to the media resource. It determines the target population index of the target features for the candidate users from the candidate users and from the seed users and target features of the crowd expansion model. It then inputs the candidate users and these target population indexes into the crowd expansion model to obtain the candidate users' predicted probability values, which are used to determine whether the candidate users are expandable users. Because the crowd expansion model of the embodiments of the present disclosure is more accurate and generalizes better, expandable users can be determined more accurately with it.

Based on the above embodiments, again taking advertisements as the media resource, automatic audience profiling of an advertisement's audience can be realized in two ways.

First, based on the feature significance ranking, the features that most clearly distinguish the specified crowd from the overall population can be output in real time for the audience of any advertisement. Users can also drill down to the TGI (target group index) of specific bins of a feature. Because the sample distribution is already computed when the TGI is determined, a distribution chart for each feature can be generated automatically while the significant features are displayed, with the feature bin showing the largest distribution difference highlighted.

Second, advertisement delivery crowds can be computed in real time. With the crowd expansion model trained by the model training method of the embodiments of the present disclosure, the number of expandable users can be calculated quickly, and the crowd can be further screened precisely according to the predicted probability values.

Exemplary apparatus

Having described the medium of the exemplary embodiments of the present disclosure, next, a model training apparatus of the exemplary embodiments of the present disclosure will be described with reference to fig. 6. The device of the exemplary embodiment of the disclosure can realize each process in the model training method embodiment and achieve the same functions and effects.

Fig. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure, and as shown in fig. 6, a model training apparatus 600 according to an embodiment of the present disclosure includes: an acquisition module 601, a first determination module 602, a second determination module 603, and a training module 604. Wherein:

The acquisition module 601 is configured to acquire the seed users corresponding to the media resource.

The first determining module 602 is configured to determine a significance index corresponding to the preset media resource feature according to the seed user and the preset media resource feature.

The second determining module 603 is configured to determine the preset media resource features whose significance index is greater than or equal to the index threshold as target features.

The training module 604 is configured to iteratively train the crowd expansion model based on the seed user, the first target population index of the seed user corresponding to the target feature, and the target feature, to obtain a trained crowd expansion model, where the crowd expansion model is used for crowd expansion based on the seed user.

In one possible implementation, the first determining module 602 may be specifically configured to:
- acquire the total users within a preset time range;
- determine, based on a preset binning rule, a first user proportion of the total users and a second user proportion of the seed users in each bin of the preset media resource feature;
- determine, from the first and second user proportions, a second target population index for each bin of the preset media resource feature and a first target population index of the seed users for the preset media resource feature;
- determine the aggregate of the second user proportions of the bins whose second target population index exceeds the target population index mean under the preset media resource feature, the target population index mean being obtained from the second target population indexes; and
- determine the significance index of the preset media resource feature from the aggregate value and the second target population index of each bin of the preset media resource feature.

In one possible implementation, when determining the significance index of the preset media resource feature from the aggregate value and the second target population index of each bin, the first determining module 602 may be specifically configured to: determine the standard deviation of the second target population indexes under the preset media resource feature from their average value and the individual second target population indexes; and determine the significance index of the preset media resource feature as the product of the aggregate value and the standard deviation.

In one possible implementation, the first determining module 602 may obtain the target population index mean in at least one of the following ways:
- taking the average of the second target population indexes;
- weighting each second target population index by its corresponding second user proportion;
- sorting the second target population indexes and taking their median; or
- ranking the bins by their total number of users and taking the second target population index of the bin with the largest number of users.

In one possible implementation, the training module 604 may be specifically configured to traverse the target features in descending order of significance index and, for each traversed target feature, determine the loss function value of the current training round, the target weight of the feature, and a second residual. These are computed from the seed users' first target population index for the target feature, the first residual obtained from the previous round of training of the crowd expansion model, and the loss function. The target weight is the weight at which the loss function value is minimized. In the first round of training, the weight is a preset weight and the preset initial probability is used as the first residual. When the drop ratio of the loss function value is smaller than the drop threshold, the module obtains the trained crowd expansion model and the predicted probability values of the seed users.

In one possible implementation, training module 604 may determine the loss function value by: determining a mean square error according to a first target population index and a first residual error of the target feature corresponding to the seed user; the loss function value is determined according to a mean square error, a first rule for representing a sum of modes of the target weights corresponding to the target features, and a second rule for representing a sum of squares of the target weights corresponding to the target features.
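One way to read the training procedure and loss function above is as cyclic coordinate descent on a least-squares objective with both an L1 and an L2 penalty (an elastic-net-style loss). In each round, the weight of every target feature is re-fitted in closed form, in descending order of significance, against the residual with that feature's contribution removed. Rounds repeat until the relative drop in the loss falls below a threshold. This sketch is an interpretation, not the disclosed implementation. The following are all assumptions: the training target `labels` (for example 1 for seed users and 0 for sampled non-seed users, which the text does not specify), treating the initial residual as target minus the preset initial probability, the penalty strengths `l1` and `l2`, and every function name.

```python
def loss_value(residual, weights, l1, l2):
    """Mean square error plus an L1 term (sum of |w|) and an L2 term (sum of w^2)."""
    mse = sum(r * r for r in residual) / len(residual)
    return mse + l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def fit_weight(x, r, l1, l2):
    """Closed-form w minimising mean((r - w*x)^2) + l1*|w| + l2*w^2."""
    n = len(x)
    a = sum(xi * xi for xi in x) / n
    b = sum(xi * ri for xi, ri in zip(x, r)) / n
    if a + l2 == 0:
        return 0.0
    # Soft-thresholding: the L1 term sets weakly correlated features to exactly 0.
    shrunk = max(abs(b) - l1 / 2, 0.0)
    return (shrunk if b >= 0 else -shrunk) / (a + l2)


def train_crowd_model(features, labels, order, l1=0.01, l2=0.01,
                      initial_prob=0.0, initial_weight=0.0,
                      min_reduction=1e-4, max_rounds=100):
    """features[j][i]: (normalised) first target group index of feature j for user i.
    labels[i]: assumed training target of user i.
    order: feature indices sorted by significance index, high to low."""
    n = len(labels)
    weights = [initial_weight] * len(features)
    # Residual before training: target minus preset probability and preset weights.
    residual = [labels[i] - initial_prob
                - sum(w * f[i] for w, f in zip(weights, features))
                for i in range(n)]
    prev = loss_value(residual, weights, l1, l2)
    for _ in range(max_rounds):
        for j in order:
            x = features[j]
            # Add feature j's current contribution back before re-fitting its weight.
            partial = [r + weights[j] * xi for r, xi in zip(residual, x)]
            weights[j] = fit_weight(x, partial, l1, l2)
            # Updated ("second") residual after this feature.
            residual = [p - weights[j] * xi for p, xi in zip(partial, x)]
        cur = loss_value(residual, weights, l1, l2)
        reduction = (prev - cur) / prev if prev > 0 else 0.0
        prev = cur
        if reduction < min_reduction:
            break
    # Prediction = initial probability + weighted sum of the feature values.
    predictions = [labels[i] - residual[i] for i in range(n)]
    return weights, predictions
```

Because each coordinate update minimizes the convex loss exactly in one weight, the loss cannot increase from round to round, so the reduction-ratio stopping rule is well defined. With `initial_prob=0.0` the prediction is the plain weighted sum used at inference time.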

In one possible implementation, the training module 604 may be specifically configured to: normalize the first target group index to obtain a normalized first target group index, where the normalization is either max normalization or normal-distribution (z-score) normalization; and iteratively train the crowd expansion model based on the seed users, the normalized first target group index, and the target features to obtain the trained crowd expansion model.
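The two normalization options can be sketched as follows. Reading "max normalization" as division by the largest absolute value is an assumption; min-max scaling to [0, 1] is another common reading. Using the population standard deviation for the normal-distribution option is also an assumption, and the function names are hypothetical. Both functions guard against division by zero: `max_normalize` returns zeros when every value is 0, and `zscore_normalize` returns zeros when all values are equal.

```python
from statistics import mean, pstdev


def max_normalize(values):
    """Divide every value by the largest absolute value (assumed meaning of max normalization)."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def zscore_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Whichever option is chosen, the same transform must be applied to the candidate users' target group indexes at inference time so that the learned weights stay on a consistent scale.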

In one possible implementation, when determining the target features, the second determining module 603 may be specifically configured to: display the preset media resource features in descending order of significance index; and determine the target features in response to a selection operation on the displayed preset media resource features.
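Ordering the features for display and then selecting target features can be sketched as below. Selection can use either an index threshold or an operator's manual choice. The function names and the feature names in the usage check are hypothetical, and the display itself (for example a list in a UI) is out of scope.

```python
def rank_features(significance):
    """Return (feature, significance index) pairs sorted from high to low."""
    return sorted(significance.items(), key=lambda item: item[1], reverse=True)


def select_target_features(significance, threshold=None, chosen=None):
    """Pick target features by an index threshold or by a manual selection."""
    ranked = rank_features(significance)
    if chosen is not None:
        # Keep the operator's picks, preserving the displayed (descending) order.
        return [name for name, _ in ranked if name in chosen]
    if threshold is not None:
        # Features whose significance index is >= the index threshold.
        return [name for name, value in ranked if value >= threshold]
    raise ValueError("either threshold or chosen must be given")
```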

The device of the embodiment of the disclosure may be used to implement the scheme of the model training method in any of the embodiments of the method, and its implementation principle and technical effects are similar, and will not be described herein.

Fig. 7 is a schematic structural diagram of a crowd expanding device according to an embodiment of the disclosure. As shown in Fig. 7, the crowd expanding device 700 includes an acquisition module 701, a determination module 702, and a processing module 703, where:

The acquisition module 701 is configured to acquire candidate users corresponding to the media resource.

The determination module 702 is configured to determine the target group index of the candidate users for each target feature according to the candidate users and to the seed users and target features associated with a crowd expansion model, where the crowd expansion model is trained by the model training method of any of the method embodiments.

The processing module 703 is configured to input the candidate users and their target group indexes for the target features into the crowd expansion model to obtain the predicted probability values of the candidate users.

In one possible implementation, the processing module 703 may be specifically configured to input the candidate users and their target group indexes for the target features into the crowd expansion model. The model sums, over the target features, the product of each candidate user's target group index for a feature and that feature's target weight, giving the candidate user's predicted probability value.

In one possible implementation, the determination module 702 may be specifically configured to: determine, based on a preset binning rule, a third user proportion of the candidate users in each bin of the target feature and a fourth user proportion of the seed users in each bin of the target feature; and determine the target group index of the candidate users for the target feature according to the third and fourth user proportions.
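The weighted sum computed by the crowd expansion model can be sketched as below. Keying the dictionaries by feature name is an assumption, and the function names are hypothetical. The candidate's target group indexes must be normalized the same way as during training for the weights to apply. The cut-off that decides whether a candidate is an expandable user is a hypothetical parameter that the text does not specify.

```python
def predict_probability(candidate_tgi, weights):
    """Sum of target group index x target weight over the target features."""
    return sum(weights[name] * candidate_tgi[name] for name in weights)


def expandable_users(candidates, weights, cutoff):
    """Ids of candidates whose predicted value reaches the (hypothetical) cut-off,
    highest score first."""
    scored = [(user_id, predict_probability(tgi, weights))
              for user_id, tgi in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [user_id for user_id, score in scored if score >= cutoff]
```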
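The binning step and the resulting per-candidate target group index can be sketched as follows. Neither of two details is stated in the text, so both are assumptions here. First, the preset binning rule is taken to be a sorted list of bin edges for a numeric feature. Second, the target group index of a bin is taken to follow the conventional definition: the seed-user share divided by the candidate share, times 100. All names are hypothetical. Each candidate user receives the index of the bin that its feature value falls into.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(value, edges):
    """Bin k covers [edges[k-1], edges[k]); values below/above the edges go to the end bins."""
    return bisect_right(edges, value)


def bin_shares(values, edges):
    """Share of users falling in each of the len(edges) + 1 bins."""
    counts = Counter(bin_index(v, edges) for v in values)
    n = len(values)
    return [counts.get(k, 0) / n for k in range(len(edges) + 1)]


def candidate_tgi(candidate_values, seed_values, edges):
    third = bin_shares(candidate_values, edges)   # candidate-user share per bin
    fourth = bin_shares(seed_values, edges)       # seed-user share per bin
    # Assumed conventional definition: seed share / candidate share x 100.
    tgi_by_bin = [100.0 * f / t if t else 0.0 for f, t in zip(fourth, third)]
    # Each candidate inherits the index of the bin its value falls into.
    return [tgi_by_bin[bin_index(v, edges)] for v in candidate_values]
```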

The device of the embodiment of the disclosure may be used to implement the scheme of the crowd expanding method in any of the above method embodiments, and its implementation principle and technical effects are similar, and are not repeated here.

Exemplary Medium

Having described the method of the exemplary embodiments of the present disclosure, next, a storage medium of the exemplary embodiments of the present disclosure will be described with reference to fig. 8.

Fig. 8 is a schematic diagram of a storage medium according to an embodiment of the disclosure. Referring to fig. 8, a storage medium 800, in which a program product for implementing the above-described method according to an embodiment of the present disclosure is stored, may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the context of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).

Exemplary computing device

Having described the methods, media, and apparatus of exemplary embodiments of the present disclosure, a computing device of exemplary embodiments of the present disclosure is next described with reference to fig. 9.

The computing device 900 shown in fig. 9 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.

Fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the disclosure. As shown in Fig. 9, the computing device 900 takes the form of a general-purpose computing device. Components of the computing device 900 may include, but are not limited to, at least one processing unit 901, at least one storage unit 902, and a bus 903 connecting the different system components (including the processing unit 901 and the storage unit 902). For example, the processing unit 901 may be a processor, and the storage unit 902 stores computer-executable instructions; the processing unit 901 executes these instructions to implement the model training method and the crowd expanding method described above.

Bus 903 includes a data bus, a control bus, and an address bus.

The storage unit 902 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 9021 and/or cache memory 9022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 9023.

The storage unit 902 may also include a program/utility 9025 having a set (at least one) of program modules 9024, such program modules 9024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Computing device 900 can also communicate with one or more external devices 904 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 905. Moreover, computing device 900 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, for example, the Internet, through network adapter 906. As shown in fig. 9, the network adapter 906 communicates with other modules of the computing device 900 over the bus 903. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

It should be noted that while in the above detailed description, several units/modules or sub-units/modules of a model training device or crowd expanding device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.

Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be decomposed into several.

While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the disclosure is not limited to the particular embodiments disclosed. Nor does the division into aspects mean that features in those aspects cannot be combined to advantage; that division is for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A model training method, comprising:

acquiring seed users corresponding to a media resource;

determining, according to the seed users and preset media resource features, a significance index corresponding to each preset media resource feature;

determining the preset media resource features whose significance index is greater than or equal to an index threshold as target features;

and performing iterative training on a crowd expansion model based on the seed users, a first target group index of the seed users for the target features, and the target features to obtain a trained crowd expansion model, wherein the crowd expansion model is used for crowd expansion based on the seed users.

2. The model training method according to claim 1, wherein determining, according to the seed users and the preset media resource features, the significance index corresponding to a preset media resource feature comprises:

acquiring the full user population within a preset time range;

determining, based on a preset binning rule, a first user proportion of the full user population in each bin of the preset media resource feature and a second user proportion of the seed users in each bin of the preset media resource feature;

determining, according to the first user proportion and the second user proportion, a second target group index for each bin of the preset media resource feature and the first target group index of the seed users for the preset media resource feature;

determining a total value of the second user proportions of the bins whose second target group index is greater than a target group index mean under the preset media resource feature, wherein the target group index mean is obtained from the second target group indexes;

and determining the significance index corresponding to the preset media resource feature according to the total value and the second target group index of each bin of the preset media resource feature.

3. The model training method according to claim 2, wherein determining the significance index corresponding to the preset media resource feature according to the total value and the second target group index of each bin of the preset media resource feature comprises:

determining a standard deviation of the second target group indexes under the preset media resource feature according to the second target group indexes and their average value;

and determining the significance index corresponding to the preset media resource feature as the product of the total value and the standard deviation.

4. The model training method according to claim 2, wherein the target group index mean is obtained in at least one of the following ways:

determining the average value of the second target group indexes as the target group index mean;

weighting the second target group indexes by their corresponding second user proportions to obtain the target group index mean;

sorting the second target group indexes and determining the median of the sorted second target group indexes as the target group index mean;

and sorting the bins by the number of users from the full user population corresponding to each second target group index, and determining the second target group index with the largest number as the target group index mean.

5. The model training method according to any one of claims 1 to 4, wherein performing iterative training on the crowd expansion model based on the seed users, the first target group index of the seed users for the target features, and the target features to obtain the trained crowd expansion model comprises:

traversing the target features in descending order of significance index, and performing the following operations on each traversed target feature:

determining a loss function value of the current training round, a target weight of the target feature, and a second residual based on the first target group index of the seed users for the target feature, a first residual obtained from the crowd expansion model of the previous training round, and a loss function; wherein the target weight is the weight at which the loss function value is minimized, and in the first training round of the crowd expansion model, the weight is a preset weight and a preset initial probability is used as the first residual;

and when the reduction ratio of the loss function value is smaller than a reduction threshold, obtaining the trained crowd expansion model and the predicted probability values of the seed users.

6. The model training method according to claim 5, wherein the loss function value is determined by:

determining a mean square error according to the first target group index of the seed users for the target feature and the first residual;

and determining the loss function value according to the mean square error, a first regularization term, and a second regularization term, wherein the first regularization term represents the sum of the absolute values of the target weights of the target features, and the second regularization term represents the sum of the squares of the target weights of the target features.

7. The model training method according to any one of claims 1 to 4, wherein performing iterative training on the crowd expansion model based on the seed users, the first target group index of the seed users for the target features, and the target features to obtain the trained crowd expansion model comprises:

normalizing the first target group index to obtain a normalized first target group index, wherein the normalization comprises max normalization or normal-distribution normalization;

and performing iterative training on the crowd expansion model based on the seed users, the normalized first target group index, and the target features to obtain the trained crowd expansion model.

8. The model training method according to any one of claims 1 to 4, wherein determining the target features comprises:

displaying the preset media resource features in descending order of significance index;

and determining the target features in response to a selection operation on the displayed preset media resource features.

9. A crowd expansion method, comprising:

acquiring candidate users corresponding to a media resource;

determining a target group index of the candidate users for each target feature according to the candidate users and the seed users and target features corresponding to a crowd expansion model, wherein the crowd expansion model is trained by the model training method according to any one of claims 1 to 8;

and inputting the candidate users and their target group indexes for the target features into the crowd expansion model to obtain predicted probability values of the candidate users, wherein the predicted probability values are used to determine whether the candidate users are expandable users.

10. The crowd expansion method according to claim 9, wherein inputting the candidate users and their target group indexes for the target features into the crowd expansion model to obtain the predicted probability values of the candidate users comprises:

inputting the candidate users and their target group indexes for the target features into the crowd expansion model, and summing the products of each candidate user's target group index for a target feature and the target weight of that target feature to obtain the predicted probability value of the candidate user.