CN113688313A

CN113688313A - Training method of prediction model, information pushing method and device

Info

Publication number: CN113688313A
Application number: CN202110923918.3A
Authority: CN
Inventors: 吴强; 王海涛; 张亚鹏
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2021-11-23

Abstract

This specification discloses a training method for a prediction model, an information pushing method, and corresponding apparatuses. For each service scenario, scene features and comprehensive features of a target training sample are determined from historical push information pushed to a user and related information of the user; a scene weight matrix is determined from the identification information of the service scenario; the scene features, together with the comprehensive features weighted by the scene weight matrix, are input into the sub-prediction layer corresponding to that service scenario to obtain a predicted click rate for the historical push information; and the prediction model is trained according to the predicted click rate and the label information of the target training sample. When information is pushed, the trained prediction model determines a predicted click rate for each piece of candidate information, and the target information pushed to the user is selected according to these predicted click rates. This improves the accuracy of the prediction results output by the prediction model, so that the pushed target information better matches the user's preferences.

Description

Training method of prediction model, information pushing method and device

Technical Field

This specification relates to the field of Internet technology, and in particular to a training method for a prediction model, an information pushing method, and corresponding apparatuses.

Background

With the development of network technology, the volume of information on the network has grown explosively. To help each user find information of interest more quickly and easily, service platforms now generally push information to each user according to factors such as the user's preferences and the information display environment (for example, the mobile phone model or the account login environment).

Currently, when a service platform pushes information to a user, it uses a pre-trained prediction model to predict, from the user's preferences and the display environment of the pushed information, a click rate for each piece of information to be pushed, and then pushes information to the user based on the predicted click rates.

However, a user may react differently to the same information pushed in different push scenarios. For example, a user who is buying a computer may be more likely to click a keyboard advertisement shown on the order generation interface for that computer than the same keyboard advertisement shown on the client's home interface.

Consequently, if a separate prediction model is built for each push scenario, the models for scenarios with little training data may be insufficiently trained, so their prediction results have low accuracy. Conversely, if all push scenarios share a single prediction model, the model tends to overfit the scenarios with abundant training data, and the prediction results for scenarios with little training data remain inaccurate.

Therefore, improving the accuracy of the prediction results output by the prediction model in every push scenario is a problem that urgently needs to be solved.

Disclosure of Invention

The present specification provides a training method of a prediction model, and an information pushing method and apparatus, to partially solve the above problems in the prior art.

The technical solutions adopted in this specification are as follows:

This specification provides a training method for a prediction model, where the prediction model includes a sub-prediction layer corresponding to each service scenario and a scene weight layer. The method includes:

for each service scenario, acquiring a target training sample in the service scenario, where the target training sample includes historical push information pushed to a user in the service scenario and related information of the user;

determining, according to the historical push information and the related information, a scene feature of the target training sample in the service scenario and a comprehensive feature of the target training sample across all service scenarios; and inputting identification information corresponding to the service scenario into the scene weight layer to determine a scene weight matrix that the service scenario requires for weighting the comprehensive feature;

inputting the scene feature and the comprehensive feature weighted by the scene weight matrix into the sub-prediction layer corresponding to the service scenario to obtain a predicted click rate for the historical push information;

and training the prediction model according to the predicted click rate and the label information corresponding to the target training sample.
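The forward pass in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: the scene weight layer is assumed to be a lookup table mapping each scenario ID to a square matrix, "weighting" is assumed to mean multiplying the comprehensive feature by that matrix, and each sub-prediction layer is assumed to be a single logistic unit over the concatenation of the scene feature and the weighted comprehensive feature. All names and numbers (`SCENE_WEIGHTS`, `SUB_LAYERS`, `predict_ctr`, the scenario IDs) are hypothetical, and the same scene feature is reused for both scenarios only to keep the example short.

```python
import math

# Hypothetical scene weight layer: scenario ID -> 2x2 weight matrix.
SCENE_WEIGHTS = {
    "scene_a": [[1.0, 0.0], [0.0, 0.5]],
    "scene_b": [[0.2, 0.0], [0.0, 1.0]],
}

# Hypothetical sub-prediction layers: scenario ID -> (weights, bias) of a logistic
# unit over [scene feature (2 dims) + weighted comprehensive feature (2 dims)].
SUB_LAYERS = {
    "scene_a": ([0.5, -0.3, 0.8, 0.1], 0.0),
    "scene_b": ([0.1, 0.4, -0.2, 0.9], -0.1),
}


def matvec(matrix, vector):
    """Multiplies a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_ctr(scene_id, scene_feature, comprehensive_feature):
    """Predicted click rate for one sample in the given scenario."""
    weight_matrix = SCENE_WEIGHTS[scene_id]                    # scene weight layer
    weighted = matvec(weight_matrix, comprehensive_feature)   # weight the comprehensive feature
    inputs = scene_feature + weighted                          # concatenate both inputs
    weights, bias = SUB_LAYERS[scene_id]                       # this scenario's sub-prediction layer
    logit = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(logit)


comprehensive = [1.0, 2.0]
p_a = predict_ctr("scene_a", [0.3, 0.1], comprehensive)
p_b = predict_ctr("scene_b", [0.3, 0.1], comprehensive)
print(p_a, p_b)
```

Because the comprehensive feature passes through a different scene weight matrix and a different sub-prediction layer in each scenario, `p_a` and `p_b` differ even for identical inputs. During training, the predicted click rate would be compared with the sample's label to compute the loss.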

Optionally, each sub-prediction layer corresponds to a data preprocessing layer, and the related information includes a historical service record of the user and attribute information of the user;

determining, according to the historical push information and the related information, the scene feature of the target training sample in the service scenario specifically includes:

determining behavior characteristics corresponding to the user according to the historical service record, determining portrait characteristics corresponding to the user according to the attribute information, and determining push information characteristics corresponding to the historical push information according to the historical push information;

concatenating the behavior feature, the portrait feature, and the push information feature to obtain a sample feature of the target training sample;

and inputting the sample characteristics into a data preprocessing layer corresponding to a sub-prediction layer under the service scene to obtain the scene characteristics.

Optionally, inputting the sample feature into a data preprocessing layer corresponding to a sub-prediction layer in the service scene to obtain the scene feature, specifically including:

inputting the sample feature into the data preprocessing layer corresponding to the sub-prediction layer in the service scenario, so that the data preprocessing layer normalizes the sample feature according to a first sample distribution followed by the target training samples in the service scenario to obtain the scene feature, where the first sample distribution is learned by the data preprocessing layer from the input target training samples.

Optionally, the prediction model further includes a shared data preprocessing layer, and the related information includes a historical service record of the user and attribute information of the user;

determining, according to the historical push information and the related information, the comprehensive feature of the target training sample across all service scenarios specifically includes:

determining a behavior feature of the user according to the historical service record, determining a portrait feature of the user according to the attribute information, and determining a push information feature of the historical push information according to the historical push information; concatenating the behavior feature, the portrait feature, and the push information feature to obtain a sample feature of the target training sample;

and inputting the sample characteristics into the shared data preprocessing layer to obtain the comprehensive characteristics.

Optionally, inputting the sample feature into the shared data preprocessing layer to obtain the comprehensive feature, specifically including:

inputting the sample feature into the shared data preprocessing layer, so that the shared data preprocessing layer normalizes the sample feature according to a second sample distribution followed by the target training samples across all service scenarios to obtain the comprehensive feature, where the second sample distribution is learned by the shared data preprocessing layer from the input target training samples without distinguishing between service scenarios.
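The difference between the scenario-specific data preprocessing layers and the shared one can be illustrated with a simple standardization sketch. The specification only says that each layer normalizes according to a learned sample distribution; here, as an assumption, that distribution is summarized by a per-dimension mean and variance estimated from the training samples (similar in spirit to batch normalization). One normalizer is fitted per scenario (the first sample distribution) and one on all samples pooled together (the second sample distribution). `Normalizer`, `fit_preprocessing_layers`, and the sample values are hypothetical.

```python
import math


class Normalizer:
    """Standardizes feature vectors with a per-dimension mean and variance."""

    def __init__(self, eps=1e-5):
        self.eps = eps  # avoids division by zero for constant dimensions
        self.mean = None
        self.var = None

    def fit(self, samples):
        """Estimates the sample distribution (mean and variance) from training samples."""
        n = len(samples)
        dims = len(samples[0])
        self.mean = [sum(s[d] for s in samples) / n for d in range(dims)]
        self.var = [sum((s[d] - self.mean[d]) ** 2 for s in samples) / n for d in range(dims)]
        return self

    def transform(self, sample):
        return [(x - m) / math.sqrt(v + self.eps) for x, m, v in zip(sample, self.mean, self.var)]


def fit_preprocessing_layers(samples_by_scene):
    """One normalizer per scenario plus one shared normalizer fitted on the pooled samples."""
    per_scene = {scene: Normalizer().fit(samples) for scene, samples in samples_by_scene.items()}
    pooled = [s for samples in samples_by_scene.values() for s in samples]
    return per_scene, Normalizer().fit(pooled)


samples_by_scene = {
    "scene_a": [[1.0, 10.0], [3.0, 14.0]],
    "scene_b": [[5.0, 2.0], [7.0, 6.0], [9.0, 4.0]],
}
per_scene, shared = fit_preprocessing_layers(samples_by_scene)

x = [3.0, 14.0]  # a sample feature from scene_a
scene_feature = per_scene["scene_a"].transform(x)  # relative to scene_a's distribution
comprehensive_feature = shared.transform(x)        # relative to all scenarios pooled
print(scene_feature, comprehensive_feature)
```

The same raw sample is above average on its first dimension within `scene_a` but below the pooled mean, which shows how the scene feature and the comprehensive feature can differ for the same input.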

Optionally, the prediction model further comprises: a shared prediction layer;

training the prediction model according to the predicted click rate and the label information corresponding to the target training sample specifically includes:

acquiring a first network parameter of a pre-trained shared prediction layer and a second network parameter of the sub-prediction layer corresponding to the service scenario, where the shared prediction layer is trained using the deviation between the predicted click rate it outputs for a training sample and the label information of that training sample;

determining, according to the first network parameter and the second network parameter, a parameter deviation between the pre-trained shared prediction layer and the sub-prediction layer corresponding to the service scenario;

and training the prediction model with the optimization objective of minimizing both the parameter deviation and the deviation between the predicted click rate of the target training sample in the service scenario and the label information corresponding to the target training sample.
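The optimization objective above can be sketched as a loss that adds, to the usual click-prediction loss, a penalty on the distance between the sub-prediction layer's parameters and the pre-trained shared prediction layer's parameters. The specification does not fix the loss function, the distance measure, or how the two terms are balanced. This sketch assumes binary cross-entropy for the click-rate deviation, the squared Euclidean (L2) distance for the parameter deviation, a hypothetical trade-off coefficient `lam`, parameter vectors of the same shape, and a shared layer that stays fixed while the sub-prediction layer is trained.

```python
import math


def binary_cross_entropy(p, label, eps=1e-12):
    """Deviation between a predicted click rate and a 0/1 click label."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def parameter_deviation(sub_params, shared_params):
    """Squared L2 distance between the sub-prediction layer and the pre-trained shared layer."""
    return sum((a - b) ** 2 for a, b in zip(sub_params, shared_params))


def training_loss(predicted_ctrs, labels, sub_params, shared_params, lam=0.1):
    """Mean click-rate loss plus the weighted parameter deviation (hypothetical weighting)."""
    ctr_loss = sum(binary_cross_entropy(p, y) for p, y in zip(predicted_ctrs, labels)) / len(labels)
    return ctr_loss + lam * parameter_deviation(sub_params, shared_params)


loss = training_loss([0.8, 0.3], [1, 0], sub_params=[0.5, -0.2], shared_params=[0.4, 0.0], lam=0.1)
print(loss)
```

Under these assumptions, the penalty pulls each scenario's sub-prediction layer toward parameters learned from all scenarios, and the click-rate term pulls it toward that scenario's own data; `lam` controls the balance.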

This specification provides an information pushing method, where a prediction model includes a sub-prediction layer corresponding to each service scenario and a scene weight layer. The method includes:

determining each piece of candidate information to be pushed to a user in a current service scenario, and acquiring related information of the user and identification information corresponding to the current service scenario;

for each piece of candidate information, inputting the candidate information, the related information, and the identification information into a pre-trained prediction model, so that the prediction model: determines, according to the candidate information and the related information, a scene feature of the candidate information in the current service scenario and a comprehensive feature of the candidate information across all service scenarios; inputs the identification information into the scene weight layer to determine a scene weight matrix that the current service scenario requires for weighting the comprehensive feature; and inputs the scene feature and the comprehensive feature weighted by the scene weight matrix into the sub-prediction layer corresponding to the current service scenario to obtain a predicted click rate for the candidate information, where the prediction model is trained by the above training method;

and selecting, according to the predicted click rates of the pieces of candidate information, target information to be pushed to the user from the candidate information, and pushing the target information to the user.
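The final selection step can be sketched as ranking the candidates by predicted click rate and keeping the top k. The specification does not say how many items are pushed or whether other criteria apply; `k`, the `select_target_information` helper, and the click-rate values are hypothetical, and the predicted click rates are assumed to come from the prediction model.

```python
import heapq


def select_target_information(predicted_ctrs, k=2):
    """Returns the k candidate IDs with the highest predicted click rate, highest first."""
    top = heapq.nlargest(k, predicted_ctrs.items(), key=lambda item: item[1])
    return [candidate_id for candidate_id, _ in top]


ctrs = {"item_1": 0.12, "item_2": 0.47, "item_3": 0.05, "item_4": 0.31}
print(select_target_information(ctrs, k=2))  # ['item_2', 'item_4']
```

`heapq.nlargest` avoids sorting the full candidate list, which matters only when there are many candidates and k is small.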

This specification provides a training apparatus for a prediction model, where the prediction model includes a sub-prediction layer corresponding to each service scenario and a scene weight layer. The apparatus includes:

an acquisition module, configured to acquire, for each service scenario, a target training sample in the service scenario, where the target training sample includes historical push information pushed to a user in the service scenario and related information of the user;

a determining module, configured to determine, according to the historical push information and the related information, a scene feature of the target training sample in the service scenario and a comprehensive feature of the target training sample across all service scenarios, and to input identification information corresponding to the service scenario into the scene weight layer to determine a scene weight matrix that the service scenario requires for weighting the comprehensive feature;

a prediction module, configured to input the scene feature and the comprehensive feature weighted by the scene weight matrix into the sub-prediction layer corresponding to the service scenario to obtain a predicted click rate for the historical push information;

and a training module, configured to train the prediction model according to the predicted click rate and the label information corresponding to the target training sample.

This specification provides an information pushing apparatus, where a prediction model includes a sub-prediction layer corresponding to each service scenario and a scene weight layer. The apparatus includes:

a data acquisition module, configured to determine each piece of candidate information to be pushed to a user in a current service scenario, and to acquire related information of the user and identification information corresponding to the current service scenario;

a prediction module, configured to input, for each piece of candidate information, the candidate information, the related information, and the identification information into a pre-trained prediction model, so that the prediction model determines, according to the candidate information and the related information, a scene feature of the candidate information in the current service scenario and a comprehensive feature of the candidate information across all service scenarios; inputs the identification information into the scene weight layer of the prediction model to determine a scene weight matrix that the current service scenario requires for weighting the comprehensive feature; and inputs the scene feature and the comprehensive feature weighted by the scene weight matrix into the sub-prediction layer corresponding to the current service scenario to obtain a predicted click rate for the candidate information, where the prediction model is trained by the above training method;

and a pushing module, configured to select, according to the predicted click rates of the pieces of candidate information, target information to be pushed to the user from the candidate information, and to push the target information to the user.

This specification provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the above training method for a prediction model and the above information pushing method.

This specification provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above training method for a prediction model and the above information pushing method when executing the computer program.

The technical solutions adopted in this specification can achieve the following beneficial effects:

In the training method for a prediction model and the information pushing method provided in this specification, the prediction model includes a sub-prediction layer corresponding to each service scenario and a scene weight layer. For each service scenario, a scene feature of the target training sample in the service scenario and a comprehensive feature of the target training sample across all service scenarios are determined according to the historical push information pushed to the user and the related information of the user; meanwhile, the identification information corresponding to the service scenario is input into the scene weight layer to determine the scene weight matrix that the service scenario requires for weighting the comprehensive feature. The scene feature and the comprehensive feature weighted by the scene weight matrix are then input into the sub-prediction layer corresponding to the service scenario to obtain a predicted click rate for the historical push information, and the prediction model is trained according to the predicted click rate and the label information corresponding to the target training sample. When information is pushed to a user, each piece of candidate information to be pushed in the current service scenario is input, together with the related information of the user and the identification information of the current service scenario, into the pre-trained prediction model, which determines a predicted click rate for that candidate information. Target information is then selected from the candidate information according to the predicted click rates and pushed to the user.

As can be seen from the above method, when the prediction model is trained, both the scene feature of the target training sample in the service scenario and its comprehensive feature across all service scenarios are used as inputs to the sub-prediction layer of that service scenario. Before the comprehensive feature is input into the sub-prediction layer, the scene weight layer learns the scene weight matrix that the service scenario requires for weighting it. The sub-prediction layer can therefore draw on the scenario-relevant information that the scene weight matrix emphasizes within the comprehensive feature to help predict the click rate of the target training sample, which improves the accuracy of the prediction results output by the prediction model and improves service efficiency.

Drawings

The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of this specification, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:

FIG. 1A is a schematic diagram of a prediction model in the prior art;

FIG. 1B is a schematic diagram of a prediction model provided in this specification;

FIG. 2 is a schematic flowchart of a training method for a prediction model provided in this specification;

FIG. 3 is a schematic flowchart of an information pushing method provided in this specification;

FIG. 4 is a schematic diagram of a training apparatus for a prediction model provided in this specification;

FIG. 5 is a schematic diagram of an information pushing apparatus provided in this specification;

FIG. 6 is a schematic diagram of an electronic device corresponding to FIG. 2 or FIG. 3 provided in this specification.

Detailed Description

To make the objectives, technical solutions, and advantages of this specification clearer, the technical solutions of this specification are described clearly and completely below with reference to specific embodiments and the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort fall within the protection scope of this specification.

Currently, a service platform generally pushes information to users in a plurality of different service scenarios. For example, popular goods are pushed to a user on the launch interface of an application (APP) or client; as another example, goods the user may like are pushed on the order display interface shown after the user places an order. When pushing information in each service scenario, the service platform predicts, for each piece of information to be pushed, the click rate at which the user will click it, selects the information to push from all the information according to the predicted click rates, and pushes it to the user.

Joint modeling of multiple service scenarios has been proposed as one way to improve the accuracy of the prediction model's output. At present, as shown in FIG. 1A, a prediction model built by jointly modeling multiple scenarios mainly includes a shared prediction layer used by all service scenarios and a sub-prediction layer set up separately for each service scenario. When the click rate is predicted in a service scenario, for each piece of information to be pushed to the user, the shared prediction layer determines a predicted click rate for the information without distinguishing between scenarios, while the sub-prediction layer of the service scenario determines a predicted click rate for the information in that scenario. The predicted click rate of the user for the information is then determined from the predicted click rates output by the two prediction layers.

In this case, the shared prediction layer may still overfit the service scenarios with more training data, so the prediction results for service scenarios with less training data may be inaccurate.

To solve the above problem, this specification provides a prediction model, shown in FIG. 1B, that includes a sub-prediction layer for each service scenario (i.e., prediction network-A and prediction network-B in FIG. 1B) and a scene weight layer. When the sub-prediction layer of a service scenario is used to predict click rates, each piece of candidate information to be pushed to the user in the current service scenario is determined, and the related information of the user and the identification information of the current service scenario are acquired. Then, for each piece of candidate information, the candidate information, the related information of the user, and the identification information are input into the pre-trained prediction model, which determines, according to the candidate information and the related information, the scene feature of the candidate information in the current service scenario (i.e., scene A-scene feature and scene B-scene feature in FIG. 1B) and the comprehensive feature of the candidate information across all service scenarios.

Meanwhile, the scene weight layer determines, according to the identification information, the scene weight matrix that the current service scenario requires for weighting the comprehensive feature. The comprehensive feature is then weighted by the scene weight matrix, and the scene feature and the weighted comprehensive feature are input into the sub-prediction layer of the current service scenario to obtain a predicted click rate for the candidate information. Finally, target information is selected according to the predicted click rate of each piece of candidate information and pushed to the user. In this way, the weighted comprehensive feature helps predict click rates, which improves the accuracy of the prediction results output by the prediction model and improves service efficiency.

The training scheme for the prediction model provided in this specification, and the technical solution of pushing information using the trained prediction model, are described in detail below with reference to embodiments.

FIG. 2 is a schematic flowchart of a training method for a prediction model in this specification, which specifically includes the following steps:

Step S200, for each service scenario, acquire a target training sample in the service scenario, where the target training sample includes historical push information pushed to a user in the service scenario and related information of the user.

For convenience of description, the training method for the prediction model and the information pushing method provided in this specification are described below with the service platform as the execution subject.

In this specification, the prediction model can be applied to various services. For example, on a news information platform, the click rate of a user for pushed news or advertisements can be predicted, and news or advertisements can be pushed to the user according to the predicted click rate. As another example, on an online shopping platform, the click rate of a user for pushed goods, themed activities, or other users' reviews can be predicted, and these can be pushed to the user according to the predicted click rate. In practice, many services push information to users, and each service can push many kinds of information, too many to list exhaustively in this specification, so further examples are not given here.

A service scenario may refer to a service section through which, within the same service, the service platform pushes information to a user via an application (APP) or client on the user's terminal device (such as a mobile phone or tablet computer). For example, the APP or client may present information from different service sections to the user, and the information presented in each section can be regarded as information presented in a different service scenario. As another example, goods information, merchant information, and review information recommended to the user by the service platform are different types of recommendation information, so goods recommendation, merchant recommendation, and review recommendation can be regarded as different service scenarios.

In practice, the service platform pushes information to users in many service scenarios. Each service scenario is preset and configured with unique identification information, which can be an identifier (ID) of the service scenario.

Accordingly, for each service scenario, the service platform can use the scenario's identification information to acquire the historical push information previously pushed to each user in that scenario, together with the related information of each user, and construct target training samples from them. When the prediction model is trained, the target training samples constructed for each service scenario are acquired and used to train the model. The related information of the user in each target training sample includes, but is not limited to, the historical service records of the user and the attribute information of the user. The historical service records record each service the user has performed. The attribute information is information reflecting the user's basic personal characteristics, such as age, gender, city of residence, and place of origin, which are not described in detail here.
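Grouping historical push logs into per-scenario training samples could look like the following sketch. The record layout (`PushRecord`, `TrainingSample`), the helper `build_samples_by_scene`, and the use of "did the user click the push" as the label are assumptions for illustration; the specification only states that samples are built per scenario from historical push information and user-related information, keyed by the scenario's identification information.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PushRecord:
    scene_id: str   # identification information of the service scenario
    user_id: str
    push_info: str  # ID of the pushed information
    clicked: bool   # assumed label: whether the user clicked the push


@dataclass
class TrainingSample:
    push_info: str
    user_info: dict  # historical service records and attribute information
    label: int


def build_samples_by_scene(push_log, user_info_by_id):
    """Groups push records by scenario ID and attaches the user's related information."""
    samples = defaultdict(list)
    for record in push_log:
        samples[record.scene_id].append(
            TrainingSample(
                push_info=record.push_info,
                user_info=user_info_by_id.get(record.user_id, {}),
                label=int(record.clicked),
            )
        )
    return dict(samples)


log = [
    PushRecord("home_page", "u1", "item_9", True),
    PushRecord("order_page", "u1", "item_3", False),
    PushRecord("home_page", "u2", "item_9", False),
]
users = {"u1": {"age": 30, "city": "Beijing"}, "u2": {"age": 24, "city": "Shanghai"}}
samples = build_samples_by_scene(log, users)
print({scene: len(s) for scene, s in samples.items()})  # {'home_page': 2, 'order_page': 1}
```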

Step S202, according to the historical push information and the related information, determine the scene feature of the target training sample in the service scenario and the comprehensive feature of the target training sample across all service scenarios, and input the identification information corresponding to the service scenario into the scene weight layer to determine the scene weight matrix that the service scenario requires for weighting the comprehensive feature.

In a specific implementation, the service platform first determines the behavior feature of the user from the user's historical service records, the portrait feature of the user from the user's attribute information, and the push information feature from the historical push information pushed to the user. It then concatenates the behavior feature, the portrait feature, and the push information feature to obtain the sample feature of the target training sample. Finally, according to this sample feature, it determines the scene feature of the target training sample in the service scenario and the comprehensive feature of the target training sample across all service scenarios.

The behavior feature represents the operations the user has performed historically and can reflect the user's behavioral preferences. The portrait feature is an abstract representation of information such as the user's age, gender, city of residence, and place of origin. The push information feature is an abstract representation of the text, format, images, and similar content of the historical push information pushed to the user. The behavior feature, the portrait feature, and the push information feature can be obtained by embedding, in the manner of word embeddings.
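The embedding-and-concatenation step described above can be sketched as below. The embedding tables are hypothetical and randomly initialized for illustration (in a trained model they would be learned parameters), the embedding dimension is arbitrary, and the behavior feature is assumed to be the mean of the embeddings of the items in the user's behavior sequence, which is one common choice rather than something the specification prescribes. `EmbeddingTable`, `sample_feature`, and the attribute names are all hypothetical.

```python
import random

EMBED_DIM = 4


class EmbeddingTable:
    """Maps categorical values to dense vectors, creating a vector on first lookup."""

    def __init__(self, dim, seed):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vectors = {}

    def lookup(self, key):
        if key not in self.vectors:
            self.vectors[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.vectors[key]


item_table = EmbeddingTable(EMBED_DIM, seed=0)       # items and pushed information
attribute_table = EmbeddingTable(EMBED_DIM, seed=1)  # user attributes


def behavior_feature(behavior_sequence):
    """Mean of the item embeddings in the behavior sequence (assumed pooling)."""
    vectors = [item_table.lookup(item) for item in behavior_sequence]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(EMBED_DIM)]


def portrait_feature(attributes):
    """One embedding per (attribute name, value) pair, concatenated in a fixed order."""
    feature = []
    for name in sorted(attributes):
        feature += attribute_table.lookup((name, attributes[name]))
    return feature


def sample_feature(behavior_sequence, attributes, push_item):
    """Concatenates the behavior, portrait and push information features."""
    return behavior_feature(behavior_sequence) + portrait_feature(attributes) + item_table.lookup(push_item)


user_attributes = {"age_bucket": "25-34", "gender": "F", "city": "Beijing"}
x = sample_feature(["item_3", "item_7"], user_attributes, "item_9")
print(len(x))  # 4 + 3 * 4 + 4 = 20
```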

Specifically, to determine the behavior feature of the user, the service platform identifies each historical event performed by the user from the user's historical service records, sorts the historical events by time of occurrence, generates a user behavior sequence from the sorted result and the historical event information of each event, and finally determines the behavior feature of the user from the user behavior sequence.

The historical events may include at least one of historical search events, historical click events, and historical ordering events. The historical event information of a historical event includes the service data related to the event performed by the user, including at least the time at which the event occurred, the service environment information at that time, and the identifier of the information related to the event.

Further, the service environment information at the time of a historical event describes the environment the user was in when information was pushed and the environment in which the pushed information was displayed. It may include at least the model of the terminal device the user used when executing the historical event, the way the user logged in to the service platform at that time (e.g., applet login, APP login, or web page login), and the time period in which the user executed the historical event.

For example, if products are pushed to users on an online shopping platform, the historical events may include: historical ordering events (events in which the user purchased a product; their historical event information includes at least the event occurrence time, the identifier of the purchased product, the product category to which it belongs, and the service environment information at the time of the event); historical search events (their historical event information includes at least the event occurrence time, the product keywords in the user's search request, the product search results returned for the request, which of those results the user clicked to view, and the service environment information at the time of the event); and historical click events occurring in the service scenes (their historical event information includes at least the event occurrence time, the identifier of the product the user clicked to view, the product category to which that product belongs, and the service environment information at the time of the event).

In practical applications, the service platform may configure what the historical event information of each historical event contains according to actual requirements.

Then, the service platform splices (concatenates) the behavior characteristics, the portrait characteristics, and the push information characteristics to obtain the sample characteristics of the target training sample, and processes the sample characteristics to obtain the scene characteristics of the target training sample in the service scene and the comprehensive characteristics of the target training sample across all service scenes. The scene characteristics represent what is unique to the target training sample in this service scene, while the comprehensive characteristics represent what the target training sample has in common across all service scenes.
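The splicing step can be illustrated with a small example. It is a sketch under assumptions: the embedding values are made up, and mean pooling over the behavior sequence is one hypothetical way of turning a variable-length sequence into a fixed-length behavior characteristic; the specification only states that the characteristics are obtained by word embedding and then spliced.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors (one simple pooling choice, assumed here)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def splice(behavior: Vector, portrait: Vector, push_info: Vector) -> Vector:
    """Concatenate the three characteristic groups into one sample characteristic."""
    return list(behavior) + list(portrait) + list(push_info)


# Hypothetical 2-d embeddings for two behavior-sequence tokens.
TOKEN_EMBEDDINGS: Dict[str, Vector] = {
    "click:sku_1": [1.0, 0.0],
    "order:sku_2": [0.0, 1.0],
}

behavior = mean_pool(list(TOKEN_EMBEDDINGS.values()))  # [0.5, 0.5]
portrait = [0.3, 0.7, 0.1]   # e.g. embedded age bucket, gender, city
push_info = [0.9, 0.2]       # e.g. embedded text/format/picture features
sample = splice(behavior, portrait, push_info)
```

Because splicing is a plain concatenation, the sample characteristic here has 2 + 3 + 2 = 7 dimensions, and the same vector feeds both the scene-specific and the shared preprocessing layers.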

To determine the scene characteristics of the target training sample, the service platform inputs its sample characteristics into the data preprocessing layer attached to the sub-prediction layer of the service scene. The data preprocessing layer normalizes the sample characteristics according to the first sample distribution followed by target training samples in this service scene, yielding the scene characteristics of the target training sample in the service scene. The first sample distribution is learned by the data preprocessing layer from the target training samples it receives, and can be reflected in the click rates of those samples. Normalizing with the first sample distribution reduces the difference between the current batch of target training samples and all previously input ones, so the prediction model learns what the batches have in common faster.

Meanwhile, to determine the comprehensive characteristics of the target training sample, the service platform inputs the sample characteristics into the shared data preprocessing layer, which normalizes them according to the second sample distribution followed by target training samples across all service scenes, yielding the comprehensive characteristics of the target training sample across all service scenes. The second sample distribution is learned by the shared data preprocessing layer from the input target training samples without distinguishing between service scenes. Normalizing with the second sample distribution likewise reduces the difference between the current batch and all previously input target training samples, so the prediction model learns what the target training samples of all service scenes have in common faster.
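The two normalization paths can be sketched together as below. The specification does not name the normalization method; this sketch assumes, in the spirit of batch normalization's running statistics, that each "sample distribution" is a per-dimension mean and variance updated by an exponential moving average. `RunningNorm`, `ScenePreprocessor`, and the momentum value are hypothetical. The point it illustrates is that each scene's layer only sees its own samples, while the shared layer sees samples from every scene.

```python
import math
from typing import Dict, Iterable, List, Tuple

Batch = List[List[float]]


class RunningNorm:
    """Normalizes features with a running mean/variance learned from inputs (assumed)."""

    def __init__(self, dim: int, momentum: float = 0.1, eps: float = 1e-5):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.momentum = momentum
        self.eps = eps

    def __call__(self, batch: Batch, training: bool = True) -> Batch:
        dim = len(self.mean)
        if training:
            n = len(batch)
            for i in range(dim):
                m = sum(row[i] for row in batch) / n
                v = sum((row[i] - m) ** 2 for row in batch) / n
                # Move the learned distribution toward the current batch statistics.
                self.mean[i] = (1 - self.momentum) * self.mean[i] + self.momentum * m
                self.var[i] = (1 - self.momentum) * self.var[i] + self.momentum * v
        return [[(row[i] - self.mean[i]) / math.sqrt(self.var[i] + self.eps)
                 for i in range(dim)] for row in batch]


class ScenePreprocessor:
    """One normalizer per service scene plus one shared across all scenes."""

    def __init__(self, scene_ids: Iterable[str], dim: int):
        self.scene_norms: Dict[str, RunningNorm] = {s: RunningNorm(dim) for s in scene_ids}
        self.shared_norm = RunningNorm(dim)

    def __call__(self, scene_id: str, batch: Batch) -> Tuple[Batch, Batch]:
        scene_feat = self.scene_norms[scene_id](batch)  # first sample distribution
        comp_feat = self.shared_norm(batch)             # second sample distribution
        return scene_feat, comp_feat
```

At inference time the layers would be called with `training=False`, so the learned statistics are applied without being updated.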

In this specification, all service scenes share one scene weight layer. Before the prediction model is trained, inputting the identification information of any service scene into the scene weight layer yields the same, uniformly initialized scene weight matrix; at this point the scene weight layer has not yet learned how the identification information of different service scenes should be reflected in each scene. Through continued training on a large number of training samples, the scene weight layer eventually learns these scene-specific characteristics, after which inputting the identification information of different service scenes yields different scene weight matrices.
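A minimal sketch of the scene weight layer and of weighting the comprehensive characteristics, under hypothetical choices: the layer is a lookup table keyed by scene identifier (equivalent to a linear layer applied to a one-hot encoding of the identifier), every entry starts as the same identity matrix so that an untrained layer returns a uniform matrix for all scenes, and weighting is a matrix-vector product. `SceneWeightLayer` and `weight_features` are illustrative names.

```python
from typing import List

Matrix = List[List[float]]


def identity(d: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


class SceneWeightLayer:
    """One layer shared by all scenes; maps a scene ID to a d x d weight matrix.

    Assumption: every scene's entry is initialized to the same identity matrix,
    so before training all scenes receive an identical (uniform) matrix.
    """

    def __init__(self, scene_ids: List[str], dim: int):
        self.index = {s: k for k, s in enumerate(scene_ids)}
        self.table = [identity(dim) for _ in scene_ids]  # separate copy per scene

    def __call__(self, scene_id: str) -> Matrix:
        return self.table[self.index[scene_id]]


def weight_features(w: Matrix, comp: List[float]) -> List[float]:
    """Weighted comprehensive characteristics: matrix-vector product W @ c."""
    return [sum(w_ij * c_j for w_ij, c_j in zip(row, comp)) for row in w]
```

Since each scene has its own table entry, a training update that changes only the `search` entry makes `layer("search")` differ from `layer("feed")`, which mirrors the behavior described above.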

After the service platform weights the comprehensive characteristics with the scene weight matrix that the scene weight layer outputs for the service scene, the characteristics specific to that service scene are emphasized within the comprehensive characteristics. The sub-prediction layer of the service scene can therefore predict the click rate of the push information in that scene from the scene characteristics, assisted by the scene-specific characteristics that the scene weight matrix emphasizes within the comprehensive characteristics.

Consequently, for a service scene with few training samples in the training stage, its sub-prediction layer can also draw on the comprehensive characteristics determined from all service scenes, so the accuracy of the resulting predicted click rate does not fall too low. For a service scene with many training samples, weighting the comprehensive characteristics with that scene's weight matrix further highlights the scene's own characteristics and reduces interference from the characteristics of other service scenes reflected in the comprehensive characteristics, which helps ensure the accuracy of the prediction results the model outputs for that scene.

Step S204: input the scene characteristics and the comprehensive characteristics weighted by the scene weight matrix into the sub-prediction layer of the service scene to obtain the predicted click rate for the historical push information.

Step S206: train the prediction model according to the predicted click rate and the label information of the target training sample.

In a specific implementation, the service platform first weights the comprehensive characteristics with the determined scene weight matrix to obtain weighted comprehensive characteristics, and then inputs the scene characteristics and the weighted comprehensive characteristics into the sub-prediction layer of the service scene to obtain the predicted click rate of the user in the training sample for the historical push information. The service platform then trains the prediction model with minimizing the deviation between the predicted click rate and the label information of the training sample as the optimization objective.

In this specification, the label information of each target training sample is determined by whether the user clicked the historical push information pushed to the user: the label is 1 if the user clicked it and 0 if the user did not.
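Steps S204 and S206 can be sketched for a single sample as below. Several assumptions are made that the specification does not state: the sub-prediction layer is reduced to logistic regression (a multi-layer network would be more typical), the "deviation" is binary cross-entropy against the 0/1 label, and training is plain stochastic gradient descent. `SubPredictionLayer` and the learning rate are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SubPredictionLayer:
    """Per-scene predictor, reduced here (by assumption) to logistic regression."""

    def __init__(self, dim: int):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, scene_feat: List[float], weighted_comp: List[float]) -> float:
        x = scene_feat + weighted_comp          # concatenate the two inputs
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def sgd_step(self, scene_feat: List[float], weighted_comp: List[float],
                 label: int, lr: float = 0.1) -> float:
        """One gradient step on binary cross-entropy; returns the loss before the step."""
        x = scene_feat + weighted_comp
        p = self.predict(scene_feat, weighted_comp)
        eps = 1e-12                              # guards against log(0)
        loss = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
        grad = p - label                         # dLoss/dlogit for sigmoid + BCE
        self.w = [wi - lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * grad
        return loss
```

Because the gradient of cross-entropy through a sigmoid is simply `p - label`, each step pushes the predicted click rate toward the 0/1 label.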

In addition, to improve the generalization ability of the sub-prediction layers of the service scenes, and thereby the accuracy of the prediction results output by the prediction model, this specification may further provide a shared prediction layer. The shared prediction layer is trained independently as a prediction model common to all service scenes, and the prediction model is then trained using the parameter difference between the shared prediction layer and the sub-prediction layer of each service scene.

Specifically, the service platform acquires the first network parameters of the pre-trained shared prediction layer and the second network parameters of the sub-prediction layer of the service scene, determines from them the parameter deviation between the pre-trained shared prediction layer and that sub-prediction layer, and trains the prediction model with the optimization objective of minimizing both the parameter deviation and the deviation between the predicted click rate of the target training sample in the service scene and the label information of that sample.

In other words, a shared prediction layer common to all service scenes is trained first. Training the prediction model then keeps the prediction results of each scene's sub-prediction layer accurate while pulling the second network parameters of each sub-prediction layer toward the first network parameters of the shared prediction layer. This improves the generalization ability of the sub-prediction layers in the prediction model and also speeds up model training.

When training the prediction model, the sum of the parameter deviation and the deviation between the predicted click rate of the target training sample in the service scene and its label information can be minimized as the optimization objective.
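Written as a formula, the objective above might look as follows. This is a sketch under stated assumptions: binary cross-entropy stands in for the click-rate deviation, the squared Frobenius distance stands in for the parameter deviation, and the weight λ is a hypothetical hyperparameter (λ = 1 gives the plain sum described in the text). Here D_s is the set of target training samples of service scene s, y ∈ {0, 1} is the label, \hat{p}_s(x) is the predicted click rate of the sub-prediction layer of scene s, Θ_s are its second network parameters, and Θ_0 are the first network parameters of the pre-trained shared prediction layer, held fixed.

```latex
\mathcal{L}_s = \frac{1}{|D_s|}\sum_{(x,y)\in D_s}
  -\Big[\, y\log\hat{p}_s(x) + (1-y)\log\big(1-\hat{p}_s(x)\big) \Big]
  + \lambda\,\big\lVert \Theta_s - \Theta_0 \big\rVert_F^{2}
```

The first term keeps the scene's predictions close to its labels; the second penalizes the sub-prediction layer for drifting far from the shared layer, which is the regularizing effect described above.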

To train the shared prediction layer, training samples are encoded and normalized and then input into the shared prediction layer to obtain their predicted click rates under the shared prediction layer; the shared prediction layer is then trained with minimizing the deviation between these predicted click rates and the label information of the training samples as the optimization objective. The training samples may be the target training samples, or may be generated from historical push information pushed to users in each service scene. The trained shared prediction layer can thus be regarded as a prediction model suited to all service scenes, trained independently on the training samples input into the prediction model.
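A minimal sketch of this pre-training step, assuming (as a hypothetical simplification) that the shared prediction layer is logistic regression trained by stochastic gradient descent on binary cross-entropy; `train_shared_layer`, the epoch count, and the learning rate are illustrative. Each sample carries its scene identifier, but the identifier is deliberately ignored, which is what makes the result scene-agnostic.

```python
import math
from typing import List, Tuple

# (scene_id, encoded and normalized features, 0/1 click label)
Sample = Tuple[str, List[float], int]


def train_shared_layer(samples: List[Sample], dim: int,
                       epochs: int = 100, lr: float = 0.1) -> Tuple[List[float], float]:
    """Fit one logistic-regression layer on samples pooled from all scenes."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for _scene, x, y in samples:             # scene ID intentionally unused
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                            # gradient of BCE w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

The returned weights play the role of the first network parameters that the sub-prediction layers are later pulled toward.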

In this specification, to determine the parameter difference between the trained shared prediction layer and the sub-prediction layer of the service scene, the service platform may construct a first parameter matrix for the shared prediction layer from the first network parameters and a second parameter matrix for the sub-prediction layer of the service scene from the second network parameters, compute the matrix distance between the two parameter matrices, and determine the parameter difference from that matrix distance. The matrix distance may be used directly as the parameter difference, or the parameter difference may be obtained by applying a further operation to the matrix distance.
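The matrix distance could be computed as below. The specification does not say which distance is used; this sketch assumes the Frobenius (element-wise Euclidean) distance and, as one example of a further operation, optional squaring. Both function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean distance between two same-shape parameter matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("parameter matrices must have the same shape")
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))


def parameter_deviation(shared: Matrix, sub: Matrix, squared: bool = True) -> float:
    """Parameter difference derived from the matrix distance."""
    d = frobenius_distance(shared, sub)
    return d * d if squared else d
```

Squaring makes the penalty smooth at zero and gives a gradient proportional to the parameter gap, which is a common reason to prefer it in a training objective.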

In this specification, the service platform can verify the prediction model after each round of training: if verification fails, the next round of training continues; if it passes, training of the prediction model is finished. For example, the trained prediction model is verified on a validation sample set, and verification passes if the ratio of validation samples with accurate prediction results to the total number of validation samples exceeds a set threshold; otherwise it fails. The service platform may also determine that training is finished when the number of training rounds reaches a set number. Other approaches are not enumerated here.
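The round-based stopping rule can be sketched as follows. `train_until_verified` and its parameters are hypothetical; `predict` is assumed to return a 0/1 click prediction (for example, the predicted click rate thresholded at 0.5), so that an "accurate" result means the prediction equals the label.

```python
from typing import Callable, List, Tuple


def train_until_verified(
    train_one_round: Callable[[], None],
    predict: Callable[[List[float]], int],
    validation: List[Tuple[List[float], int]],
    accuracy_threshold: float = 0.9,
    max_rounds: int = 20,
) -> int:
    """Train round by round; stop when validation accuracy exceeds the threshold
    or the round limit is reached. Returns the number of rounds run."""
    for round_no in range(1, max_rounds + 1):
        train_one_round()
        correct = sum(1 for x, y in validation if predict(x) == y)
        if correct / len(validation) > accuracy_threshold:
            return round_no          # verification passed
    return max_rounds                # stopped by the set number of rounds
```

The two exit paths correspond to the two stopping conditions in the text: passing verification, or reaching the set number of training rounds.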

For the prediction model trained by the above training method, this specification also provides a corresponding method of using the prediction model.

Fig. 3 is a schematic flow chart of an information pushing method in this specification, which specifically includes the following steps:

Step 300: determine each piece of candidate information to be pushed to a user in the current service scene, and acquire the related information of the user and the identification information of the current service scene.

In this specification, the candidate information to be pushed to the user differs between services. For example, in a news service, the candidate information may be news items or advertisements to be pushed to the user; in an online shopping service, it may be products, merchants, reviews, themed activities, and the like.

In a specific implementation, the service platform may determine, from the interface currently displayed by an application (APP) or client on the user's terminal (such as a mobile phone or tablet computer), the identification information of the service scene on that interface in which information needs to be pushed to the user. The service platform can then determine, from this identification information, each piece of candidate information to be pushed to the user in the current service scene. The candidate information for each service scene may be preset.

Meanwhile, the service platform may also acquire the related information of the user to whom information is to be pushed; this related information may include the user's historical service records and the user's attribute information.

Step 302: for each piece of candidate information, input the candidate information, the related information, and the identification information into the pre-trained prediction model. The prediction model determines, from the candidate information and the related information, the scene characteristics of the candidate information in the current service scene and the comprehensive characteristics of the candidate information across all service scenes; inputs the identification information into the scene weight layer to determine the scene weight matrix that the current service scene requires for weighting the comprehensive characteristics; and inputs the scene characteristics and the comprehensive characteristics weighted by the scene weight matrix into the sub-prediction layer of the current service scene to obtain the predicted click rate for the candidate information.

In a specific implementation, for each piece of candidate information, the service platform inputs the candidate information, the related information of the user, and the identification information of the current service scene into the pre-trained prediction model. The prediction model determines the behavior characteristics of the user from the user's historical service records, the portrait characteristics of the user from the user's attribute information, and the candidate information characteristics from the candidate information. The prediction model then splices the behavior, portrait, and candidate information characteristics and inputs the result into the data preprocessing layer of the sub-prediction layer of the current service scene to obtain the scene characteristics of the candidate information in that scene; at the same time, it inputs the spliced characteristics into the shared data preprocessing layer to obtain the comprehensive characteristics of the candidate information across all service scenes.

Meanwhile, the prediction model encodes the identification information of the service scene into an identification feature and inputs it into the scene weight layer to obtain the scene weight matrix that the service scene requires for weighting the comprehensive characteristics.

And finally, the prediction model weights the comprehensive features according to the scene weight matrix to obtain weighted comprehensive features, and inputs the scene features and the weighted comprehensive features into a sub-prediction layer corresponding to the current service scene to obtain the predicted click rate aiming at the candidate information.
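The inference path of step 302 can be sketched end to end for a toy model. Everything concrete here is hypothetical: the values in `TOY_PARAMS` stand in for a trained model, both preprocessing layers use fixed learned means and variances, the scene weight matrix is applied as a matrix-vector product, and the sub-prediction layer is reduced to logistic regression.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(x: Vector, stats: Tuple[Vector, Vector], eps: float = 1e-5) -> Vector:
    mean, var = stats
    return [(xi - mi) / math.sqrt(vi + eps) for xi, mi, vi in zip(x, mean, var)]


def predict_ctr(cand: Vector, user: Vector, scene: str, p: Dict) -> float:
    """Forward pass for one candidate in the current service scene."""
    x = user + cand                                     # spliced characteristics
    scene_feat = normalize(x, p["scene_stats"][scene])  # scene preprocessing layer
    comp_feat = normalize(x, p["shared_stats"])         # shared preprocessing layer
    weighted = matvec(p["scene_weight"][scene], comp_feat)
    w, b = p["sub_layer"][scene]
    z = sum(wi * xi for wi, xi in zip(w, scene_feat + weighted)) + b
    return 1.0 / (1.0 + math.exp(-z))


def score_candidates(cands: Dict[str, Vector], user: Vector,
                     scene: str, p: Dict) -> Dict[str, float]:
    return {cid: predict_ctr(f, user, scene, p) for cid, f in cands.items()}


# Hypothetical "trained" parameters for a single scene "feed" with 2-d input.
TOY_PARAMS = {
    "scene_stats": {"feed": ([0.0, 0.0], [1.0, 1.0])},
    "shared_stats": ([0.0, 0.0], [1.0, 1.0]),
    "scene_weight": {"feed": [[1.0, 0.0], [0.0, 1.0]]},
    "sub_layer": {"feed": ([0.0, 1.0, 0.0, 1.0], 0.0)},
}
```

The resulting dictionary of predicted click rates is what step 306 selects from.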

Step 306: according to the predicted click rate of each piece of candidate information, select from the candidate information the target information to be pushed to the user, and push the target information to the user.

In a specific implementation, after determining the predicted click rate of each piece of candidate information, the service platform may select the candidate with the highest predicted click rate as the target information and push it to the user. Alternatively, the service platform may randomly select the target information from the candidates whose predicted click rate exceeds a set click rate, and push it to the user.
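Both selection rules fit in a few lines. The function names are hypothetical; a seeded `random.Random` is passed in so the random choice is reproducible, and the eligible candidates are sorted before sampling so the result does not depend on dictionary order.

```python
import random
from typing import Dict, Optional


def select_top(ctr: Dict[str, float]) -> str:
    """Candidate with the highest predicted click rate."""
    return max(ctr, key=ctr.get)


def select_random_above(ctr: Dict[str, float], threshold: float,
                        rng: random.Random) -> Optional[str]:
    """Random candidate among those whose predicted click rate exceeds the threshold."""
    eligible = sorted(c for c, p in ctr.items() if p > threshold)
    return rng.choice(eligible) if eligible else None
```

Returning `None` when no candidate clears the threshold leaves the fallback policy (for example, reverting to `select_top`) to the caller.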

Furthermore, when the candidate information consists of products, merchants, themed activities, advertisements, or similar information, the service platform can, in addition to predicting click rates, train a conversion rate prediction model by the same method. The service platform can then select the information to push to the user according to both the predicted click rate and the predicted conversion rate of each piece of candidate information.

For example, the service platform may select, among the candidates whose predicted conversion rate is not lower than a set conversion rate, the candidate with the highest predicted click rate, and push it to the user. As another example, the service platform may select a set number of candidates whose predicted click rate is not lower than a set click rate and whose predicted conversion rate is not lower than a set conversion rate, and display them to the user one after another in a set order.
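The two combined strategies can be sketched as below. The function names and thresholds are hypothetical, and the "set order" is assumed here to be descending predicted click rate.

```python
from typing import Dict, List, Optional


def best_ctr_with_min_cvr(ctr: Dict[str, float], cvr: Dict[str, float],
                          min_cvr: float) -> Optional[str]:
    """Highest predicted click rate among candidates meeting the conversion-rate floor."""
    eligible = [c for c in ctr if cvr[c] >= min_cvr]
    return max(eligible, key=lambda c: ctr[c]) if eligible else None


def top_n_passing_both(ctr: Dict[str, float], cvr: Dict[str, float],
                       min_ctr: float, min_cvr: float, n: int) -> List[str]:
    """Up to n candidates meeting both floors, in descending click-rate order."""
    eligible = [c for c in ctr if ctr[c] >= min_ctr and cvr[c] >= min_cvr]
    return sorted(eligible, key=lambda c: ctr[c], reverse=True)[:n]
```

With these rules, a candidate with a very high click rate but a conversion rate below the floor is never pushed, which is the intended trade-off.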

As a further example, scoring levels can be defined separately for the predicted conversion rate and the predicted click rate, with a higher rate mapped to a higher scoring level. When pushing information to the user, the service platform determines, for each piece of candidate information, the scoring level of its predicted click rate and the scoring level of its predicted conversion rate, computes a weighted composite score from the two levels, selects a set number of candidates from the candidate information according to this score, and pushes them to the user. The weight coefficient of the predicted conversion rate can be set larger than that of the predicted click rate.
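A sketch of the scoring-level scheme follows. The level boundaries, the default weights (0.4 for click rate and 0.6 for conversion rate, reflecting the suggestion that conversion rate weigh more), and the function names are all hypothetical. `bisect.bisect_right` counts how many ascending boundaries a rate meets or exceeds, so a higher rate gets a higher level.

```python
import bisect
from typing import Dict, List


def level(rate: float, boundaries: List[float]) -> int:
    """Number of ascending boundaries the rate meets or exceeds."""
    return bisect.bisect_right(boundaries, rate)


def pick_by_levels(ctr: Dict[str, float], cvr: Dict[str, float],
                   ctr_bounds: List[float], cvr_bounds: List[float],
                   w_ctr: float = 0.4, w_cvr: float = 0.6, n: int = 1) -> List[str]:
    """Rank candidates by a weighted sum of their click-rate and conversion-rate levels."""
    score = {c: w_ctr * level(ctr[c], ctr_bounds) + w_cvr * level(cvr[c], cvr_bounds)
             for c in ctr}
    return sorted(score, key=lambda c: score[c], reverse=True)[:n]
```

Discretizing into levels before weighting keeps one extreme rate from dominating the composite score, at the cost of ties between candidates in the same levels.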

With the above method, both the scene characteristics of the target training sample in the service scene and its comprehensive characteristics across all service scenes are used as inputs to the sub-prediction layer of that service scene. Before the comprehensive characteristics enter the sub-prediction layer, they are weighted by the scene weight matrix that the scene weight layer outputs for the service scene. The sub-prediction layer can thus use the scene-specific characteristics that the scene weight matrix emphasizes within the comprehensive characteristics to help predict the click rate of the target training sample, improving both the accuracy of the prediction results output by the prediction model and service efficiency.

Based on the same idea, the present specification further provides a training device of the prediction model and a device for pushing information, as shown in fig. 4 and 5.

Fig. 4 is a schematic diagram of a training apparatus for a prediction model provided in this specification, where the prediction model includes a sub-prediction layer corresponding to each service scenario and a scenario weight layer, and specifically includes:

an obtaining module 400, configured to obtain, for each service scenario, a target training sample in the service scenario, where the target training sample includes historical push information pushed to a user and related information of the user in the service scenario;

a determining module 401, configured to determine, according to the historical push information and the relevant information, a scene feature corresponding to the target training sample in the service scene, a comprehensive feature corresponding to the target training sample in all service scenes, and input identification information corresponding to the service scene to the scene weight layer, and determine a scene weight matrix required by the service scene for weighting the comprehensive feature;

a prediction module 402, configured to input the scene characteristics and the comprehensive characteristics weighted by using the scene weight matrix into a sub-prediction layer corresponding to the service scene, so as to obtain a predicted click rate for the historical push information;

a training module 403, configured to train the prediction model according to the predicted click rate and the label information corresponding to the target training sample.

Optionally, the determining module 401 is specifically configured to determine the behavior feature of the user according to the historical service record, determine the portrait feature of the user according to the attribute information, and determine the push information feature of the historical push information according to the historical push information; splice the behavior feature, the portrait feature, and the push information feature to obtain the sample feature of the target training sample; and input the sample feature into the data preprocessing layer of the sub-prediction layer in the service scene to obtain the scene feature.

Optionally, the determining module 401 is specifically configured to input the sample feature into a data preprocessing layer corresponding to a sub-prediction layer in the service scenario, so that the data preprocessing layer performs normalization processing on the sample feature according to a first sample distribution that is satisfied by a target training sample in the service scenario, to obtain the scenario feature, where the first sample distribution is learned by the data preprocessing layer according to an input target training sample.

Optionally, the determining module 401 is specifically configured to determine the behavior feature of the user according to the historical service record, determine the portrait feature of the user according to the attribute information, and determine the push information feature of the historical push information according to the historical push information; splice the behavior feature, the portrait feature, and the push information feature to obtain the sample feature of the target training sample; and input the sample feature into the shared data preprocessing layer to obtain the comprehensive feature.

Optionally, the determining module 401 is specifically configured to input the sample feature into the shared data preprocessing layer, so that the shared data preprocessing layer performs normalization processing on the sample feature according to a second sample distribution that is satisfied by target training samples in all service scenarios, to obtain the comprehensive feature, where the second sample distribution is learned by the shared data preprocessing layer according to the input target training samples under a condition that service scenarios are not distinguished.

Optionally, the prediction model further comprises: a shared prediction layer;

the training module 403 is specifically configured to obtain the first network parameters of the pre-trained shared prediction layer and the second network parameters of the sub-prediction layer of the service scene, where the shared prediction layer is trained on training samples according to the deviation between the predicted click rate of each training sample under the shared prediction layer and its label information; determine, according to the first network parameters and the second network parameters, the parameter deviation between the pre-trained shared prediction layer and the sub-prediction layer of the service scene; and train the prediction model with the optimization objective of minimizing both the parameter deviation and the deviation between the predicted click rate of the target training sample in the service scene and its label information.

Fig. 5 is a schematic diagram of an information pushing apparatus provided in this specification, where a prediction model includes a sub-prediction layer corresponding to each service scenario and a scenario weight layer, and specifically includes:

a data obtaining module 500, configured to determine candidate information that needs to be pushed to a user in a current service scenario, and obtain relevant information of the user and identification information corresponding to the current service scenario;

a prediction module 501, configured to, for each piece of candidate information, input the candidate information, the related information, and the identification information into a pre-trained prediction model, so that the prediction model determines, according to the candidate information and the related information, the scene characteristics of the candidate information in the current service scene and the comprehensive characteristics of the candidate information across all service scenes, inputs the identification information into the scene weight layer to determine the scene weight matrix required by the current service scene for weighting the comprehensive characteristics, and inputs the scene characteristics and the comprehensive characteristics weighted by the scene weight matrix into the sub-prediction layer of the current service scene to obtain a predicted click rate for the candidate information, wherein the prediction model is trained by the above training method;

a pushing module 502, configured to select, according to the predicted click rate of each piece of candidate information, the target information to be pushed to the user from the candidate information, and push the target information to the user.

The present specification also provides a computer-readable storage medium, which stores a computer program, and the computer program can be used to execute the training method of the prediction model and the information pushing method provided in fig. 1.

This specification also provides a schematic block diagram of the electronic device shown in fig. 6. As shown in fig. 6, at the hardware level the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and runs it to implement the training method of the prediction model described in fig. 2 or the information pushing method shown in fig. 3. Of course, besides a software implementation, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). As technology has developed, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures, because designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system onto a PLD themselves, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip.
Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained simply by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described in terms of separate units divided by function. Of course, when implementing this specification, the functions of the units may be implemented in one and the same piece, or in multiple pieces, of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may take the form of volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.


This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment description.

The above description is merely an embodiment of this specification and is not intended to limit it. Those skilled in the art may make various modifications and changes to this specification. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of this specification shall fall within the scope of the claims of this specification.

Claims

1. A training method of a prediction model, characterized in that the prediction model comprises a sub-prediction layer corresponding to each service scene and a scene weight layer, the method comprising the following steps:

determining, according to the historical push information and the related information, scene characteristics corresponding to the target training sample in the service scene and comprehensive characteristics corresponding to the target training sample across all service scenes; and inputting identification information corresponding to the service scene into the scene weight layer to determine a scene weight matrix used by the service scene for weighting the comprehensive characteristics;

2. The method of claim 1, wherein each sub-prediction layer corresponds to a data preprocessing layer, and the related information comprises historical service records of the user and attribute information of the user;

3. The method of claim 2, wherein inputting the sample features into the data preprocessing layer corresponding to the sub-prediction layer in the service scene to obtain the scene characteristics specifically comprises:

4. The method of claim 1, wherein the prediction model further comprises a shared data preprocessing layer, and the related information comprises historical service records of the user and attribute information of the user;

5. The method of claim 4, wherein inputting the sample features into the shared data preprocessing layer to obtain the comprehensive characteristics specifically comprises:

6. The method of claim 1, wherein the prediction model further comprises a shared prediction layer;

7. An information pushing method, characterized in that a prediction model comprises a sub-prediction layer corresponding to each service scene and a scene weight layer, the method comprising the following steps:

for each piece of candidate information, inputting the candidate information, the related information, and the identification information into a pre-trained prediction model, so that the prediction model: determines, according to the candidate information and the related information, scene characteristics corresponding to the candidate information in the current service scene and comprehensive characteristics corresponding to the candidate information across all service scenes; inputs the identification information into the scene weight layer to determine a scene weight matrix used by the current service scene for weighting the comprehensive characteristics; and inputs the scene characteristics and the comprehensive characteristics weighted by the scene weight matrix into the sub-prediction layer corresponding to the current service scene to obtain a predicted click rate for the candidate information; wherein the prediction model is trained by the training method of any one of claims 1-6;

8. A training apparatus for a prediction model, characterized in that the prediction model comprises a sub-prediction layer corresponding to each service scene and a scene weight layer, the apparatus comprising:

an acquisition module, configured to acquire, for each service scene, a target training sample, wherein the target training sample comprises historical push information pushed to a user in the service scene and related information of the user;

9. An information pushing apparatus, characterized in that a prediction model comprises a sub-prediction layer corresponding to each service scene and a scene weight layer, the apparatus comprising:

a prediction module, configured to input, for each piece of candidate information, the candidate information, the related information, and the identification information into a pre-trained prediction model, so that the prediction model: determines, according to the candidate information and the related information, scene characteristics corresponding to the candidate information in the current service scene and comprehensive characteristics corresponding to the candidate information across all service scenes; inputs the identification information into the scene weight layer of the prediction model to determine a scene weight matrix used by the current service scene for weighting the comprehensive characteristics; and inputs the scene characteristics and the comprehensive characteristics weighted by the scene weight matrix into the sub-prediction layer corresponding to the current service scene to obtain a predicted click rate for the candidate information; wherein the prediction model is trained by the training method of any one of claims 1-6;

10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.

11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the program.
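The claims above do not fix the concrete form of each layer. The following is a minimal NumPy sketch of the data flow in claims 1 and 7 under stated assumptions, not the patented implementation: the scene weight layer is assumed to be a per-scene lookup of a full weight matrix; each sub-prediction layer is assumed to be a single logistic-regression head over the concatenation of the scene characteristics and the weighted comprehensive characteristics; the scene and comprehensive characteristics are assumed to be precomputed vectors standing in for the data preprocessing layers of claims 2 to 5; the shared prediction layer of claim 6 is omitted; and training is assumed to use binary cross-entropy with plain SGD. The names `MultiSceneCTRModel`, `train_step`, and `select_target_info` are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MultiSceneCTRModel:
    """Hypothetical sketch: a scene weight layer plus one sub-prediction layer per scene."""

    def __init__(self, num_scenes, scene_dim, comp_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.scene_dim = scene_dim
        # Scene weight layer (assumed form): a lookup from scene ID to a
        # comp_dim x comp_dim weight matrix, initialised close to the identity.
        self.scene_weights = np.eye(comp_dim) + 0.01 * rng.standard_normal(
            (num_scenes, comp_dim, comp_dim))
        # Sub-prediction layers (assumed form): one logistic-regression head per scene
        # over [scene characteristics, weighted comprehensive characteristics].
        self.head_w = 0.01 * rng.standard_normal((num_scenes, scene_dim + comp_dim))
        self.head_b = np.zeros(num_scenes)

    def _head_input(self, scene_id, scene_feat, comp_feat):
        # Weight the comprehensive characteristics with this scene's weight matrix,
        # then concatenate them with the scene characteristics.
        weighted_comp = self.scene_weights[scene_id] @ comp_feat
        return np.concatenate([scene_feat, weighted_comp])

    def predict_ctr(self, scene_id, scene_feat, comp_feat):
        x = self._head_input(scene_id, scene_feat, comp_feat)
        return _sigmoid(self.head_w[scene_id] @ x + self.head_b[scene_id])

    def train_step(self, scene_id, scene_feat, comp_feat, label, lr=0.1):
        """One SGD step on binary cross-entropy; returns the loss before the update."""
        x = self._head_input(scene_id, scene_feat, comp_feat)
        p = _sigmoid(self.head_w[scene_id] @ x + self.head_b[scene_id])
        g = p - label  # d(BCE)/d(logit) for a sigmoid output
        w_comp = self.head_w[scene_id, self.scene_dim:].copy()
        self.head_w[scene_id] -= lr * g * x
        self.head_b[scene_id] -= lr * g
        # The logit contains w_comp @ W @ comp_feat, so d(logit)/dW = outer(w_comp, comp_feat).
        self.scene_weights[scene_id] -= lr * g * np.outer(w_comp, comp_feat)
        eps = 1e-12
        return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps))


def select_target_info(model, scene_id, candidates, k=1):
    """Score (candidate_id, scene_feat, comp_feat) tuples by predicted CTR; return top-k ids."""
    scored = [(model.predict_ctr(scene_id, s, c), cid) for cid, s, c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [cid for _, cid in scored[:k]]
```

In use, `train_step` would be called over logged samples of (scene ID, characteristics, click label) for every service scene, and `select_target_info` would then be called with the current scene ID to choose the target information to push. Because only the weight matrix and head of the sample's own scene are updated, each scene keeps its own parameters while all scenes consume the same comprehensive characteristics.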