WO2023109025A1

WO2023109025A1 - Delivery information processing method, and resource prediction model training method and apparatus

Info

Publication number: WO2023109025A1
Application number: PCT/CN2022/096373
Authority: WO
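Inventors: 张弛 (Zhang Chi); 郭远 (Guo Yuan); 李怀宇 (Li Huaiyu); 谢淼 (Xie Miao); 林子钏 (Lin Zichuan); 杨森 (Yang Sen); 刘霁 (Liu Ji)
Original assignee: 北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)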
Priority date: 2021-12-15
Filing date: 2022-05-31
Publication date: 2023-06-22
Also published as: CN113918826A; CN113918826B

Abstract

The present disclosure relates to the technical field of information processing, and relates to a delivery information processing method, and a resource prediction model training method and apparatus. The method comprises: determining initial state feature information of target delivery information in a current delivery period; obtaining a resource prediction model, the resource prediction model comprising a conditional variational auto-encoder network and a prediction execution network; inputting the initial state feature information into the conditional variational auto-encoder network for resource prediction to obtain a first resource; inputting the initial state feature information and the first resource into the prediction execution network for resource prediction to obtain a second resource; and obtaining, on the basis of the first resource and the second resource, a target resource corresponding to the target delivery information, the target resource being a prediction resource that enables a delivery revenue of the target delivery information in the current delivery period to satisfy a target delivery revenue.

Description

Delivery information processing method, resource prediction model training method and device

Cross References to Related Applications

This application is based on a Chinese patent application with application number 202111529876.1 and a filing date of December 15, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference into this application.

Technical Field

The present disclosure relates to the technical field of information processing, and in particular to a delivery information processing method, a resource prediction model training method and a device.

Background

In an information delivery system, new delivery information is continuously uploaded to the system to await delivery. To quickly identify high-potential delivery information among the large volume of newly uploaded delivery information, the information delivery platform generally allocates corresponding cold-start resources to newly uploaded delivery information so that it obtains more delivery opportunities.

In the related art, cold-start resources are generally calculated directly from the unit price per click/conversion and the CTR (click-through rate), without considering the long-term revenue of the newly uploaded delivery information on the delivery platform. Moreover, because newly uploaded delivery information has few exposures, its CTR estimate is inaccurate, and so is the cold-start resource calculated from it. As a result, because the cold-start resources are calculated inaccurately and the long-term revenue of the delivery information is not considered, the selection of delivery information made after delivering information with these cold-start resources is unreasonable.

Summary

The present disclosure provides a delivery information processing method, a resource prediction model training method and a device.

According to the first aspect of the embodiments of the present disclosure, there is provided a delivery information processing method, including:

Determining initial state feature information of target delivery information in a current delivery cycle, where the initial state feature information of the target delivery information in the current delivery cycle is obtained based on the initial state feature information of the target delivery information in the previous delivery cycle and the delivery result information of the target delivery information in the previous delivery cycle, and the initial state feature information includes historical delivery result information of the target delivery information before the current delivery cycle and attribute information of the target delivery information;

Obtain a resource prediction model; the resource prediction model includes a conditional variational autoencoder network and a prediction execution network;

Inputting the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational autoencoder network for resource prediction, to obtain a first resource;

Inputting the initial state feature information of the target delivery information in the current delivery cycle and the first resource into the prediction execution network for resource prediction to obtain a second resource;

Obtaining a target resource corresponding to the target delivery information based on the first resource and the second resource, where the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery cycle meet a target delivery revenue.
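The inference steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the hand-set weights stand in for trained networks, the clipped latent sample and the tanh-bounded adjustment are assumptions, and the rule "target resource = first resource + second resource" is assumed by analogy with batch-constrained Q-learning (BCQ), since the text only says the target resource is obtained "based on" both resources. All names (`cvae_decode`, `prediction_execution`, `MAX_PERTURB`, and so on) are hypothetical.

```python
import math
import random
from typing import List

# Hypothetical hand-set weights standing in for trained networks.
DECODER_W = [0.5, 0.2, 0.1]           # decoder weights over the state features
DECODER_WZ = 0.3                      # decoder weight on the latent variable z
PERTURB_W = [0.05, -0.02, 0.01, 0.1]  # weights over [state..., first_resource]
MAX_PERTURB = 0.05                    # hypothetical bound on the second resource


def cvae_decode(state: List[float], z: float) -> float:
    """First resource: CVAE decoder conditioned on the state and a latent sample."""
    return sum(w * x for w, x in zip(DECODER_W, state)) + DECODER_WZ * z


def prediction_execution(state: List[float], first: float) -> float:
    """Second resource: a bounded adjustment given the state and the first resource."""
    raw = sum(w * x for w, x in zip(PERTURB_W, state + [first]))
    return MAX_PERTURB * math.tanh(raw)


def predict_target_resource(state: List[float], rng: random.Random) -> float:
    # Draw z from a standard normal and clip it, so the decoder stays
    # near the region of resources seen in historical data.
    z = max(-0.5, min(0.5, rng.gauss(0.0, 1.0)))
    first = cvae_decode(state, z)
    second = prediction_execution(state, first)
    return first + second  # assumption: target resource = first + second


state = [1.0, 0.4, 2.0]
target = predict_target_resource(state, random.Random(0))
print(round(target, 4))
```

Bounding the second resource keeps the final prediction close to what the CVAE considers plausible given historical allocations, which is the usual motivation for this two-stage design.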

In some embodiments, determining the initial state feature information of the target delivery information in the current delivery cycle includes:

Obtaining the initial state feature information of the target delivery information in the previous delivery cycle, where the initial state feature information in the previous delivery cycle includes the historical delivery result information of the target delivery information before the previous delivery cycle;

Updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, to determine the initial state feature information of the target delivery information in the current delivery cycle.

In some embodiments, the initial state feature information in the previous delivery cycle further includes delivery setting information and category information of the target delivery information, the delivery setting information being used to rank multiple pieces of information to be delivered;

Updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, to determine the initial state feature information of the target delivery information in the current delivery cycle, includes:

Generating the initial state feature information of the target delivery information in the current delivery cycle based on the delivery setting information, the category information, and the updated historical delivery result information.
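The state update in the two embodiments above can be sketched as a pure function that folds the previous cycle's results into a cumulative history and then builds a feature vector. This is a sketch under assumptions: the bid as the delivery setting information, the category vocabulary, and the history fields (`impressions`, `clicks`, `conversions`, `spend`) are hypothetical choices, not features named in the disclosure.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

CATEGORIES = ["game", "ecommerce", "education"]                    # hypothetical vocabulary
HISTORY_KEYS = ["impressions", "clicks", "conversions", "spend"]  # hypothetical fields


@dataclass(frozen=True)
class StartState:
    bid: float                      # delivery setting information (assumed: a bid)
    category: str                   # category information
    history: Dict[str, float] = field(default_factory=dict)  # cumulative results


def next_start_state(prev: StartState, last_cycle: Dict[str, float]) -> StartState:
    """Fold the previous cycle's delivery results into the cumulative history."""
    history = dict(prev.history)
    for key, value in last_cycle.items():
        history[key] = history.get(key, 0.0) + value
    # Setting and category information are carried over unchanged.
    return replace(prev, history=history)


def to_feature_vector(state: StartState) -> List[float]:
    """Concatenate setting, one-hot category and history into one vector."""
    one_hot = [1.0 if c == state.category else 0.0 for c in CATEGORIES]
    return [state.bid] + one_hot + [state.history.get(k, 0.0) for k in HISTORY_KEYS]


s0 = StartState(bid=2.0, category="game")
s1 = next_start_state(s0, {"impressions": 100.0, "clicks": 5.0, "conversions": 1.0, "spend": 3.5})
s2 = next_start_state(s1, {"impressions": 80.0, "clicks": 4.0, "conversions": 0.0, "spend": 2.5})
print(to_feature_vector(s2))
```

Returning a new state instead of mutating the old one keeps each cycle's starting state available, which is convenient when the same states are later reused as training samples.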

In some embodiments, the delivery information processing method further includes:

Calculate resource mean and resource variance in the current delivery cycle based on the predicted resources of each item of information to be delivered in the current delivery period;

calculating a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;

Based on the normalization coefficient and the preset amount of resources, determine actual resources allocated to the target delivery information within the current delivery cycle;

Based on the actual resources of the items of information to be delivered, the items of information to be delivered are sorted to obtain a sorting result.
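The normalization steps above can be sketched as follows. The text does not give the formula for the normalization coefficient; this sketch assumes it is the sigmoid of the z-score of each predicted resource, and that the actual resource is that coefficient times the preset amount of resources. The function names are hypothetical.

```python
import math
from typing import Dict, List


def actual_resources(predicted: Dict[str, float], preset_amount: float) -> Dict[str, float]:
    """Scale each predicted resource by a normalization coefficient in (0, 1)."""
    values = list(predicted.values())
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero if all predictions agree
    result = {}
    for item_id, value in predicted.items():
        z = (value - mean) / std
        coefficient = 1.0 / (1.0 + math.exp(-z))  # assumption: sigmoid of the z-score
        result[item_id] = coefficient * preset_amount
    return result


def rank_by_actual(actual: Dict[str, float]) -> List[str]:
    """Order item ids by actual resource, largest first."""
    return sorted(actual, key=actual.get, reverse=True)


actual = actual_resources({"a": 1.0, "b": 3.0, "c": 2.0}, preset_amount=10.0)
print(rank_by_actual(actual))
```

Normalizing against the mean and variance of the current cycle keeps the total boost bounded by the preset amount regardless of the scale of the model's raw predictions.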

In some embodiments, the sorting of the items of information to be delivered based on the actual resources of the items of information to be delivered to obtain a sorting result includes:

Based on the delivery setting information of the items of information to be delivered and the actual resources of the items of information to be delivered, the items of information to be delivered are sorted to obtain the sorting result.
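Ranking by both the delivery setting information and the actual resources can be sketched with a combined score. The disclosure does not specify how the two are combined; this sketch assumes the delivery setting information is a bid and that the actual resource is added to it as a cold-start boost. `rank_items` is a hypothetical name.

```python
from typing import Dict, List


def rank_items(bids: Dict[str, float], actual: Dict[str, float]) -> List[str]:
    """Rank by a combined score; assumption: score = bid + actual resource boost."""
    score = {item_id: bids[item_id] + actual.get(item_id, 0.0) for item_id in bids}
    return sorted(score, key=score.get, reverse=True)


print(rank_items({"a": 4.0, "b": 1.0, "c": 2.5}, {"a": 0.5, "b": 3.0, "c": 1.0}))
```

In this example, item `b` has the lowest bid but receives the largest boost, so it moves ahead of `c`; this is the kind of extra delivery opportunity that cold-start resources are meant to provide.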

Based on the ranking result, information delivery is performed within the current delivery cycle.

Obtain delivery result information of the target delivery information within the current delivery cycle; the delivery result information includes conversion data and delivery consumption data;

The conversion data and the delivery consumption data are weighted and summed to obtain the delivery revenue of the target delivery information in the current delivery period.
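The weighted summation above is a one-line computation. The weights below are hypothetical placeholders, since the disclosure does not state their values.

```python
def delivery_revenue(conversions: float, spend: float,
                     w_conversion: float = 1.0, w_spend: float = 0.5) -> float:
    """Weighted sum of conversion data and consumption data (weights are hypothetical)."""
    return w_conversion * conversions + w_spend * spend


print(delivery_revenue(conversions=4.0, spend=10.0))
```

Weighting conversions and consumption together lets the revenue signal reflect both advertiser value (conversions) and platform value (spend).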

According to a second aspect of the embodiments of the present disclosure, a method for training a resource prediction model is provided, including:

Obtaining sample data, the sample data including initial state feature information of sample delivery information in each historical delivery cycle and historical resources, where the initial state feature information characterizes the historical delivery characteristics of the sample delivery information before the start of each historical delivery cycle;

Training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network;

Inputting the encoding information obtained by the target conditional variational autoencoder network encoding the historical resources, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;

Training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network, where the predicted resources output by the target prediction execution network are resources that make the delivery revenue of the information to be delivered in the delivery cycle meet the target delivery revenue;

Obtaining the resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.

In some embodiments, the training of the preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain the target conditional variational autoencoder network includes:

Inputting the initial state feature information and the historical resources into the preset conditional variational autoencoder network; fitting, through the preset conditional variational autoencoder network, the data distribution of the initial state feature information and the historical resources to obtain probability distribution information; and encoding the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;

Based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, the preset conditional variational autoencoder network is trained to obtain the target conditional variational autoencoder network.
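A conditional variational autoencoder forward pass of the kind described above can be sketched with a scalar latent variable and linear layers. This is an interpretation, not the disclosed architecture: the "probability distribution information" is taken to be the encoder's mean and log-variance, and the "encoding information" is taken to be the decoder's reconstruction of the historical resource. The weights in `params` and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def encode(state: List[float], resource: float, p: Dict) -> Tuple[float, float]:
    """Encoder: map (state, historical resource) to the mean and log-variance of z."""
    x = state + [resource]
    return dot(p["w_mu"], x), dot(p["w_logvar"], x)


def decode(state: List[float], z: float, p: Dict) -> float:
    """Decoder: reconstruct the resource from the state and the latent z."""
    return dot(p["w_dec"], state) + p["w_z"] * z


def cvae_forward(state: List[float], resource: float, p: Dict,
                 rng: random.Random) -> Tuple[float, float, float]:
    mu, logvar = encode(state, resource, p)
    # Reparameterization trick: z = mu + sigma * eps keeps the sample differentiable.
    z = mu + math.exp(0.5 * logvar) * rng.gauss(0.0, 1.0)
    return mu, logvar, decode(state, z, p)


params = {"w_mu": [0.1, 0.2, 0.5], "w_logvar": [0.0, 0.0, 0.0],
          "w_dec": [0.3, 0.1], "w_z": 1.0}
mu, logvar, recon = cvae_forward([1.0, 2.0], 0.8, params, random.Random(42))
print(mu, logvar, recon)
```

Conditioning both the encoder and the decoder on the state is what makes the model "conditional": at prediction time the decoder can generate a plausible resource for a given state from a latent sample alone.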

In some embodiments, training the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network, includes:

Obtaining a first loss component according to the probability distribution information and a standard normal distribution;

Obtaining a second loss component according to the historical resource and the encoding information corresponding to the historical resource;

Obtaining a first loss function based on the first loss component and the second loss component;

Adjusting network parameters of the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.
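The two loss components above match the standard variational autoencoder objective. This sketch makes two assumptions: the first loss component is the closed-form KL divergence between a diagonal Gaussian and the standard normal, and the second is the squared reconstruction error. The relative weight `kl_weight` is hypothetical, because the text does not say how the components are combined.

```python
import math
from typing import List, Tuple


def cvae_loss(batch: List[Tuple[float, float, float, float]], kl_weight: float = 0.5) -> float:
    """Mean of reconstruction error plus weighted KL divergence to N(0, 1).

    Each entry is (historical_resource, reconstruction, mu, logvar).
    """
    total = 0.0
    for resource, recon, mu, logvar in batch:
        second_component = (resource - recon) ** 2                            # reconstruction
        first_component = -0.5 * (1.0 + logvar - mu ** 2 - math.exp(logvar))  # KL to N(0, 1)
        total += second_component + kl_weight * first_component
    return total / len(batch)


print(cvae_loss([(1.0, 1.0, 0.0, 0.0), (1.0, 1.0, 1.0, 0.0)]))
```

The KL term pulls the latent distribution toward N(0, 1), which is what allows the trained decoder to be driven by samples from a standard normal at prediction time.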

In some embodiments, training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and the target prediction analysis network, to obtain the target prediction execution network, includes:

Inputting the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and analyzing, through the target prediction analysis network, the behavior of allocating the predicted resources given the initial state feature information, to obtain first analysis information;

Adjusting network parameters of the preset prediction execution network based on the first analysis information, to obtain the target prediction execution network.
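The training step above resembles an actor update against a frozen critic. In this sketch, the first analysis information is assumed to be a value score Q(state, resource) from the target prediction analysis network. A toy quadratic function stands in for that network, and the prediction execution network is assumed to be a tanh-bounded linear adjuster added to the CVAE's resource, as in BCQ. Its parameters are updated by gradient ascent on the critic's score. `W_STAR`, `MAX_PERTURB`, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

W_STAR = [0.5, 0.5]  # hypothetical: the frozen critic prefers a resource of W_STAR . s
MAX_PERTURB = 0.2    # hypothetical bound on the adjustment

Batch = List[Tuple[List[float], float]]  # (state, first resource from the CVAE)


def critic(state: List[float], resource: float) -> float:
    """Stand-in for the target prediction analysis network (first analysis information)."""
    preferred = sum(w * s for w, s in zip(W_STAR, state))
    return -(resource - preferred) ** 2


def critic_grad(state: List[float], resource: float) -> float:
    """Derivative of the stand-in critic with respect to the resource."""
    preferred = sum(w * s for w, s in zip(W_STAR, state))
    return -2.0 * (resource - preferred)


def adjusted(theta: List[float], state: List[float], first: float) -> Tuple[float, float]:
    """Return (final resource, pre-activation) for a tanh-bounded linear adjuster."""
    pre = sum(t * v for t, v in zip(theta, state + [first]))
    return first + MAX_PERTURB * math.tanh(pre), pre


def mean_q(theta: List[float], batch: Batch) -> float:
    return sum(critic(s, adjusted(theta, s, a)[0]) for s, a in batch) / len(batch)


def train_execution_net(batch: Batch, steps: int = 200, lr: float = 0.5) -> List[float]:
    theta = [0.0] * (len(batch[0][0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for state, first in batch:
            final, pre = adjusted(theta, state, first)
            # Chain rule: dQ/dtheta = dQ/dresource * dresource/dpre * dpre/dtheta
            scale = critic_grad(state, final) * MAX_PERTURB * (1.0 - math.tanh(pre) ** 2)
            for i, v in enumerate(state + [first]):
                grad[i] += scale * v / len(batch)
        # Gradient ascent: move parameters toward resources the critic rates higher.
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


batch = [([1.0, 0.0], 0.45), ([0.0, 1.0], 0.55), ([1.0, 1.0], 0.9)]
theta = train_execution_net(batch)
print(mean_q([0.0, 0.0, 0.0], batch), mean_q(theta, batch))
```

Keeping the critic frozen while the execution network is trained separates the two updates, which matches the order in the text: the analysis network is obtained first and then used to adjust the execution network.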

In some embodiments, the sample data further includes the historical delivery revenue of the sample delivery information in each historical delivery cycle and updated state feature information, where the updated state feature information is obtained based on the initial state feature information and the delivery result information of the sample delivery information within the historical delivery cycle;

The resource prediction model training method also includes:

Inputting the initial state feature information and the historical resources into a preset prediction analysis network, and analyzing, through the preset prediction analysis network, the allocation of the historical resources given the initial state feature information, to obtain second analysis information;

Sampling historical resources based on the updated state feature information and the target conditional variational autoencoder network, to obtain a preset number of sampling resources;

Determining the delivery revenue corresponding to each sampling resource based on the updated state feature information;

Determining the sampling resource with the largest delivery revenue as the target sampling resource;

Adjusting network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampling resource, to obtain a target prediction analysis network.
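The analysis-network update above resembles a temporal-difference update. This sketch assumes the regression target is the historical delivery revenue plus a discounted delivery revenue of the best of several resources sampled for the updated state, and that the loss is the squared error against the second analysis information. The discount factor `gamma` does not appear in the text and is assumed from standard temporal-difference learning. The uniform sampler and the toy revenue function are hypothetical stand-ins for the CVAE decoder and the analysis network.

```python
import random
from typing import Callable, List


def critic_target(reward: float, next_state: List[float],
                  sample_resource: Callable[[List[float], random.Random], float],
                  revenue_fn: Callable[[List[float], float], float],
                  rng: random.Random, num_samples: int = 10, gamma: float = 0.9) -> float:
    """Historical revenue plus discounted revenue of the best sampled resource."""
    candidates = [sample_resource(next_state, rng) for _ in range(num_samples)]
    best = max(revenue_fn(next_state, c) for c in candidates)  # target sampling resource
    return reward + gamma * best


def critic_loss(q_value: float, target: float) -> float:
    """Squared error between the second analysis information and the target."""
    return (q_value - target) ** 2


def uniform_sampler(state: List[float], rng: random.Random) -> float:
    # Hypothetical stand-in for sampling resources from the trained CVAE decoder.
    return rng.uniform(0.0, 2.0)


def toy_revenue(state: List[float], resource: float) -> float:
    # Hypothetical stand-in for the revenue estimated for the updated state.
    return -(resource - 1.0) ** 2 + sum(state)


y = critic_target(1.0, [0.5, 0.5], uniform_sampler, toy_revenue, random.Random(7))
print(critic_loss(q_value=1.5, target=y))
```

Sampling candidate resources from the CVAE, rather than maximizing over all possible resources, restricts the target to resources similar to those seen in the historical data.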

In some embodiments, the resource prediction model training method further includes:

Obtaining a first delivery revenue of delivered information within a target delivery cycle and a second delivery revenue of the delivered information within a preset time period after the target delivery cycle, where the target delivery cycle is the last delivery cycle of the initial delivery phase;

Obtaining the historical delivery revenue corresponding to the target delivery cycle based on the first delivery revenue and the second delivery revenue;

Based on the historical delivery revenue corresponding to the target delivery period, a sample corresponding to the delivered information in the target delivery period is generated.
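Sample generation for the last cycle of the initial delivery phase can be sketched as follows. The text says only that the historical revenue is obtained "based on" the first and second delivery revenues. This sketch assumes a weighted sum with a hypothetical `long_term_weight`, and the sample layout (state, resource, revenue, next state) is also an assumption.

```python
from typing import Dict, List


def historical_revenue(first: float, second: float, long_term_weight: float = 1.0) -> float:
    """Combine in-cycle revenue with revenue observed after the cycle (weight assumed)."""
    return first + long_term_weight * second


def make_sample(state: List[float], resource: float, first: float, second: float,
                next_state: List[float], long_term_weight: float = 1.0) -> Dict[str, object]:
    """Assemble one training sample for the last cycle of the initial delivery phase."""
    return {
        "state": state,
        "resource": resource,
        "revenue": historical_revenue(first, second, long_term_weight),
        "next_state": next_state,
    }


sample = make_sample([1.0, 0.2], 0.7, first=3.0, second=2.0, next_state=[1.2, 0.3],
                     long_term_weight=0.5)
print(sample["revenue"])
```

Adding the revenue observed after the cold-start phase gives the final cold-start cycle credit for its long-term effect, which addresses the related-art shortcoming described in the Background.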

According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus for processing delivery information, including:

The state feature information determining unit is configured to determine the initial state feature information of the target delivery information in the current delivery cycle, where the initial state feature information of the target delivery information in the current delivery cycle is obtained based on the initial state feature information of the target delivery information in the previous delivery cycle and the delivery result information of the target delivery information in the previous delivery cycle, and the initial state feature information includes the historical delivery result information of the target delivery information before the current delivery cycle and attribute information of the target delivery information;

A resource prediction model acquisition unit is configured to acquire a resource prediction model, the resource prediction model including a conditional variational autoencoder network and a prediction execution network;

The first prediction unit is configured to input the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational autoencoder network for resource prediction, to obtain the first resource;

The second prediction unit is configured to input the initial state feature information of the target delivery information in the current delivery cycle and the first resource into the prediction execution network for resource prediction, to obtain a second resource;

The target resource determination unit is configured to obtain the target resource corresponding to the target delivery information based on the first resource and the second resource, where the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery cycle meet the target delivery revenue.

In some embodiments, the state feature information determining unit includes:

The first acquiring unit is configured to acquire the initial state feature information of the target delivery information in the previous delivery cycle, where the initial state feature information in the previous delivery cycle includes the historical delivery result information of the target delivery information before the previous delivery cycle;

The first update unit is configured to update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, to determine the initial state feature information of the target delivery information in the current delivery cycle.

In some embodiments, the initial state feature information in the previous delivery cycle further includes delivery setting information and category information of the target delivery information, the delivery setting information being used to rank multiple pieces of information to be delivered;

The first update unit includes:

The first generating unit is configured to generate initial state characteristic information of the target delivery information in the current delivery cycle based on the delivery setting information, the category information, and updated historical delivery results.

In some embodiments, the delivery information processing device further includes:

The first calculation unit is configured to calculate resource mean value and resource variance in the current delivery period based on the predicted resources of each item of information to be delivered in the current delivery period;

A second calculation unit configured to calculate a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;

The actual resource determining unit is configured to determine the actual resource allocated for the target delivery information in the current delivery period based on the normalization coefficient and the preset resource amount;

The first sorting unit is configured to sort the items of information to be delivered based on actual resources of the items of information to be delivered, and obtain a sorting result.

In some embodiments, the first sorting unit includes:

The second sorting unit is configured to sort the items of information to be delivered based on the delivery setting information of the items of information to be delivered and the actual resources of the items of information to be delivered, to obtain the sorting result.

The information delivery unit is configured to deliver information within the current delivery period based on the ranking result.

The second obtaining unit is configured to obtain delivery result information of the target delivery information within the current delivery cycle; the delivery result information includes conversion data and delivery consumption data;

The weighted summing unit is configured to perform weighted summation on the delivery conversion data and the delivery consumption data to obtain the delivery revenue of the target delivery information in the current delivery cycle.

According to a fourth aspect of the embodiments of the present disclosure, a resource prediction model training device is provided, including:

The sample data acquisition unit is configured to acquire sample data, the sample data including initial state feature information of sample delivery information in each historical delivery cycle and historical resources, where the initial state feature information characterizes the historical delivery characteristics of the sample delivery information before the start of each historical delivery cycle;

The first training unit is configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;

The third prediction unit is configured to input the encoding information obtained by the target conditional variational autoencoder network encoding the historical resources, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;

The second training unit is configured to train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network, where the predicted resources output by the target prediction execution network are resources that make the delivery revenue of the information to be delivered in the delivery cycle meet the target delivery revenue;

The resource prediction model determination unit is configured to obtain the resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.

In some embodiments, the first training unit includes:

An information input unit is configured to input the initial state feature information and the historical resources into the preset conditional variational autoencoder network, fit the data distribution of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information, and encode the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;

The third training unit is configured to train the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network.

In some embodiments, the third training unit includes:

The first loss component determining unit is configured to obtain the first loss component according to the probability distribution information and the standard normal distribution;

The second loss component determining unit is configured to obtain a second loss component according to the historical resource and the encoding information corresponding to the historical resource;

a first loss function determining unit configured to obtain a first loss function based on the first loss component and the second loss component;

The first parameter adjustment unit is configured to perform network parameter adjustment on the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.

In some embodiments, the second training unit includes:

The first analysis information determination unit is configured to input the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and to analyze, through the target prediction analysis network, the behavior of allocating the predicted resources given the initial state feature information, to obtain first analysis information;

The second parameter adjustment unit is configured to perform network parameter adjustment on the preset prediction execution network based on the first analysis information to obtain the target prediction execution network.

In some embodiments, the sample data further includes the historical delivery revenue of the sample delivery information in each historical delivery cycle and updated state feature information, where the updated state feature information is obtained based on the initial state feature information and the delivery result information of the sample delivery information within the historical delivery cycle;

The resource prediction model training device also includes:

The second analysis information determination unit is configured to input the initial state feature information and the historical resources into a preset prediction analysis network, and to analyze, through the preset prediction analysis network, the allocation of the historical resources given the initial state feature information, to obtain second analysis information;

The resource sampling unit is configured to sample historical resources based on the updated state feature information and the target conditional variational autoencoder network, to obtain a preset number of sampling resources;

The delivery revenue determination unit is configured to determine the delivery revenue corresponding to each sampling resource based on the updated state feature information;

The target sampling resource determination unit is configured to determine that the sampling resource with the largest delivery revenue is the target sampling resource;

The third parameter adjustment unit is configured to adjust the network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampling resource, to obtain a target prediction analysis network.

In some embodiments, the resource prediction model training device further includes:

The third obtaining unit is configured to obtain a first delivery revenue of delivered information within a target delivery cycle and a second delivery revenue of the delivered information within a preset time period after the target delivery cycle, where the target delivery cycle is the last delivery cycle of the initial delivery phase;

A historical delivery revenue determining unit configured to obtain a historical delivery revenue corresponding to the target delivery cycle based on the first delivery revenue and the second delivery revenue;

The sample generating unit is configured to generate a sample corresponding to the delivered information in the target delivery period based on the historical delivery revenue corresponding to the target delivery period.

According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the delivery information processing method or the resource prediction model training method described above.

According to a sixth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium; when instructions in the computer-readable storage medium are executed by a processor of a server, the server is enabled to execute the delivery information processing method or the resource prediction model training method described above.

According to a seventh aspect of the embodiments of the present disclosure, there is provided a computer program product including a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the storage medium and executes it, so that the device executes the delivery information processing method or the resource prediction model training method described above.

In the present disclosure, the initial state feature information of the target delivery information in the current delivery cycle is first determined and input into the conditional variational autoencoder network of the resource prediction model for resource prediction, to obtain the first resource. The initial state feature information and the first resource are then input into the prediction execution network of the resource prediction model for resource prediction, to obtain the second resource. Finally, the target resource corresponding to the target delivery information is obtained based on the first resource and the second resource, the target resource being a predicted resource that makes the delivery revenue of the target delivery information in the current delivery cycle meet the target delivery revenue. Resources are therefore determined per delivery cycle, with different delivery cycles corresponding to different resources: the resource allocated to the target delivery information in the current delivery cycle is predicted from its initial state feature information in that cycle and the resource prediction model, and the predicted resource is one that makes the delivery revenue of the target delivery information in the current delivery cycle meet the target revenue, which improves the rationality of resource allocation. Further, the cold-start result is determined from the delivery revenue of the target delivery information over multiple delivery cycles and satisfies the condition that the delivery revenue meets the target revenue; selecting delivery information that meets the delivery target based on this cold-start result improves the efficiency of delivery information selection.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

Brief Description of the Drawings

The accompanying drawings here are incorporated into the specification and constitute a part of the specification, show embodiments consistent with the disclosure, and are used together with the description to explain the principle of the disclosure, and do not constitute an improper limitation of the disclosure.

Fig. 1 is a schematic diagram showing an implementation environment according to an exemplary embodiment.

Fig. 2 is a flowchart showing a method for processing delivery information according to an exemplary embodiment.

Fig. 3 is a flow chart of a method for updating initial state feature information of delivery information according to an exemplary embodiment.

Fig. 4 is a flow chart showing a method for sorting delivery information based on predicted resources according to an exemplary embodiment.

Fig. 5 is a flow chart showing a method for calculating delivery revenue according to an exemplary embodiment.

Fig. 6 is a schematic structural diagram of a resource prediction model according to an exemplary embodiment.

Fig. 7 is a flow chart showing a method for training a resource prediction model according to an exemplary embodiment.

Fig. 8 is a flowchart of a method for training a conditional variational autoencoder network according to an exemplary embodiment.

Fig. 9 is a flow chart of a method for adjusting parameters of a conditional variational autoencoder network according to an exemplary embodiment.

Fig. 10 is a flow chart of a method for training a target prediction execution network according to an exemplary embodiment.

Fig. 11 is a flow chart of a method for training a target prediction analysis network according to an exemplary embodiment.

Fig. 12 is a flow chart of a sample generation method according to an exemplary embodiment.

Fig. 13 is a block diagram of an apparatus for processing delivery information according to an exemplary embodiment.

Fig. 14 is a block diagram of an apparatus for training a resource prediction model according to an exemplary embodiment.

Fig. 15 is a schematic structural diagram of an electronic device according to an exemplary embodiment.

Detailed Description of Embodiments

To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.

It should be noted that the terms "first" and "second" in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.

Refer to FIG. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present disclosure. The implementation environment may include at least one first terminal 110 and a second terminal 120, which may perform data communication via a network.

In some embodiments, the second terminal 120 delivers the delivery information through the delivery system, and the first terminal 110 displays the delivery information upon receiving it, so that a user browsing the delivery information can click to view it and may convert after clicking. Based on the user's operations on the delivery information at the first terminal 110, the second terminal 120 collects and analyzes click data and conversion data for the delivery information.

The first terminal 110 may communicate with the second terminal 120 in browser/server mode (Browser/Server, B/S) or client/server mode (Client/Server, C/S). The first terminal 110 may include physical devices such as smartphones, tablet computers, notebook computers, digital assistants, smart wearable devices, vehicle terminals, and servers, and may also include software running on those devices, such as applications. The operating system running on the first terminal 110 in the embodiments of the present disclosure may include, but is not limited to, Android, iOS, Linux, and Windows.

The second terminal 120 and the first terminal 110 may establish a communication connection in a wired or wireless manner. The second terminal 120 may include an independently operating server, a distributed server, or a server cluster composed of multiple servers, where the server may be a cloud server.

The life cycle of delivery information can generally be divided into stages such as an exploration period, a growth period, a maturity period, and a decline period. A delivery cycle in this disclosure may be a cycle within the cold start phase, which corresponds to the exploration period of the life cycle. During the exploration period, new delivery information is uploaded and delivered continually. After new delivery information has been delivered for some time and has accumulated a certain number of conversions, delivery information with better conversion numbers passes the exploration period and enters the growth period, while delivery information with poor conversion numbers fails the cold start and is not delivered afterwards.

To address the unreasonable resource allocation and delivery information selection of the related art, an embodiment of the present disclosure provides a delivery information processing method. Referring to FIG. 2, the method may include steps S210 to S250.

S210. Determine the initial state feature information of the target delivery information in the current delivery cycle. This information is obtained from the initial state feature information of the target delivery information in the previous delivery cycle and the delivery result information of the target delivery information in the previous delivery cycle. The initial state feature information includes the historical delivery result information of the target delivery information before the current delivery cycle and attribute information of the target delivery information.

A delivery cycle may be one cycle within an initial delivery phase; an initial delivery phase may include multiple delivery cycles, which are generally of equal duration. In some embodiments, the initial delivery phase may be the cold start phase of the delivery information. For example, the cold start phase may last 7 days with each hour forming one delivery cycle, giving 168 delivery cycles.

In some embodiments, because the state of delivery information changes as it is delivered, the initial state feature information of the target delivery information can be determined at the start of each delivery cycle. The initial state feature information for the current delivery cycle is obtained from the initial state feature information of the target delivery information in the previous delivery cycle and its delivery result information in the previous delivery cycle, and it includes the historical delivery result information before the current delivery cycle and the attribute information of the target delivery information.

S220. Acquire a resource prediction model; the resource prediction model includes a conditional variational autoencoder network and a prediction execution network.

In some embodiments, at the beginning of a delivery cycle, the resource prediction model predicts, with satisfying the target delivery revenue as its objective, the resources that should be allocated to the target delivery information in the current delivery cycle, so that the target delivery information can proceed to the subsequent information delivery steps based on the predicted resources.

S230. Input the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational autoencoder network for resource prediction to obtain a first resource.

S240. Input the initial state feature information of the target delivery information in the current delivery cycle and the first resource into the prediction execution network for resource prediction to obtain a second resource.

S250. Obtain the target resource corresponding to the target delivery information based on the first resource and the second resource, the target resource being the predicted resource that enables the delivery revenue of the target delivery information in the current delivery cycle to satisfy the target delivery revenue.

In some embodiments, the target resource may be a resource that assists information delivery, that is, one that helps the target delivery information be delivered as soon as possible during the delivery process; the larger the target resource, the faster the delivery.

In some embodiments, the model may be trained with the goal of maximizing the delivery revenue of the delivery information to obtain the resource prediction model. That is, the resource prediction model may predict resources with the goal of maximizing the delivery revenue in the current delivery cycle and/or a future time period, so that the target resource maximizes the delivery revenue of the target delivery information in the current delivery cycle and/or the future time period. Here, the future time period may be one or more future delivery cycles, or the time period after the cold start phase.

In some embodiments, the resource prediction model may be an offline reinforcement learning model. The purpose of allocating resources to new delivery information is to identify delivery information with greater potential as early as possible while allocating resources among different delivery information so as to maximize long-term delivery revenue, and the optimization goal of reinforcement learning is likewise to maximize overall return. Moreover, reinforcement learning addresses sequential decision-making, and during the cold start of delivery information the resources for the next delivery cycle are decided in each cycle, which is also a sequential decision problem. Reinforcement learning is therefore suitable for resource prediction aimed at maximizing delivery revenue. In addition, training the model offline on accumulated historical data avoids the effect that data fluctuations during direct online exploration would have on the training results.

In this disclosure, resources for the target delivery information are determined per delivery cycle, and different delivery cycles may correspond to different resources: the resources to be allocated to the target delivery information in the current delivery cycle are predicted from its initial state feature information in that cycle and the resource prediction model. Because the predicted resources are those that let the target delivery information meet the target revenue in the current delivery cycle, the rationality of resource allocation is improved. Further, a cold start result can be determined from the delivery revenue of the target delivery information over multiple delivery cycles, the cold start result indicating whether the delivery revenue satisfies the target revenue, and delivery information that meets the delivery target can be selected based on the cold start result, which improves the efficiency of delivery information selection.

In some embodiments, refer to FIG. 3, which shows a method for updating the initial state feature information of delivery information; the method may include steps S310 to S320.

S310. Obtain the initial state feature information of the target delivery information in the previous delivery cycle, the initial state feature information in the previous delivery cycle including the historical delivery result information of the target delivery information before the previous delivery cycle.

S320. Update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, and determine the initial state feature information of the target delivery information in the current delivery cycle.

The historical delivery result information may include conversion information of the target delivery information before the start of the current delivery cycle, and the delivery setting information may include bidding information of the target delivery information. Because the state of the delivery information changes as it is delivered, the initial state feature information of the target delivery information can be determined at the start of each delivery cycle: the historical delivery result information in the initial state feature information of the previous delivery cycle is updated with the delivery result information of the previous delivery cycle, yielding the initial state feature information for the current delivery cycle. Adaptively updating the initial state feature information at the start of each delivery cycle in this way improves how accurately it represents the current state of the target delivery information.
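The update in S310 and S320 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the specific fields (cumulative impressions, clicks, conversions, spend, a bid, a category) are hypothetical choices, and the sketch assumes the historical results are updated by simple accumulation while the setting and category fields carry over unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StateFeatures:
    # Historical delivery result information accumulated before the cycle starts
    # (hypothetical fields chosen for illustration).
    cum_impressions: int
    cum_clicks: int
    cum_conversions: int
    cum_spend: float
    # Delivery setting information, e.g. the bid (hypothetical field).
    bid: float
    # Category information (hypothetical field).
    category: str


@dataclass(frozen=True)
class CycleResult:
    # Delivery result information observed during one delivery cycle.
    impressions: int
    clicks: int
    conversions: int
    spend: float


def next_initial_state(prev: StateFeatures, result: CycleResult) -> StateFeatures:
    """Initial state of the current cycle: the previous cycle's initial state with its
    historical results accumulated; setting and category fields are carried over."""
    return replace(
        prev,
        cum_impressions=prev.cum_impressions + result.impressions,
        cum_clicks=prev.cum_clicks + result.clicks,
        cum_conversions=prev.cum_conversions + result.conversions,
        cum_spend=prev.cum_spend + result.spend,
    )


# Usage: roll the state forward across two hourly delivery cycles.
s0 = StateFeatures(0, 0, 0, 0.0, bid=1.5, category="game")
s1 = next_initial_state(s0, CycleResult(impressions=100, clicks=5, conversions=1, spend=2.0))
s2 = next_initial_state(s1, CycleResult(impressions=80, clicks=3, conversions=0, spend=1.5))
```

Using a frozen dataclass with `dataclasses.replace` keeps each cycle's initial state as an immutable snapshot, which is convenient later when consecutive states are paired into training samples.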

In some embodiments, the initial state feature information in the previous delivery cycle further includes delivery setting information and category information of the target delivery information, the delivery setting information being used to sort multiple pieces of information to be delivered. When the initial state feature information is determined, the initial state feature information of the target delivery information in the current delivery cycle can thus be generated from the delivery setting information, the category information, and the updated historical delivery result information.

Category information characterizes the category of the target delivery information, such as a domain category, an information category, or a creative category. Domain categories may include e-commerce, games, and education; information categories may include video, image, and image-text; creative categories may include posters and layouts. Historical delivery result information and delivery setting information are continuous features, whereas category information is a discrete feature. When generating the state feature information, the values of the historical delivery result information and the delivery setting information can be normalized, and the category information can be one-hot encoded into encoding vectors. State feature information generated from the normalized and encoded features is convenient for subsequent processing and improves data processing efficiency.
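A minimal sketch of this feature encoding follows. The text says only that continuous features are "normalized"; min-max scaling into [0, 1] with per-feature ranges is an assumption made here, and the feature names and category vocabulary are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale a continuous feature into [0, 1]; a degenerate range maps to 0.0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """One-hot encode a discrete category; an unknown value gives an all-zero vector."""
    return [1.0 if v == value else 0.0 for v in vocabulary]


def encode_state(
    continuous: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    categorical: Dict[str, str],
    vocabularies: Dict[str, Sequence[str]],
) -> List[float]:
    """Concatenate normalized continuous features and one-hot category vectors.
    Keys are sorted so the feature order is stable across calls."""
    vec = [min_max(continuous[k], *ranges[k]) for k in sorted(continuous)]
    for k in sorted(categorical):
        vec.extend(one_hot(categorical[k], vocabularies[k]))
    return vec


# Usage with hypothetical features: a bid, cumulative conversions, a domain category.
vec = encode_state(
    continuous={"bid": 2.0, "cum_conversions": 5.0},
    ranges={"bid": (0.0, 4.0), "cum_conversions": (0.0, 10.0)},
    categorical={"field": "game"},
    vocabularies={"field": ["ecommerce", "game", "education"]},
)
```

Here `vec` is `[0.5, 0.5, 0.0, 1.0, 0.0]`: two scaled continuous values followed by the one-hot vector for the "game" category.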

In addition, the present disclosure describes the initial state feature information of the target delivery information from different angles using multi-dimensional features, which improves how well the target delivery information is represented and thereby the accuracy of subsequent processing based on the state feature information.

In some embodiments, refer to FIG. 4, which shows a method for sorting delivery information based on predicted resources; the method may include steps S410 to S440.

S410. Calculate the resource mean and resource variance in the current delivery cycle based on the predicted resources of each piece of information to be delivered in the current delivery cycle.

S420. Calculate a normalization coefficient corresponding to the target resource according to the resource mean value, the resource variance, and the target resource.

S430. Based on the normalization coefficient and the preset resource amount, determine actual resources allocated to the target delivery information within the current delivery cycle.

S440. Sort the pieces of information to be delivered based on their actual resources to obtain a sorting result.

In this disclosure, the resource predicted by the resource prediction model is the amount of resources that the delivery information should obtain in a delivery cycle. However, the resources available to the delivery information in each delivery cycle are limited in total, so using the predicted resources directly may cause the allocation to exceed the budget. To make the allocation result match the current total amount of resources, the predicted resources can be normalized to obtain a normalization coefficient for each piece of delivery information, as shown in formula (1):

where a_i is the target resource corresponding to the target delivery information, avg(a) is the mean of the resources of all delivery information in the current delivery cycle, std(a) is their standard deviation (the "resource variance" of steps S410 and S420), and a_i' is the resulting normalization coefficient. The normalization coefficient of the target delivery information is applied to the existing resource allocation strategy to obtain its actual resources. Because the predicted resources are normalized under the constraint of the total resource amount and the mean of all a_i' is 1, the sum of the actual resources allocated to the delivery information can be kept equal to the total amount of resources, which prevents resources from exceeding the budget.
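Formula (1) itself is not reproduced in this text. The sketch below assumes it takes the standardized form a_i' = (a_i - avg(a)) / std(a) + 1, which uses exactly the quantities defined above and has the stated property that the coefficients average to 1. It also assumes, hypothetically, that the preset resource amount of S430 is an equal per-item share of the total budget, so that actual resources sum to the total. Under this assumed form an item far below the mean can receive a negative coefficient; the sketch does not clip, because clipping would break the equality between allocated and total resources.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalization_coefficients(predicted: Sequence[float]) -> List[float]:
    """Assumed form of formula (1): a_i' = (a_i - avg(a)) / std(a) + 1.
    Falls back to 1.0 for every item when all predictions are equal."""
    mu = mean(predicted)
    sigma = pstdev(predicted)  # population standard deviation of the predictions
    if sigma == 0:
        return [1.0 for _ in predicted]
    return [(a - mu) / sigma + 1.0 for a in predicted]


def allocate(predicted: Sequence[float], total_budget: float) -> List[float]:
    """Actual resource = coefficient * preset per-item amount (hypothetical: total / n).
    Since the coefficients average to 1, the allocations sum to total_budget."""
    coeffs = normalization_coefficients(predicted)
    base = total_budget / len(predicted)
    return [c * base for c in coeffs]


# Usage: three items whose predicted resources are 1, 2 and 3 share a budget of 300.
alloc = allocate([1.0, 2.0, 3.0], 300.0)
```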

In some embodiments, once the actual resources to be allocated to the target delivery information are determined, the information to be delivered can be sorted based on the actual resources to obtain a sorting result, which gives the order of the multiple pieces of information to be delivered. Further, as described above, the pieces of information to be delivered can be sorted based on both their delivery setting information and their actual resources. Information delivery in the current delivery cycle is then carried out based on the sorting result, which determines which delivery information is delivered in the current cycle; for example, the top N pieces of target delivery information in the sorting result can be selected for delivery.

In some embodiments, the pieces of target delivery information are sorted by ranking score from high to low, where the ranking score is calculated as shown in formula (2):

rank_benefits=ecpm+bonus+ueq (2)

where ecpm (estimated cost per mille) is the estimated cost per thousand impressions, which can be obtained from the delivery setting information and the click-through rate; bonus is the target resource; and ueq (user experience quantity) is the user experience score. The target delivery information ranked in the top N positions of the sorting result can be selected for delivery.
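Formula (2) and the top-N selection can be sketched as below. The candidate values are illustrative only, and ecpm is taken as a precomputed input because the text does not specify how it is derived from the bid and click-through rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    ecpm: float   # estimated cost per mille, taken as given here
    bonus: float  # resource allocated to this item in the current cycle
    ueq: float    # user experience score


def rank_benefits(c: Candidate) -> float:
    # Formula (2): rank_benefits = ecpm + bonus + ueq
    return c.ecpm + c.bonus + c.ueq


def select_top_n(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Sort by ranking score from high to low and keep the first n for delivery."""
    return sorted(candidates, key=rank_benefits, reverse=True)[:n]


pool = [
    Candidate("a", ecpm=10.0, bonus=1.0, ueq=0.5),   # score 11.5
    Candidate("b", ecpm=8.0, bonus=5.0, ueq=0.0),    # score 13.0
    Candidate("c", ecpm=12.0, bonus=0.0, ueq=-1.0),  # score 11.0
]
top = select_top_n(pool, 2)
```

Item "b" has the lowest ecpm but ranks first, which illustrates how a larger allocated resource (bonus) speeds up delivery for new information.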

In some embodiments, refer to FIG. 5, which shows a method for calculating delivery revenue; the method may include steps S510 to S520.

S510. Obtain delivery result information of the target delivery information within the current delivery period; the delivery result information includes delivery conversion data and delivery consumption data.

S520. Perform a weighted summation of the delivery conversion data and the delivery consumption data to obtain delivery revenue of the target delivery information in the current delivery period.

Information is delivered based on the predicted resources in each delivery cycle. When the current delivery cycle ends, its delivery revenue can be determined. This revenue can be regarded as the revenue obtained by allocating the target resource to the target delivery information in its current state, the current state being represented by the state feature information of the current delivery cycle.

The delivery conversion data may be the conversion rate, and the delivery consumption data may be the bid at which the information is delivered. Both serve as delivery result information, and a weight can be set for each; for example, the weight of the delivery conversion data may be 1 and the weight of the delivery consumption data 0.05, so that the delivery revenue of the current delivery cycle is determined mainly by the conversion data and supplemented by the consumption data. Determining the delivery revenue of the target delivery information in the current delivery cycle as a weighted sum of the delivery conversion data and the delivery consumption data makes the determination both accurate and convenient.
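Step S520 reduces to a weighted sum. The sketch below uses the example weights from the text (1 for conversion data, 0.05 for consumption data) as defaults; the input values in the usage note are illustrative.

```python
def delivery_revenue(conversion: float, consumption: float,
                     w_conversion: float = 1.0, w_consumption: float = 0.05) -> float:
    """Weighted sum of delivery conversion data and delivery consumption data.
    Default weights follow the example values in the text (1 and 0.05)."""
    return w_conversion * conversion + w_consumption * consumption
```

For instance, a cycle with a 2% conversion rate and a bid of 10 gives 0.02 + 0.05 × 10 = 0.52, so in this example the consumption term contributes most of the value; in practice the weights would be tuned to the scales of the two inputs.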

The delivery revenue of the target delivery information in the cold start phase is determined from its delivery revenue in multiple delivery cycles, and the corresponding cold start result can be determined from the delivery revenue in the cold start phase. In some embodiments, if the delivery revenue in the cold start phase is greater than or equal to a preset cold start revenue threshold, the target delivery information is determined to have passed the cold start phase and enters the maturity stage; if it is less than the threshold, the target delivery information is determined to have failed the cold start phase and is not delivered afterwards. The target delivery information that passes the cold start phase is therefore selected for continued delivery, and such delivery information is expected, based on the cold start phase, to yield relatively large long-term delivery revenue.
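The cold start decision can be sketched as a threshold test. The text does not say how per-cycle revenues are aggregated over the cold start phase; summing them is an assumption here, and the item names and values in the usage are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def cold_start_passed(cycle_revenues: Sequence[float], threshold: float) -> bool:
    """Aggregate per-cycle delivery revenue over the cold start phase (summed here,
    an assumption) and compare it with the preset cold start revenue threshold."""
    return sum(cycle_revenues) >= threshold


def split_by_cold_start(revenues_by_item: Dict[str, Sequence[float]],
                        threshold: float) -> Tuple[List[str], List[str]]:
    """Return (passed, failed) item names; passed items continue to be delivered."""
    passed = sorted(k for k, r in revenues_by_item.items() if cold_start_passed(r, threshold))
    failed = sorted(k for k, r in revenues_by_item.items() if not cold_start_passed(r, threshold))
    return passed, failed


passed, failed = split_by_cold_start({"x": [1.0, 2.0, 3.0], "y": [0.5, 0.5, 1.0]}, threshold=6.0)
```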

In some embodiments, refer to FIG. 6, which shows a schematic structural diagram of a resource prediction model. The model may include a conditional variational autoencoder network, a prediction execution network, and a prediction analysis network. The state corresponds to the state feature information, the action corresponds to the allocated resource, and the return r corresponds to the delivery revenue. The prediction execution network is responsible for predicting a suitable action for the current state, and the prediction analysis network evaluates the quality of the currently predicted action given the current state and action.

In some embodiments, the conditional variational autoencoder network (Conditional VAE) G_ω includes an encoder and a decoder. The encoder encodes the state and action so that the encoding result approximates a standard normal distribution; the decoder reverses the encoding, so that samples from the standard normal distribution, after passing through the decoder, approximate the actual distribution of actions and states. The inputs of the conditional variational autoencoder network are the current state and action. These features pass through a 2-layer MLP (multi-layer perceptron) to obtain a set of means and variances; a sample is drawn from the distribution defined by these means and variances and passed through another 2-layer MLP to obtain reconstruction information for the encoder's input.
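The data flow of the conditional variational autoencoder network can be sketched in plain Python as an untrained forward pass. Several details are assumptions not stated in the text: the decoder is conditioned on the state by concatenating it with the latent sample, the encoder's second layer outputs a log-variance rather than a variance, and at inference time the latent is drawn from the standard normal prior so that the decoder output can play the role of the first resource G_ω(s). Layer sizes and the seed are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Small random weights, zero biases.
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(x: Sequence[float], layer: Layer) -> List[float]:
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, t) for t in v]


class TinyCVAE:
    """Forward pass of a conditional VAE: a 2-layer MLP encoder maps (state, action)
    to a latent mean and log-variance; a 2-layer MLP decoder maps (state, latent)
    back to an action. Untrained; it illustrates the data flow only."""

    def __init__(self, state_dim: int, action_dim: int, latent_dim: int,
                 hidden: int = 16, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.enc = [init_layer(state_dim + action_dim, hidden, self.rng),
                    init_layer(hidden, 2 * latent_dim, self.rng)]
        self.dec = [init_layer(state_dim + latent_dim, hidden, self.rng),
                    init_layer(hidden, action_dim, self.rng)]

    def encode(self, s: Sequence[float], a: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = relu(linear(list(s) + list(a), self.enc[0]))
        out = linear(h, self.enc[1])
        return out[:self.latent_dim], out[self.latent_dim:]  # mean, log-variance

    def decode(self, s: Sequence[float], z: Sequence[float]) -> List[float]:
        h = relu(linear(list(s) + list(z), self.dec[0]))
        return linear(h, self.dec[1])

    def forward(self, s: Sequence[float], a: Sequence[float]):
        mu, log_var = self.encode(s, a)
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
        return self.decode(s, z), mu, log_var

    def sample_action(self, s: Sequence[float]) -> List[float]:
        # Inference: draw the latent from the standard normal prior and decode.
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        return self.decode(s, z)


model = TinyCVAE(state_dim=4, action_dim=1, latent_dim=2)
recon, mu, log_var = model.forward([0.1, 0.2, 0.3, 0.4], [0.5])
first_resource = model.sample_action([0.1, 0.2, 0.3, 0.4])
```

In a real system the same structure would be written in a deep learning framework so the weights can be trained with the loss of formula (3); plain lists are used here only to keep the sketch dependency-free.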

For the prediction execution network (actor), the input has two parts: the current state, and the output of the decoder of the conditional variational autoencoder network. The output of the network is a' = G_ω(s) + w·ε_φ(G_ω(s), s), in which G_ω(s) corresponds to the first resource and ε_φ(G_ω(s), s) to the second resource. In the embodiment of the present disclosure, w may be 0.001: if w is set too large, the algorithm tends not to converge, and if it is too small, the output of the prediction execution network has limited influence on the final result.
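The combination a' = G_ω(s) + w·ε_φ(G_ω(s), s) covers steps S230 to S250 and can be sketched with the two networks passed in as plain callables. The callables here are stand-ins, not trained networks. This additive pattern, a generative model proposing an action plus a small scaled correction from a second network, is the same one used by batch-constrained offline reinforcement learning methods such as BCQ.

```python
from typing import Callable, Sequence

State = Sequence[float]


def predict_target_resource(
    state: State,
    decoder: Callable[[State], float],
    perturbation: Callable[[float, State], float],
    w: float = 0.001,
) -> float:
    """a' = G_omega(s) + w * eps_phi(G_omega(s), s)."""
    first = decoder(state)               # first resource from the CVAE decoder
    second = perturbation(first, state)  # second resource from the prediction execution network
    return first + w * second            # target resource


# Usage with stand-in callables instead of trained networks.
target = predict_target_resource([1.0, 2.0], decoder=lambda s: sum(s), perturbation=lambda a, s: 10.0)
```

With w = 0.001, even a correction of 10 shifts the decoder's proposal of 3.0 only to 3.01, which reflects the text's point that w trades off convergence against the actor's influence.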

The input of the prediction analysis network (critic) includes the current state and the current action, and the target it fits is the overall return under that action, that is, the return of the current action plus the future return:

Q_θ(s, a) ≈ r + max_{a'} Q_θ(s', a')

Here r is the return, and the future return is the maximum value of Q_θ at the next state s'.
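The critic's fitting target, the current return plus the largest Q value at the next state, can be sketched as follows. Two details are assumptions: a discount factor, which the text does not mention and which therefore defaults to 1.0 so the sketch reduces to the text's form, and the handling of the last cycle of the cold start phase as terminal with no future return. How the candidate actions at s' are generated (for example, decoder proposals plus actor corrections) is left to the caller.

```python
from typing import Sequence


def critic_target(reward: float, next_q_values: Sequence[float],
                  discount: float = 1.0, terminal: bool = False) -> float:
    """Fitting target for Q_theta(s, a): current return plus the best Q value at s'.
    `next_q_values` holds Q_theta(s', a') for candidate actions a' at the next state."""
    if terminal or not next_q_values:
        return reward  # no future return after the last cycle (assumption)
    return reward + discount * max(next_q_values)


def squared_td_error(q_sa: float, target: float) -> float:
    # Per-transition loss the critic would minimize when fitting the target.
    return (q_sa - target) ** 2
```

For example, with r = 1.0 and candidate values [0.5, 2.0, 1.5] at s', the target is 1.0 + 2.0 = 3.0.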

In some embodiments, refer to FIG. 7, which shows a method for training a resource prediction model; the method may include steps S710 to S750.

S710. Acquire sample data, the sample data including the initial state feature information and the historical resources of sample delivery information in each historical delivery cycle; the initial state feature information represents the historical delivery characteristics of the sample delivery information before the start of each historical delivery cycle.

S720. Based on the initial state feature information and the historical resources, train a preset conditional variational autoencoder network to obtain a target conditional variational autoencoder network.

S730. Input the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into the preset prediction execution network for resource prediction to obtain the predicted resources corresponding to the historical resources.

S740. Train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and the target prediction analysis network to obtain the target prediction execution network; the resources predicted by the target prediction execution network enable the delivery revenue of the information to be delivered in the delivery cycle to satisfy the target delivery revenue.

In some embodiments, while the preset prediction execution network is being trained, the target prediction analysis network can evaluate the predictions of the current prediction execution network to obtain an evaluation score. When the evaluation score is greater than or equal to a preset score, training of the preset prediction execution network can be considered to have converged; training then ends, and the current prediction execution network is taken as the target prediction execution network.

In addition, the target prediction analysis network scores the current prediction execution network according to whether the resources it predicts make the delivery revenue of the information to be delivered in the delivery cycle satisfy the target delivery revenue: the closer the delivery revenue under the predicted resources is to the target delivery revenue, the higher the evaluation score.

S750. Obtain a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.

In the embodiment of the present disclosure, the conditional variational autoencoder network, the prediction execution network, and the prediction analysis network can be trained alternately, that is, while one network is trained the other two are kept fixed. The conditional variational autoencoder network can be trained first; once it reaches a preset level, for example after N rounds of training, the three networks are trained in alternation. In addition, to train the prediction analysis network better, the prediction analysis network and the prediction execution network can be updated at a frequency ratio of M:1 with M ≥ 2, so that the prediction analysis network converges faster.
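The alternating schedule can be sketched as a function that names which single network is updated at each step. The step granularity and the position of the conditional variational autoencoder update within each round are assumptions; the text fixes only the warm-up on the conditional variational autoencoder network, the one-network-at-a-time rule, and the M:1 critic-to-actor ratio with M ≥ 2.

```python
from typing import List


def training_schedule(total_steps: int, cvae_warmup: int, critic_per_actor: int) -> List[str]:
    """Return which single network is updated at each step; the other two stay frozen.
    Warm-up: only the CVAE is trained for `cvae_warmup` steps. Afterwards each round
    updates the CVAE once, the critic `critic_per_actor` times, then the actor once."""
    if critic_per_actor < 2:
        raise ValueError("the text uses an M:1 critic:actor ratio with M >= 2")
    round_len = critic_per_actor + 2
    schedule = []
    for step in range(total_steps):
        if step < cvae_warmup:
            schedule.append("cvae")
            continue
        k = (step - cvae_warmup) % round_len
        if k == 0:
            schedule.append("cvae")
        elif k <= critic_per_actor:
            schedule.append("critic")
        else:
            schedule.append("actor")
    return schedule
```

For example, `training_schedule(8, 2, 2)` yields two warm-up CVAE steps followed by one round of cvae, critic, critic, actor and the start of the next round.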

The resource prediction model includes the decoder part of the conditional variational autoencoder network and the prediction execution network, so that based on the trained conditional variational autoencoder network and prediction execution network, a resource prediction model for resource prediction can be obtained.

In some embodiments, training samples for reinforcement learning mainly take the form (s, a, r, s'), where s is the current state of the agent and the environment, a is the action taken in that state, r is the reward given by the environment after action a is taken, and s' is the next state reached by the agent and the environment after action a.

In the embodiment of the present disclosure, because an offline reinforcement learning model is used, the sample data for model training are all historical data. The sample delivery information may be historical delivery information that has completed the cold start phase and has a determined cold start result; its data in each delivery cycle and its final cold start result are known, so sample data can be constructed from these known data.

Matching the sample form of the offline reinforcement learning model, each delivery cycle serves as a sample unit, yielding one sample per delivery cycle that contains the initial state feature information, the resource, and the delivery revenue of the current delivery cycle, together with the initial state feature information of the next delivery cycle. Constructing samples in the form of reinforcement learning samples makes them directly usable by the reinforcement learning model, which improves both the applicability of the samples and the efficiency of sample construction.
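Turning one historical item's per-cycle records into (s, a, r, s') samples can be sketched as below. The flat tuple state and scalar resource are simplifying assumptions; the key point is that the initial state of cycle t + 1 serves as s' for cycle t, so the list of states is one entry longer than the lists of resources and revenues.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]       # s: initial state features of cycle t
    action: float                  # a: resource allocated in cycle t
    reward: float                  # r: delivery revenue of cycle t
    next_state: Tuple[float, ...]  # s': initial state features of cycle t + 1


def build_transitions(states: Sequence[Sequence[float]],
                      actions: Sequence[float],
                      rewards: Sequence[float]) -> List[Transition]:
    """One transition per delivery cycle of a historical item. `states` has one more
    entry than `actions`/`rewards`: the state after the last cycle closes the chain."""
    if not (len(actions) == len(rewards) == len(states) - 1):
        raise ValueError("need len(states) == len(actions) + 1 == len(rewards) + 1")
    return [
        Transition(tuple(states[t]), actions[t], rewards[t], tuple(states[t + 1]))
        for t in range(len(actions))
    ]


# Usage: two delivery cycles of one historical item.
ts = build_transitions([[0.0], [1.0], [2.0]], actions=[0.5, 0.7], rewards=[1.0, 2.0])
```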

In some embodiments, refer to FIG. 8, which shows a method for training a conditional variational autoencoder network; the method may include steps S810 to S820.

S810. Input the initial state feature information and the historical resources into the preset conditional variational autoencoder network; fit the data distribution of the initial state feature information and the historical resources with the preset conditional variational autoencoder network to obtain probability distribution information, and encode the historical resources with the same network to obtain the encoding information corresponding to the historical resources.

S820. Based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, train the preset conditional variational autoencoder network to obtain the target conditional variational autoencoder network.

During model training, the inputs of the conditional variational autoencoder network may include the initial state feature information and the historical resources, and its outputs may include the probability distribution information and the encoding information of the historical resources.

In some embodiments, the conditional variational autoencoder network may include a first network and a second network connected in series. When the initial state feature information and historical resources are input, the first network outputs the probability distribution information and the second network outputs the encoding information of the historical resources.

In some embodiments, the conditional variational autoencoder network may be a single independent encoding network. When the initial state feature information and historical resources are input, the output of this independent encoding network includes two pieces of information: the probability distribution information and the encoding information of the historical resources.
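The two-network variant (a first network that fits the distribution and a second network that reconstructs the resource, connected in series) can be illustrated with a toy forward pass. This sketch assumes linear layers, a scalar latent code, and the standard reparameterization trick; the weight names `w_mu`, `w_log_sigma`, and `w_dec` are hypothetical, and a real model would use trained multi-layer networks.

```python
import math
import random
from typing import Dict, List, Tuple


def _linear(weights: List[float], features: List[float]) -> float:
    # Toy stand-in for a neural network layer: a plain dot product.
    return sum(w * f for w, f in zip(weights, features))


def encode(state: List[float], resource: float,
           params: Dict[str, List[float]]) -> Tuple[float, float]:
    """First network: fit N(mu, sigma) conditioned on the state and the resource."""
    features = state + [resource]
    mu = _linear(params["w_mu"], features)
    sigma = math.exp(_linear(params["w_log_sigma"], features))  # exp keeps sigma > 0
    return mu, sigma


def decode(state: List[float], z: float, params: Dict[str, List[float]]) -> float:
    """Second network: reconstruct the resource from the state and latent code z."""
    return _linear(params["w_dec"], state + [z])


def cvae_forward(state: List[float], resource: float,
                 params: Dict[str, List[float]], rng: random.Random):
    mu, sigma = encode(state, resource, params)
    z = mu + sigma * rng.gauss(0.0, 1.0)  # reparameterization trick
    reconstruction = decode(state, z, params)
    return mu, sigma, reconstruction
```

Here `(mu, sigma)` plays the role of the probability distribution information and `reconstruction` the role of the encoding information for the historical resource.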

In some embodiments, referring to FIG. 9, a method for adjusting the parameters of the conditional variational autoencoder network is shown, and the method may include steps S910 to S940.

S910. Obtain a first loss component according to the probability distribution information and the standard normal distribution.

S920. Obtain a second loss component according to the historical resource and the encoding information corresponding to the historical resource.

S930. Obtain a first loss function based on the first loss component and the second loss component.

S940. Perform network parameter adjustment on the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.

As can be seen from FIG. 6, the output of the encoder of the conditional variational autoencoder network is the probability distribution information, and the output of its decoder is the reconstruction of the input action. The loss function of the model is determined from these two outputs, as shown in formula (3):

$$\mathrm{loss}_1 = x - x' + \mathrm{KL}\big(\mathcal{N}(\mu,\sigma),\ \mathcal{N}(0,1)\big) \qquad (3)$$

Here, x is the action input to the encoder of the conditional variational autoencoder network, and x' is the reconstruction of that input action output by the decoder; N(μ,σ) is the probability distribution output by the encoder, and N(0,1) is the standard normal distribution. The parameters of the conditional variational autoencoder network are adjusted based on this loss function, yielding the trained conditional variational autoencoder network.
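Steps S910 to S930 and formula (3) can be sketched for a batch of scalar resources as below. Because a raw difference x − x' can be negative, this sketch assumes the reconstruction term is a squared error; the KL term uses the standard closed form for a scalar Gaussian against N(0,1). The function names are hypothetical.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: float, sigma: float) -> float:
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)) for a scalar Gaussian."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))


def cvae_loss(
    x: Sequence[float],      # historical resources fed to the encoder
    x_rec: Sequence[float],  # reconstructions produced by the decoder
    mu: Sequence[float],     # encoder means
    sigma: Sequence[float],  # encoder standard deviations (> 0)
) -> float:
    """Batch-averaged loss1 = reconstruction term + KL term."""
    n = len(x)
    # Second loss component (S920): squared error is an assumption for x - x'.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / n
    # First loss component (S910): distance of the fitted distribution from N(0, 1).
    kl = sum(kl_to_standard_normal(m, s) for m, s in zip(mu, sigma)) / n
    # S930: combine both components into the first loss function.
    return recon + kl
```

A perfect reconstruction with `mu = 0` and `sigma = 1` gives a loss of zero; S940 would then minimize this value with respect to the network parameters.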

In some embodiments, referring to FIG. 10, a method for training the target prediction execution network is shown, which may include steps S1010 to S1020.

S1010. Input the initial state feature information and the predicted resources corresponding to the historical resources into the target predictive analysis network, and use the target predictive analysis network to analyze the behavior of allocating the predicted resources given the initial state feature information, obtaining first analysis information.

S1020. Perform network parameter adjustment on the preset prediction execution network based on the first analysis information to obtain the target prediction execution network.

When the prediction execution network is being trained, the conditional variational autoencoder network and the predictive analysis network are kept fixed and used directly for data processing. The input of the prediction execution network depends on the output of the decoder of the conditional variational autoencoder network. During training, the state and action in the current sample pair are input to the current conditional variational autoencoder network to obtain reconstruction information for the input action; this reconstruction information and the state in the current sample pair are then input to the prediction execution network, which outputs an action (that is, the output cold-start resource information). The output action and the state in the current sample pair are next input to the predictive analysis network, which gives the evaluation score Q-value (that is, the return) of taking the output action in that state. The parameters of the prediction execution network are adjusted based on this evaluation score so that the predictive analysis network assigns a higher score to the actions output by the prediction execution network; by repeatedly adjusting the parameters in this way, the trained prediction execution network is obtained.
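The actor update described above (CVAE and critic frozen, execution network pushed toward higher critic scores) can be illustrated with a one-parameter toy. Everything here is hypothetical: the critic `critic_q` is a fixed quadratic whose score peaks at `action == 2 * state`, the "execution network" is a single weight `w` that adjusts the CVAE's first resource, and gradients are written out by hand instead of using an autodiff library.

```python
from typing import List, Tuple


def critic_q(state: float, action: float) -> float:
    """Hypothetical frozen critic: the score peaks when action == 2 * state."""
    return -(action - 2.0 * state) ** 2


def critic_dq_da(state: float, action: float) -> float:
    """Derivative of critic_q with respect to the action."""
    return -2.0 * (action - 2.0 * state)


def actor(state: float, first_resource: float, w: float) -> float:
    """Hypothetical one-parameter actor: adjusts the first resource by w * state."""
    return first_resource + w * state


def train_actor(batch: List[Tuple[float, float]], w: float = 0.0,
                lr: float = 0.01, steps: int = 500) -> float:
    """Gradient ascent on the critic's score; the CVAE output and critic stay fixed.

    batch holds (state, first_resource) pairs, where first_resource is the
    CVAE reconstruction for that sample.
    """
    for _ in range(steps):
        grad = 0.0
        for state, first_resource in batch:
            action = actor(state, first_resource, w)
            # Chain rule: dQ/dw = dQ/da * da/dw, and da/dw = state.
            grad += critic_dq_da(state, action) * state
        w += lr * grad / len(batch)  # move w toward a higher critic score
    return w
```

With `batch = [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]`, the first resource is half the state, so training drives `w` toward 1.5, where `0.5 * state + 1.5 * state` hits the critic's peak.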

In some embodiments, the sample data further includes the historical delivery revenue of the sample delivery information in each historical delivery period and updated state feature information. Referring to FIG. 11, a method for training the target predictive analysis network is shown, which may include steps S1110 to S1150.

S1110. Input the initial state feature information and the historical resources into a preset predictive analysis network, and use the preset predictive analysis network to analyze the allocation of the historical resources given the initial state feature information, obtaining second analysis information.

S1120. Sample historical resources based on the updated state feature information and the target conditional variational autoencoder network to obtain a preset number of sampled resources.

S1130. Based on the updated state feature information, determine the delivery revenue corresponding to each sampled resource.

S1140. Determine the sampled resource with the largest delivery revenue as the target sampled resource.

S1150. Based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource, adjust the network parameters of the preset predictive analysis network to obtain the target predictive analysis network.

When the predictive analysis network is being trained, the conditional variational autoencoder network and the prediction execution network are kept fixed, and the predictive analysis network is trained with the goal of maximizing delivery revenue. The output of the predictive analysis network can be the revenue corresponding to the input state and action, that is, the revenue obtainable by taking the action in that state. In the embodiments of the disclosure, the target of the predictive analysis network is the overall revenue of the current action, which may include the revenue of the current action in the current cold-start cycle and its revenue in the next cold-start cycle, i.e., current revenue plus future revenue. The future revenue can refer to the maximum delivery revenue obtainable in the next state, where the next state can be the state corresponding to the next cold-start cycle. That is, the sum of the current revenue and this maximum future revenue is used as the target revenue.

The evaluation score (that is, the return) obtained for the current action and current state is compared with the target revenue, and the parameters of the predictive analysis network are updated according to the comparison result, yielding the trained predictive analysis network.

Here, r in the target revenue is the delivery revenue r in the current sample pair. For the maximum delivery revenue in the next state, a bootstrap technique can be used to sample actions: a preset number of actions are sampled based on the next state s', the delivery revenue corresponding to each sampled action is looked up in the sample data, and the maximum is taken. This maximum delivery revenue serves as the future revenue in the target revenue.
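The target computation in S1120 to S1150 can be sketched as follows. The sampler and revenue lookup are passed in as callables because the disclosure does not specify them; `sample_resource`, `revenue_of`, and the squared-error form of `critic_loss` are assumptions. The text does not mention a discount factor, so none is applied.

```python
from typing import Callable


def critic_target(
    revenue: float,                               # r from the current sample pair
    next_state: float,                            # s' (scalar for illustration)
    sample_resource: Callable[[float], float],    # draws a resource from the CVAE given s'
    revenue_of: Callable[[float, float], float],  # looks up revenue for (s', a') in the samples
    num_samples: int = 10,                        # the "preset number" of sampled resources
) -> float:
    """Target revenue = current revenue + best revenue among resources sampled at s'."""
    sampled = [sample_resource(next_state) for _ in range(num_samples)]
    # S1130-S1140: evaluate each sampled resource and keep the best one.
    best_future = max(revenue_of(next_state, a) for a in sampled)
    return revenue + best_future


def critic_loss(q_value: float, target: float) -> float:
    """Squared error between the critic's score and the target (assumed loss form)."""
    return (q_value - target) ** 2
```

Using a squared error makes S1150 a regression of the critic's score onto the target; other comparison rules would fit the text equally well.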

In some embodiments, referring to FIG. 12, a sample generation method is shown, which may include steps S1210 to S1230.

S1210. Obtain a first delivery revenue of the delivered information within the target delivery period and a second delivery revenue of the delivered information within a preset time period after the target delivery period, where the target delivery period is the last delivery period of the initial delivery stage.

S1220. Based on the first delivery revenue and the second delivery revenue, obtain the historical delivery revenue corresponding to the target delivery period.

S1230. Based on the historical delivery revenue corresponding to the target delivery period, generate a sample of the delivered information in the target delivery period.

In some embodiments, the delivered information may be delivery information that has successfully passed the cold-start phase. For the last delivery cycle of the initial delivery stage, the corresponding sample delivery revenue may include both the delivery revenue within that last cycle and the delivery revenue obtained after the initial delivery stage; for example, the delivery revenue recorded for the last delivery cycle may be the revenue in that cycle plus the revenue in the following three hours. Because the revenue obtained after the last delivery cycle is brought about by the target resources allocated in that cycle, taking this later revenue into account when determining the delivery revenue of the last cycle improves the accuracy of the sample data.
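The revenue credited to the last cycle of the initial delivery stage (S1210 to S1220) can be sketched as a windowed sum over timestamped revenue events. The event format `(timestamp in hours, revenue)` and the plain addition of the two parts are assumptions; the three-hour default mirrors the example above.

```python
from typing import Iterable, Tuple


def last_cycle_sample_revenue(
    revenue_events: Iterable[Tuple[float, float]],  # (timestamp in hours, revenue)
    cycle_start: float,
    cycle_end: float,
    horizon_hours: float = 3.0,  # the "next three hours" example from the text
) -> float:
    """Delivery revenue credited to the last cycle of the initial delivery stage."""
    events = list(revenue_events)  # materialize so the events can be scanned twice
    # First delivery revenue: revenue earned inside the last cycle itself.
    first = sum(r for t, r in events if cycle_start <= t < cycle_end)
    # Second delivery revenue: revenue earned in the window right after the cycle.
    second = sum(r for t, r in events if cycle_end <= t < cycle_end + horizon_hours)
    # Combining the two by plain addition is an assumption.
    return first + second
```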

FIG. 13 is a block diagram of a delivery information processing device according to an exemplary embodiment. The device includes a state feature information determining unit 1310, a resource prediction model acquiring unit 1320, a first prediction unit 1330, a second prediction unit 1340, and a target resource determining unit 1350. The state feature information determining unit 1310 is configured to determine the initial state feature information of the target delivery information in the current delivery period. This information is obtained based on the initial state feature information of the target delivery information in the previous delivery period and the delivery result information of the target delivery information in the previous delivery period; it includes the historical delivery result information of the target delivery information before the current delivery period and the attribute information of the target delivery information.

The resource prediction model acquiring unit 1320 is configured to acquire a resource prediction model, which includes a conditional variational autoencoder network and a prediction execution network.

The first prediction unit 1330 is configured to input the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network to perform resource prediction and obtain the first resource.

The second prediction unit 1340 is configured to input the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network to perform resource prediction and obtain a second resource.

The target resource determining unit 1350 is configured to obtain the target resource corresponding to the target delivery information based on the first resource and the second resource; the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet the target delivery revenue.
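The inference path through units 1330 to 1350 can be sketched with pluggable callables. This section does not state how the first and second resources are combined, so the default additive `combine` rule is hypothetical, as is feeding the decoder the prior mean `latent = 0.0` of N(0,1).

```python
from typing import Callable, List

State = List[float]


def predict_target_resource(
    state: State,
    decoder: Callable[[State, float], float],   # CVAE decoder: (state, z) -> resource
    executor: Callable[[State, float], float],  # prediction execution network
    latent: float = 0.0,                        # assumed: prior mean of N(0, 1)
    combine: Callable[[float, float], float] = lambda first, second: first + second,
) -> float:
    """Inference path of the resource prediction model (illustrative only)."""
    # First resource: decode a latent code conditioned on the current state.
    first_resource = decoder(state, latent)
    # Second resource: the execution network refines the first resource.
    second_resource = executor(state, first_resource)
    # Target resource: the combination rule is a hypothetical default (addition).
    return combine(first_resource, second_resource)
```

Keeping `combine` as a parameter makes it easy to swap in whatever rule the full specification prescribes.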

In some embodiments, the state feature information determining unit 1310 includes:

The first updating unit includes a first generating unit configured to generate the initial state feature information of the target delivery information in the current delivery cycle based on the delivery setting information, the category information, and the updated historical delivery results.

In some embodiments, the first sorting unit includes a second sorting unit configured to sort the items of information to be delivered based on their delivery setting information and their actual resources, to obtain the sorting result.

In some embodiments, the delivery information processing device further includes an information delivery unit configured to deliver information within the current delivery period based on the sorting result.
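The allocation and sorting performed by the sorting units (and spelled out in claims 4 and 5) can be sketched as follows. The claims state only that a normalization coefficient is computed from the resource mean, the resource variance, and the target resource, then combined with a preset resource amount; the logistic squashing of the z-score, the numeric `setting_weight`, and the ranking score below are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    predicted_resource: float  # target resource predicted by the model
    setting_weight: float      # hypothetical numeric stand-in for delivery setting information


def allocate_actual_resources(cands: List[Candidate], preset_amount: float) -> Dict[str, float]:
    """Turn predicted resources into actual resources for the current cycle."""
    preds = [c.predicted_resource for c in cands]
    mu = mean(preds)            # resource mean
    var = pvariance(preds, mu)  # resource variance
    std = math.sqrt(var) if var > 0 else 1.0
    actual = {}
    for c in cands:
        z = (c.predicted_resource - mu) / std
        coeff = 1.0 / (1.0 + math.exp(-z))  # assumed normalization: logistic of the z-score
        actual[c.name] = coeff * preset_amount
    return actual


def rank_candidates(cands: List[Candidate], preset_amount: float) -> List[Candidate]:
    """Sort by a hypothetical score combining setting weight and actual resource."""
    actual = allocate_actual_resources(cands, preset_amount)
    return sorted(cands, key=lambda c: c.setting_weight * (1.0 + actual[c.name]), reverse=True)
```

With equal setting weights, items whose predicted resource is above the batch mean receive more than half of the preset amount and rank first.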

The weighted summing unit is configured to perform weighted summation on the delivery conversion data and the delivery consumption data to obtain the delivery revenue of the target delivery information in the current delivery period.
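The weighted summation can be written in one line. The weights are not given in this section, so `w_conversion` and `w_consumption` are hypothetical parameters; a negative consumption weight would treat spending as a cost.

```python
def delivery_revenue(
    conversion: float,      # delivery conversion data for the current period
    consumption: float,     # delivery consumption data for the current period
    w_conversion: float,    # hypothetical weight for conversion data
    w_consumption: float,   # hypothetical weight for consumption data
) -> float:
    """Delivery revenue as a weighted sum of conversion and consumption data."""
    return w_conversion * conversion + w_consumption * consumption
```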

Referring to FIG. 14, a block diagram of a resource prediction model training device is shown, including a sample data acquiring unit 1410, a first training unit 1420, a third prediction unit 1430, a second training unit 1440, and a resource prediction model determining unit 1450.

The sample data acquiring unit 1410 is configured to acquire sample data. The sample data includes the initial state feature information of the sample delivery information in each historical delivery period and the historical resources; the initial state feature information represents the historical delivery characteristics of the sample delivery information before the start of each historical delivery period.

The first training unit 1420 is configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network.

The third prediction unit 1430 is configured to input the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into the preset prediction execution network for resource prediction, obtaining the predicted resources corresponding to the historical resources.

The second training unit 1440 is configured to train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and the target predictive analysis network, to obtain the target prediction execution network. The predicted resources output by the target prediction execution network are resources that make the delivery revenue of the information to be delivered in the delivery cycle meet the target delivery revenue.

The resource prediction model determining unit 1450 is configured to obtain a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.

In some embodiments, the first training unit 1420 includes:

In some embodiments, the third training unit includes:

In some embodiments, the second training unit 1440 includes:

The resource prediction model training device also includes:

The resource sampling unit is configured to sample historical resources based on the updated state feature information and the target conditional variational autoencoder network to obtain a preset number of sampled resources;

With regard to the apparatus in the foregoing embodiments, the manner in which each module executes operations has been described in detail in embodiments related to the method, and will not be described in detail here.

In an exemplary embodiment, a computer-readable storage medium including instructions is also provided. In some embodiments, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. When the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform any one of the above methods.

In an exemplary embodiment, a computer program product is also provided, comprising a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the device to perform any of the above methods.

This embodiment also provides a device, whose structure is shown in FIG. 15. The device 1500 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 1522 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may provide temporary or persistent storage. The program stored in the storage medium 1530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device. Furthermore, the central processing unit 1522 may be configured to communicate with the storage medium 1530 and execute, on the device 1500, the series of instruction operations in the storage medium 1530. The device 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc. Any of the above methods in this embodiment can be implemented based on the device shown in FIG. 15.

All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, which are all regarded as the scope of protection required by the present disclosure.

Claims

1. A delivery information processing method, comprising:

determining initial state feature information of target delivery information in a current delivery cycle, wherein the initial state feature information of the target delivery information in the current delivery cycle is obtained based on the initial state feature information of the target delivery information in a previous delivery cycle and delivery result information of the target delivery information in the previous delivery cycle, and the initial state feature information includes historical delivery result information of the target delivery information before the current delivery cycle and attribute information of the target delivery information;

obtaining a resource prediction model, the resource prediction model including a conditional variational autoencoder network and a prediction execution network;

inputting the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational autoencoder network for resource prediction to obtain a first resource;

Inputting the initial state feature information of the target delivery information in the current delivery cycle and the first resource into the prediction execution network for resource prediction to obtain a second resource;

obtaining a target resource corresponding to the target delivery information based on the first resource and the second resource, the target resource being a predicted resource that makes the delivery revenue of the target delivery information in the current delivery cycle meet a target delivery revenue.

2. The method according to claim 1, wherein determining the initial state feature information of the target delivery information in the current delivery cycle comprises:

obtaining the initial state feature information of the target delivery information in the previous delivery cycle, the initial state feature information in the previous delivery cycle including historical delivery result information of the target delivery information before the previous delivery cycle;

updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, and determining the initial state feature information of the target delivery information in the current delivery cycle.

3. The method according to claim 2, wherein the initial state feature information in the previous delivery cycle further includes delivery setting information and category information of the target delivery information, the delivery setting information being used to sort a plurality of items of information to be delivered;

wherein updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, and determining the initial state feature information of the target delivery information in the current delivery cycle, comprises:

Based on the delivery setting information, the category information, and the updated historical delivery results, the initial state characteristic information of the target delivery information in the current delivery cycle is generated.

4. The method according to claim 3, further comprising:

calculating a resource mean and a resource variance for the current delivery cycle based on the predicted resources of the items of information to be delivered in the current delivery cycle;

calculating a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;

Based on the normalization coefficient and the preset amount of resources, determine actual resources allocated to the target delivery information within the current delivery cycle;

Based on the actual resources of the items of information to be delivered, the items of information to be delivered are sorted to obtain a sorting result.

5. The method according to claim 4, wherein sorting the items of information to be delivered based on the actual resources of the items of information to be delivered to obtain the sorting result comprises:

Based on the delivery setting information of the items of information to be delivered and the actual resources of the items of information to be delivered, the items of information to be delivered are sorted to obtain the sorting result.

6. The method according to claim 5, further comprising:

Based on the ranking result, information delivery is performed within the current delivery period.

7. The method according to claim 4, further comprising:

obtaining delivery result information of the target delivery information within the current delivery cycle, the delivery result information including delivery conversion data and delivery consumption data;

The delivery conversion data and the delivery consumption data are weighted and summed to obtain the delivery revenue of the target delivery information in the current delivery period.

8. A resource prediction model training method, comprising:

obtaining sample data, the sample data including initial state feature information of sample delivery information in each historical delivery cycle and historical resources, the initial state feature information being used to characterize historical delivery characteristics of the sample delivery information before the start of each historical delivery cycle;

training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;

inputting encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;

training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target predictive analysis network to obtain a target prediction execution network, the predicted resources obtained by the target prediction execution network being resources that make the delivery revenue of information to be delivered in a delivery cycle meet a target delivery revenue;

obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.

9. The method according to claim 8, wherein training the preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain the target conditional variational autoencoder network comprises:

inputting the initial state feature information and the historical resources into the preset conditional variational autoencoder network; fitting, by the preset conditional variational autoencoder network, the data distribution of the initial state feature information and the historical resources to obtain probability distribution information; and encoding the historical resources by the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;

Based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, the preset conditional variational autoencoder network is trained to obtain the target conditional variational autoencoder network.

10. The method according to claim 9, wherein training the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources to obtain the target conditional variational autoencoder network comprises:

Obtaining a first loss component according to the probability distribution information and a standard normal distribution;

Obtaining a second loss component according to the historical resource and the encoding information corresponding to the historical resource;

Obtaining a first loss function based on the first loss component and the second loss component;

Adjusting network parameters of the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.

11. The method according to claim 8, wherein training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and the target predictive analysis network to obtain the target prediction execution network comprises:

inputting the initial state feature information and the predicted resources corresponding to the historical resources into the target predictive analysis network, and analyzing, by the target predictive analysis network, the behavior of allocating the predicted resources given the initial state feature information to obtain first analysis information;

adjusting network parameters of the preset prediction execution network based on the first analysis information to obtain the target prediction execution network.

12. The method according to claim 8, wherein the sample data further includes historical delivery revenue of the sample delivery information in each historical delivery cycle and updated state feature information, the updated state feature information being obtained based on the initial state feature information and delivery result information of the sample delivery information in the historical delivery cycle;

The method also includes:

inputting the initial state feature information and the historical resources into a preset predictive analysis network, and analyzing, by the preset predictive analysis network, the allocation of the historical resources given the initial state feature information to obtain second analysis information;

sampling historical resources based on the updated state feature information and the target conditional variational autoencoder network to obtain a preset number of sampled resources;

determining, based on the updated state feature information, the delivery revenue corresponding to the sampled resources;

determining the sampled resource with the largest delivery revenue as a target sampled resource;

adjusting network parameters of the preset predictive analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource to obtain the target predictive analysis network.

13. The method according to claim 8, further comprising:

obtaining a first delivery revenue of delivered information within a target delivery cycle and a second delivery revenue of the delivered information within a preset time period after the target delivery cycle, the target delivery cycle being the last delivery cycle of an initial delivery stage;

obtaining, based on the first delivery revenue and the second delivery revenue, historical delivery revenue corresponding to the target delivery cycle;

Based on the historical delivery revenue corresponding to the target delivery period, a sample corresponding to the delivered information in the target delivery period is generated.
A delivery information processing device, comprising:

The state characteristic information determining unit is configured to determine the initial state characteristic information of the target delivery information in the current delivery cycle; the initial state characteristic information of the target delivery information in the current delivery cycle is based on the target delivery information in the previous The starting state characteristic information of the delivery period, and the delivery result information of the target delivery information in the last delivery period are obtained; the starting state characteristic information includes the history of the target delivery information before the current delivery period delivery result information, and attribute information of the target delivery information;

A resource forecasting model acquisition unit configured to acquire a resource forecasting model; the resource forecasting model includes a conditional variational self-encoding network and a forecasting execution network;

The first prediction unit is configured to input the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational self-encoding network to perform resource prediction and obtain the first resource;

The second prediction unit is configured to input the initial state feature information of the target delivery information in the current delivery cycle and the first resource to the forecast execution network to perform resource prediction, and obtain a second resource;

The target resource determination unit is configured to obtain the target resource corresponding to the target delivery information based on the first resource and the second resource; the target resource is the delivery of the target delivery information in the current delivery period Forecasted resources whose revenue meets the target delivery revenue.
The device according to claim 14, wherein the state feature information determining unit comprises:

The first acquiring unit is configured to acquire the initial state feature information of the target delivery information in the previous delivery period; the initial state feature information in the previous delivery period includes historical delivery result information of the target delivery information before the previous delivery period;

The first update unit is configured to update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery period, and to determine the initial state feature information of the target delivery information in the current delivery period.
The device according to claim 15, wherein the initial state feature information in the previous delivery period further includes delivery setting information and category information of the target delivery information; the delivery setting information is used for ranking the target delivery information among multiple items of information to be delivered;

The first update unit includes:

The first generating unit is configured to generate the initial state feature information of the target delivery information in the current delivery period based on the delivery setting information, the category information, and the updated historical delivery result information.
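Claims 15 and 16 describe a recursive state update. The previous period's delivery result is folded into the historical results, and the new state is regenerated from the setting, category, and updated history. Below is a minimal Python sketch. It assumes, hypothetically, that the history is a cumulative `[conversions, consumption]` vector and that the features are a plain concatenation.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class DeliveryState:
    setting: np.ndarray   # delivery setting information (e.g. a bid), hypothetical encoding
    category: np.ndarray  # category information, e.g. a one-hot vector
    # historical delivery results, assumed cumulative [conversions, consumption]
    history: np.ndarray = field(default_factory=lambda: np.zeros(2))

    def features(self) -> np.ndarray:
        """Initial state feature vector for the period this state starts."""
        return np.concatenate([self.setting, self.category, self.history])


def next_state(prev: DeliveryState, last_result: np.ndarray) -> DeliveryState:
    """Fold the previous period's delivery result into the history."""
    return DeliveryState(prev.setting.copy(), prev.category.copy(),
                         prev.history + last_result)
```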
The apparatus of claim 16, further comprising:

The first calculation unit is configured to calculate a resource mean and a resource variance for the current delivery period based on the predicted resources of the items of information to be delivered in the current delivery period;

The second calculation unit is configured to calculate a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;

The actual resource determining unit is configured to determine, based on the normalization coefficient and a preset resource amount, the actual resource allocated to the target delivery information in the current delivery period;

The first sorting unit is configured to sort the items of information to be delivered based on actual resources of the items of information to be delivered, and obtain a sorting result.
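The claim fixes the inputs of the normalization (resource mean, resource variance, target resource, preset resource amount) but not the formula. The sketch below assumes a z-score passed through a sigmoid to obtain a coefficient in (0, 1). That coefficient is then multiplied by the preset amount. This mapping is an illustrative choice, not taken from the source.

```python
import numpy as np


def allocate_and_rank(predicted: np.ndarray, preset_amount: float, eps: float = 1e-8):
    """Normalize each item's predicted resource against the period's statistics,
    scale by a preset amount, and rank items by the resulting actual resource."""
    mean = predicted.mean()                    # resource mean
    var = predicted.var()                      # resource variance
    z = (predicted - mean) / np.sqrt(var + eps)
    coef = 1.0 / (1.0 + np.exp(-z))            # assumed mapping to (0, 1)
    actual = preset_amount * coef              # actual allocated resource
    order = np.argsort(-actual, kind="stable") # descending sorting result
    return actual, order
```

For predicted resources `[1.0, 3.0, 2.0]` and a preset amount of 100, the last value sits exactly at the mean. Its coefficient is therefore 0.5 and its actual resource is 50.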
The device according to claim 17, wherein the first sorting unit comprises:

The second sorting unit is configured to sort the items of information to be delivered based on their delivery setting information and their actual resources to obtain the sorting result.
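The source does not specify how the delivery setting information and the actual resource are combined. One plausible reading, shown here purely as an assumption, multiplies a per-item bid by the actual resource and ranks by the product. The bid stands in for the delivery setting information.

```python
def rank_with_settings(bids, actual):
    """Rank item indices by an assumed score: bid (delivery setting) times actual resource."""
    scores = [b * a for b, a in zip(bids, actual)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With bids `[1.0, 2.0, 0.5]` and actual resources `[10.0, 4.0, 30.0]`, the scores are 10, 8, and 15. The order is therefore item 2, item 0, item 1.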
The apparatus of claim 18, further comprising:

The information delivery unit is configured to deliver information within the current delivery period based on the sorting result.
The apparatus of claim 17, further comprising:

The second obtaining unit is configured to obtain delivery result information of the target delivery information within the current delivery period; the delivery result information includes delivery conversion data and delivery consumption data;

The weighted summing unit is configured to perform a weighted summation of the delivery conversion data and the delivery consumption data to obtain the delivery revenue of the target delivery information in the current delivery period.
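Below is a minimal sketch of the weighted summation. The default weights are hypothetical. The negative consumption weight reflects one possible choice, which treats spend as a cost.

```python
def delivery_revenue(conversions: float, consumption: float,
                     w_conv: float = 1.0, w_cost: float = -0.1) -> float:
    """Weighted sum of delivery conversion data and delivery consumption data.

    The weights are hypothetical; a negative consumption weight treats spend as a cost.
    """
    return w_conv * conversions + w_cost * consumption
```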
A resource prediction model training device, comprising:

The sample data acquisition unit is configured to acquire sample data; the sample data includes initial state feature information and historical resources of sample delivery information in each historical delivery period; the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery period;

The first training unit is configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;

The third prediction unit is configured to input the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;

The second training unit is configured to train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network to obtain a target prediction execution network; the predicted resources output by the target prediction execution network are resources that enable the delivery revenue of information to be delivered in a delivery period to satisfy the target delivery revenue;

The resource prediction model determining unit is configured to obtain a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
The apparatus of claim 21, wherein the first training unit comprises:

The information input unit is configured to input the initial state feature information and the historical resources into the preset conditional variational autoencoder network, to fit the data distribution of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information, and to encode the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;

The third training unit is configured to train the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources to obtain the target conditional variational autoencoder network.
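Claim 22 describes what looks like a standard conditional VAE forward pass. The encoder maps each (state, resource) pair to a Gaussian, which corresponds to the probability distribution information. A latent code is then drawn with the reparameterization trick. Finally, the decoder reconstructs the resource from the state and the latent code. The sketch below uses hypothetical linear layers. It also reads the decoder's reconstruction as the "encoding information" of the resource. That reading is an interpretation, since the claim does not define the term further.

```python
import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, LATENT_DIM = 3, 2

# Hypothetical linear encoder/decoder weights.
W_mu = rng.normal(scale=0.1, size=(STATE_DIM + 1, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(STATE_DIM + 1, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(STATE_DIM + LATENT_DIM,))


def cvae_forward(states: np.ndarray, resources: np.ndarray):
    """Encode (state, resource) pairs into a diagonal Gaussian, sample a latent code
    with the reparameterization trick, and decode a reconstructed resource."""
    x = np.column_stack([states, resources])       # (N, STATE_DIM + 1)
    mu, logvar = x @ W_mu, x @ W_logvar            # probability distribution information
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps            # latent code
    recon = np.column_stack([states, z]) @ W_dec   # read here as the encoding information
    return mu, logvar, z, recon
```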
The apparatus of claim 22, wherein the third training unit comprises:

The first loss component determining unit is configured to obtain a first loss component according to the probability distribution information and a standard normal distribution;

The second loss component determining unit is configured to obtain a second loss component according to the historical resources and the encoding information corresponding to the historical resources;

The first loss function determining unit is configured to obtain a first loss function based on the first loss component and the second loss component;

The first parameter adjustment unit is configured to perform network parameter adjustment on the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.
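The two loss components of claim 23 map naturally onto the usual CVAE objective. The first is a KL divergence between the encoded Gaussian and the standard normal. The second is a reconstruction error between each historical resource and its reconstruction. The closed-form KL term below is standard for diagonal Gaussians. The `kl_weight` of 0.5 is an assumed hyperparameter.

```python
import numpy as np


def cvae_loss(mu, logvar, resources, recon, kl_weight=0.5):
    """First loss function = reconstruction term + weighted KL term."""
    # First loss component: KL(N(mu, sigma^2) || N(0, I)) per row, averaged over the batch.
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    # Second loss component: squared error between each resource and its reconstruction.
    recon_err = np.mean((resources - recon) ** 2)
    return recon_err + kl_weight * kl, kl, recon_err
```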
The apparatus of claim 21, wherein the second training unit comprises:

The first analysis information determining unit is configured to input the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and to analyze, through the target prediction analysis network, the behavior of allocating the predicted resources based on the initial state feature information, to obtain first analysis information;

The second parameter adjustment unit is configured to adjust network parameters of the preset prediction execution network based on the first analysis information to obtain the target prediction execution network.
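Claim 24 trains the prediction execution network against the target prediction analysis network, in the manner of an actor-critic update. The execution network's parameters move in the direction that raises the analysis network's score of the resource it outputs. The sketch below uses a hypothetical critic that is concave in the resource and a linear perturbation `states @ phi`. This keeps the gradient simple enough to write out by hand. Real networks would normally rely on automatic differentiation instead.

```python
import numpy as np


def critic_q(states, resources, q_w, alpha=1.0, beta=0.5):
    """Hypothetical analysis network: concave in the resource, maximized at alpha / (2 * beta)."""
    return states @ q_w + alpha * resources - beta * resources ** 2


def actor_step(phi, states, first, lr=0.1, alpha=1.0, beta=0.5):
    """One gradient step on the execution network, maximizing the critic's score
    (the first analysis information) of the adjusted resource first + states @ phi."""
    a = first + states @ phi                          # predicted resource
    dq_da = alpha - 2.0 * beta * a                    # critic gradient w.r.t. resource
    grad = -(dq_da[:, None] * states).mean(axis=0)    # gradient of -mean(Q) w.r.t. phi
    return phi - lr * grad
```

In the test, a constant column in `states` lets `phi` learn a bias. That bias pulls the resource from 3.0 toward the critic's optimum of `alpha / (2 * beta) = 1.0`.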
The device according to claim 21, wherein the sample data further includes historical delivery revenue and updated state feature information of the sample delivery information in each historical delivery period; the updated state feature information is obtained based on the initial state feature information and the delivery result information of the sample delivery information in the historical delivery period;

The device also includes:

The second analysis information determining unit is configured to input the initial state feature information and the historical resources into a preset prediction analysis network, and to analyze, through the preset prediction analysis network, the behavior of allocating the historical resources based on the initial state feature information, to obtain second analysis information;

The resource sampling unit is configured to sample resources based on the updated state feature information and the target conditional variational autoencoder network to obtain a preset number of sampled resources;

The delivery revenue determining unit is configured to determine the delivery revenue corresponding to each sampled resource based on the updated state feature information;

The target sampled resource determining unit is configured to determine the sampled resource with the largest delivery revenue as the target sampled resource;

The third parameter adjustment unit is configured to adjust network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource to obtain the target prediction analysis network.
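Claim 25 resembles the critic update in batch-constrained Q-learning. Several candidate resources are sampled from the trained CVAE for the updated state, and the candidate with the largest estimated revenue is kept. The historical revenue plus a discounted estimate for that candidate then forms the regression target. In the sketch, `sample_fn` and `q_fn` are hypothetical stand-ins for the CVAE sampler and the analysis network. The discount `gamma` is also an assumption.

```python
import numpy as np


def critic_target(reward, next_state, sample_fn, q_fn, n_samples=10, gamma=0.9):
    """Regression target for the prediction analysis network (assumed TD-style form)."""
    candidates = np.asarray(sample_fn(next_state, n_samples))      # sampled resources
    values = np.array([q_fn(next_state, r) for r in candidates])   # estimated revenue of each
    best = int(np.argmax(values))                                  # target sampled resource
    return reward + gamma * values[best], candidates[best]


def critic_loss(q_sa, target):
    """Squared error between the current estimate (second analysis information) and the target."""
    return float(np.mean((np.asarray(q_sa) - target) ** 2))
```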
The apparatus of claim 21, further comprising:

The third obtaining unit is configured to obtain a first delivery revenue of delivered information within a target delivery period and a second delivery revenue of the delivered information within a preset time period after the target delivery period; the target delivery period is the last delivery period in an initial delivery phase;

The historical delivery revenue determining unit is configured to obtain the historical delivery revenue corresponding to the target delivery period based on the first delivery revenue and the second delivery revenue;

The sample generating unit is configured to generate, based on the historical delivery revenue corresponding to the target delivery period, a sample corresponding to the delivered information in the target delivery period.
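Claim 26 credits the last period of the initial delivery phase with revenue that materializes later. Below is a minimal sketch. It assumes, hypothetically, that the two revenues are simply added with an optional discount and that a sample is a plain record.

```python
def historical_revenue(first_rev: float, later_rev: float, discount: float = 1.0) -> float:
    """Revenue credited to the target delivery period: its own revenue plus the
    (optionally discounted) revenue observed in the following preset window."""
    return first_rev + discount * later_rev


def make_sample(state, resource, first_rev, later_rev, next_state):
    """Hypothetical sample record for the delivered information in the target period."""
    return {"state": state, "resource": resource,
            "revenue": historical_revenue(first_rev, later_rev),
            "next_state": next_state}
```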
An electronic device comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the following steps:

determining initial state feature information of target delivery information in a current delivery period; the initial state feature information of the target delivery information in the current delivery period is obtained based on the initial state feature information of the target delivery information in the previous delivery period and the delivery result information of the target delivery information in the previous delivery period; the initial state feature information includes historical delivery result information of the target delivery information before the current delivery period, and attribute information of the target delivery information;

Obtain a resource prediction model; the resource prediction model includes a conditional variational autoencoder network and a prediction execution network;

inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction to obtain a first resource;

inputting the initial state feature information of the target delivery information in the current delivery period, together with the first resource, into the prediction execution network for resource prediction to obtain a second resource;

obtaining the target resource corresponding to the target delivery information based on the first resource and the second resource; the target resource is a predicted resource that enables the delivery revenue of the target delivery information in the current delivery period to satisfy the target delivery revenue;

Or implement the following steps:

obtaining sample data; the sample data includes initial state feature information and historical resources of sample delivery information in each historical delivery period; the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery period;

training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;

inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;

training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network to obtain a target prediction execution network; the predicted resources output by the target prediction execution network are resources that enable the delivery revenue of information to be delivered in a delivery period to satisfy the target delivery revenue;

obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
A computer-readable storage medium, wherein when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the following steps:

determining initial state feature information of target delivery information in a current delivery period; the initial state feature information of the target delivery information in the current delivery period is obtained based on the initial state feature information of the target delivery information in the previous delivery period and the delivery result information of the target delivery information in the previous delivery period; the initial state feature information includes historical delivery result information of the target delivery information before the current delivery period, and attribute information of the target delivery information;

Obtain a resource prediction model; the resource prediction model includes a conditional variational autoencoder network and a prediction execution network;

inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction to obtain a first resource;

inputting the initial state feature information of the target delivery information in the current delivery period, together with the first resource, into the prediction execution network for resource prediction to obtain a second resource;

obtaining the target resource corresponding to the target delivery information based on the first resource and the second resource; the target resource is a predicted resource that enables the delivery revenue of the target delivery information in the current delivery period to satisfy the target delivery revenue;

Or perform the following steps:

obtaining sample data; the sample data includes initial state feature information and historical resources of sample delivery information in each historical delivery period; the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery period;

training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;

inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;

training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network to obtain a target prediction execution network; the predicted resources output by the target prediction execution network are resources that enable the delivery revenue of information to be delivered in a delivery period to satisfy the target delivery revenue;

obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
A computer program product, comprising a computer program stored in a readable storage medium, wherein at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the computer device to perform the following steps:

determining initial state feature information of target delivery information in a current delivery period; the initial state feature information of the target delivery information in the current delivery period is obtained based on the initial state feature information of the target delivery information in the previous delivery period and the delivery result information of the target delivery information in the previous delivery period; the initial state feature information includes historical delivery result information of the target delivery information before the current delivery period, and attribute information of the target delivery information;

Obtain a resource prediction model; the resource prediction model includes a conditional variational autoencoder network and a prediction execution network;

inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction to obtain a first resource;

inputting the initial state feature information of the target delivery information in the current delivery period, together with the first resource, into the prediction execution network for resource prediction to obtain a second resource;

obtaining the target resource corresponding to the target delivery information based on the first resource and the second resource; the target resource is a predicted resource that enables the delivery revenue of the target delivery information in the current delivery period to satisfy the target delivery revenue;

Or perform the following steps:

obtaining sample data; the sample data includes initial state feature information and historical resources of sample delivery information in each historical delivery period; the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery period;

training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;

inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;

training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network to obtain a target prediction execution network; the predicted resources output by the target prediction execution network are resources that enable the delivery revenue of information to be delivered in a delivery period to satisfy the target delivery revenue;

obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.