WO2023109025A1 - Delivery information processing method, and resource prediction model training method and apparatus

Publication number
WO2023109025A1
WO2023109025A1 PCT/CN2022/096373 CN2022096373W WO2023109025A1 WO 2023109025 A1 WO2023109025 A1 WO 2023109025A1 CN 2022096373 W CN2022096373 W CN 2022096373W WO 2023109025 A1 WO2023109025 A1 WO 2023109025A1
Authority
WO
WIPO (PCT)
Prior art keywords
delivery
information
target
resource
historical
Application number
PCT/CN2022/096373
Other languages
French (fr)
Chinese (zh)
Inventor
张弛
郭远
李怀宇
谢淼
林子钏
杨森
刘霁
Original Assignee
北京达佳互联信息技术有限公司
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2023109025A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Definitions

  • In an information delivery system, new delivery information is continuously uploaded and waits to be delivered. To quickly identify high-potential delivery information among the large volume of newly uploaded delivery information, the information delivery platform generally allocates cold-start resources to newly uploaded delivery information so that it obtains more delivery opportunities.
  • The first prediction unit is configured to input the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction to obtain the first resource.
  • A computer-readable storage medium: when instructions in the computer-readable storage medium are executed by a processor of a server, the server is enabled to execute the delivery information processing method or the resource prediction model training method described above.
  • Fig. 5 is a flowchart of a method for calculating delivery revenue according to an exemplary embodiment.
  • Fig. 7 shows a method for training a resource prediction model, which may include steps S710 to S750.
  • The conditional variational autoencoder network may be an independent encoding network.
  • The output of the corresponding independent encoding network includes two pieces of information: probability distribution information and encoding information of the historical resources.
  • The initial state feature information in the previous delivery period further includes delivery setting information and category information of the target delivery information; the delivery setting information is used to sort multiple items of target delivery information to be delivered.
  • The first updating unit includes a first generating unit configured to generate the initial state feature information of the target delivery information in the current delivery period based on the delivery setting information, the category information, and the updated historical delivery results.
  • The actual resource determination unit is configured to determine the actual resource allocated to the target delivery information in the current delivery period based on the normalization coefficient and the preset resource amount.
  • The first sorting unit includes a second sorting unit configured to sort the items of information to be delivered based on their delivery setting information and their actual resources to obtain the sorting result.
  • The first training unit 1420 is configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network.
  • The third training unit includes:
  • a first loss function determination unit configured to obtain a first loss function based on the first loss component and the second loss component.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to the technical field of information processing, and relates to a delivery information processing method, and a resource prediction model training method and apparatus. The method comprises: determining initial state feature information of target delivery information in a current delivery period; obtaining a resource prediction model, the resource prediction model comprising a conditional variational auto-encoder network and a prediction execution network; inputting the initial state feature information into the conditional variational auto-encoder network for resource prediction to obtain a first resource; inputting the initial state feature information and the first resource into the prediction execution network for resource prediction to obtain a second resource; and obtaining, on the basis of the first resource and the second resource, a target resource corresponding to the target delivery information, the target resource being a prediction resource that enables a delivery revenue of the target delivery information in the current delivery period to satisfy a target delivery revenue.

Description

Delivery information processing method, resource prediction model training method and apparatus
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 202111529876.1, filed on December 15, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of information processing, and in particular to a delivery information processing method, a resource prediction model training method, and an apparatus.
Background
In an information delivery system, new delivery information is continuously uploaded and waits to be delivered. To quickly identify high-potential delivery information among the large volume of newly uploaded delivery information, the information delivery platform generally allocates cold-start resources to newly uploaded delivery information so that it obtains more delivery opportunities.
In the related art, cold-start resources are generally calculated directly from the click/conversion unit price and the click-through rate (CTR), without considering the long-term revenue of the newly uploaded delivery information on the delivery platform. Moreover, because newly uploaded delivery information has had few exposures, its CTR is calculated inaccurately, and so are the cold-start resources derived from it. Because the cold-start resources are inaccurate and long-term revenue is ignored, the selection of delivery information made after delivery based on the cold-start resources is unreasonable.
Summary
The present disclosure provides a delivery information processing method, a resource prediction model training method, and an apparatus.
According to a first aspect of the embodiments of the present disclosure, a delivery information processing method is provided, including:
determining initial state feature information of target delivery information in a current delivery period, where the initial state feature information of the target delivery information in the current delivery period is obtained based on the initial state feature information of the target delivery information in a previous delivery period and delivery result information of the target delivery information in the previous delivery period, and the initial state feature information includes historical delivery result information of the target delivery information before the current delivery period and attribute information of the target delivery information;
obtaining a resource prediction model, where the resource prediction model includes a conditional variational autoencoder network and a prediction execution network;
inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction to obtain a first resource;
inputting the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction to obtain a second resource; and
obtaining a target resource corresponding to the target delivery information based on the first resource and the second resource, where the target resource is a predicted resource that enables the delivery revenue of the target delivery information in the current delivery period to satisfy a target delivery revenue.
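The prediction flow of the first aspect can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the text only says that the target resource is obtained "based on" the first and second resources, so the additive combination below is an assumption, and `toy_cvae` and `toy_executor` are hypothetical stand-ins for the trained networks.

```python
from typing import Callable, Sequence

State = Sequence[float]


def predict_target_resource(
    state: State,
    cvae_predict: Callable[[State], float],
    executor_predict: Callable[[State, float], float],
) -> float:
    """Chain the two networks and combine their outputs.

    Assumption: the second resource acts as a correction that is added to
    the first resource; the source does not specify the combination rule.
    """
    first_resource = cvae_predict(state)                        # CVAE network output
    second_resource = executor_predict(state, first_resource)   # execution network output
    return first_resource + second_resource


# Hypothetical stand-ins for the trained networks.
def toy_cvae(state: State) -> float:
    return 0.5 * sum(state)


def toy_executor(state: State, first_resource: float) -> float:
    return 0.1 * first_resource


target = predict_target_resource([1.0, 2.0, 3.0], toy_cvae, toy_executor)
```

Passing the networks in as callables keeps the sketch independent of any particular deep learning framework.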
In some embodiments, determining the initial state feature information of the target delivery information in the current delivery period includes:
obtaining the initial state feature information of the target delivery information in the previous delivery period, where the initial state feature information in the previous delivery period includes historical delivery result information of the target delivery information before the previous delivery period; and
updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery period, and determining the initial state feature information of the target delivery information in the current delivery period.
In some embodiments, the initial state feature information in the previous delivery period further includes delivery setting information and category information of the target delivery information, where the delivery setting information is used to sort multiple items of information to be delivered;
updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery period, and determining the initial state feature information of the target delivery information in the current delivery period, includes:
generating the initial state feature information of the target delivery information in the current delivery period based on the delivery setting information, the category information, and the updated historical delivery results.
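The state update described above can be sketched as below. The field names and the choice to update the historical results by accumulation are assumptions made for illustration; the text only states that the historical delivery result information is updated with the previous period's results while the delivery setting and category information are carried over.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StateFeatures:
    delivery_setting: float   # hypothetical: a setting value used for ranking
    category: str             # category of the target delivery information
    hist_conversions: float   # historical delivery results before this period
    hist_consumption: float


def next_state(prev: StateFeatures, conversions: float, consumption: float) -> StateFeatures:
    """Build the initial state of the current period from the previous one.

    Assumption: historical results are running totals, so the previous
    period's results are simply added to them.
    """
    return replace(
        prev,
        hist_conversions=prev.hist_conversions + conversions,
        hist_consumption=prev.hist_consumption + consumption,
    )


previous = StateFeatures(delivery_setting=1.5, category="games",
                         hist_conversions=10.0, hist_consumption=100.0)
current = next_state(previous, conversions=2.0, consumption=30.0)
```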
In some embodiments, the delivery information processing method further includes:
calculating a resource mean and a resource variance for the current delivery period based on the predicted resources of the items of information to be delivered in the current delivery period;
calculating a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;
determining the actual resource allocated to the target delivery information in the current delivery period based on the normalization coefficient and a preset resource amount; and
sorting the items of information to be delivered based on their actual resources to obtain a sorting result.
In some embodiments, sorting the items of information to be delivered based on their actual resources to obtain the sorting result includes:
sorting the items of information to be delivered based on the delivery setting information of the items and the actual resources of the items to obtain the sorting result.
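The normalization and sorting steps can be sketched as follows. The source does not give the formulas, so three choices here are assumptions: the normalization coefficient is a z-score, the actual resource is the preset amount scaled by a logistic function of that coefficient, and the ranking score is the sum of the delivery setting value and the actual resource.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List


def actual_resources(predicted: Dict[str, float], preset_amount: float) -> Dict[str, float]:
    """Map each item's predicted resource to an actual resource."""
    values = list(predicted.values())
    mean = fmean(values)                    # resource mean over all items
    variance = pvariance(values, mu=mean)   # resource variance over all items
    actual = {}
    for item, resource in predicted.items():
        # Assumption: the normalization coefficient is a z-score.
        coefficient = (resource - mean) / math.sqrt(variance + 1e-8)
        # Assumption: a logistic squash keeps the result in (0, preset_amount).
        actual[item] = preset_amount / (1.0 + math.exp(-coefficient))
    return actual


def rank_items(actual: Dict[str, float], settings: Dict[str, float]) -> List[str]:
    """Sort items in descending order of an assumed additive score."""
    return sorted(actual, key=lambda item: settings[item] + actual[item], reverse=True)


predicted = {"a": 1.0, "b": 2.0, "c": 3.0}
actual = actual_resources(predicted, preset_amount=100.0)
order = rank_items(actual, settings={"a": 0.0, "b": 0.0, "c": 0.0})
```

Normalizing against the mean and variance of the whole batch keeps the allocated amounts comparable across delivery periods even when the raw predictions drift in scale.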
In some embodiments, the delivery information processing method further includes:
delivering information within the current delivery period based on the sorting result.
In some embodiments, the delivery information processing method further includes:
obtaining delivery result information of the target delivery information in the current delivery period, where the delivery result information includes conversion data and delivery consumption data; and
computing a weighted sum of the conversion data and the delivery consumption data to obtain the delivery revenue of the target delivery information in the current delivery period.
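The weighted sum can be written in a few lines. The weight names and default values below are hypothetical; the source does not state how the weights are chosen.

```python
def delivery_revenue(
    conversions: float,
    consumption: float,
    w_conversion: float = 1.0,   # hypothetical weight for conversion data
    w_consumption: float = 1.0,  # hypothetical weight for consumption data
) -> float:
    """Weighted sum of conversion data and delivery consumption data."""
    return w_conversion * conversions + w_consumption * consumption


revenue = delivery_revenue(3.0, 20.0, w_conversion=2.0, w_consumption=0.5)
```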
According to a second aspect of the embodiments of the present disclosure, a resource prediction model training method is provided, including:
obtaining sample data, where the sample data includes initial state feature information of sample delivery information in each historical delivery period and historical resources, and the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery period;
training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;
inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;
training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network to obtain a target prediction execution network, where a predicted resource output by the target prediction execution network is a resource that enables the delivery revenue of information to be delivered in a delivery period to satisfy a target delivery revenue; and
obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
In some embodiments, training the preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain the target conditional variational autoencoder network includes:
inputting the initial state feature information and the historical resources into the preset conditional variational autoencoder network, fitting, by the preset conditional variational autoencoder network, the data distribution information of the initial state feature information and the historical resources to obtain probability distribution information, and encoding, by the preset conditional variational autoencoder network, the historical resources to obtain encoding information corresponding to the historical resources; and
training the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources to obtain the target conditional variational autoencoder network.
In some embodiments, training the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources to obtain the target conditional variational autoencoder network includes:
obtaining a first loss component according to the probability distribution information and a standard normal distribution;
obtaining a second loss component according to the historical resources and the encoding information corresponding to the historical resources;
obtaining a first loss function based on the first loss component and the second loss component; and
adjusting the network parameters of the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.
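The two loss components resemble the usual variational autoencoder objective, and a sketch under that reading is given below. The assumptions are: the probability distribution information is a diagonal Gaussian given by a mean and a log-variance, the first component is its KL divergence from a standard normal, the "encoding information" is treated as the network's reconstruction of the historical resource, and the second component is a mean squared error. The `kl_weight` parameter is hypothetical.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, exp(log_var)) || N(0, 1) ) summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reconstruction_loss(historical: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between historical resources and their reconstruction."""
    return sum((h - r) ** 2 for h, r in zip(historical, reconstructed)) / len(historical)


def first_loss(
    mu: Sequence[float],
    log_var: Sequence[float],
    historical: Sequence[float],
    reconstructed: Sequence[float],
    kl_weight: float = 1.0,  # hypothetical balancing weight
) -> float:
    """Combine the first (KL) and second (reconstruction) loss components."""
    return (kl_weight * kl_to_standard_normal(mu, log_var)
            + reconstruction_loss(historical, reconstructed))


loss = first_loss(mu=[1.0], log_var=[0.0], historical=[2.0], reconstructed=[1.0])
```

In an actual training loop this scalar would be minimized with respect to the network parameters by an automatic differentiation framework; the sketch only shows how the loss value is assembled.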
In some embodiments, training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and the target prediction analysis network to obtain the target prediction execution network includes:
inputting the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and analyzing, by the target prediction analysis network, the behavior of allocating the predicted resources based on the initial state feature information to obtain first analysis information; and
adjusting the network parameters of the preset prediction execution network based on the first analysis information to obtain the target prediction execution network.
In some embodiments, the sample data further includes the historical delivery revenue of the sample delivery information in each historical delivery period and updated state feature information, where the updated state feature information is obtained based on the initial state feature information and the delivery result information of the sample delivery information in the historical delivery period;
the resource prediction model training method further includes:
inputting the initial state feature information and the historical resources into a preset prediction analysis network, and analyzing, by the preset prediction analysis network, the allocation of the historical resources based on the initial state feature information to obtain second analysis information;
sampling historical resources based on the updated state feature information and the target conditional variational autoencoder network to obtain a preset number of sampled resources;
determining the delivery revenue corresponding to each sampled resource based on the updated state feature information;
determining the sampled resource with the largest delivery revenue as the target sampled resource; and
adjusting the network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource to obtain the target prediction analysis network.
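The sampling and update steps read like a temporal-difference update in which the best of several sampled resources supplies the next-period value. The sketch below follows that interpretation and is an assumption throughout: `discount`, the squared-error form of the loss, and the toy sampler and value function are all hypothetical, and the second analysis information is treated as a scalar value estimate.

```python
import random
from typing import Callable, Sequence, Tuple

State = Sequence[float]


def best_sampled_resource(
    updated_state: State,
    sample_resource: Callable[[State, random.Random], float],
    value_of: Callable[[State, float], float],
    num_samples: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Draw a preset number of resources and keep the one with the largest revenue."""
    candidates = [sample_resource(updated_state, rng) for _ in range(num_samples)]
    values = [value_of(updated_state, c) for c in candidates]
    best = max(range(num_samples), key=values.__getitem__)
    return candidates[best], values[best]


def analysis_loss(
    predicted_value: float,      # second analysis information (assumed scalar)
    historical_revenue: float,   # revenue observed in the historical period
    best_next_value: float,      # revenue of the target sampled resource
    discount: float = 0.99,      # hypothetical discount factor
) -> float:
    """Squared error against an assumed temporal-difference target."""
    target = historical_revenue + discount * best_next_value
    return (predicted_value - target) ** 2


# Hypothetical stand-ins for the CVAE sampler and the revenue estimate.
def toy_sampler(state: State, rng: random.Random) -> float:
    return rng.uniform(0.0, 4.0)


def toy_value(state: State, resource: float) -> float:
    return -(resource - 2.0) ** 2


resource, value = best_sampled_resource([0.0], toy_sampler, toy_value, 10, random.Random(0))
loss = analysis_loss(predicted_value=1.0, historical_revenue=0.5,
                     best_next_value=1.0, discount=0.5)
```

Restricting the maximization to resources sampled from the trained autoencoder keeps the candidate allocations close to those seen in the historical data.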
In some embodiments, the resource prediction model training method further includes:
obtaining a first delivery revenue of delivered information within a target delivery period and a second delivery revenue of the delivered information within a preset time period after the target delivery period, where the target delivery period is the last delivery period of the initial delivery stage;
obtaining a historical delivery revenue corresponding to the target delivery period based on the first delivery revenue and the second delivery revenue; and
generating a sample of the delivered information for the target delivery period based on the historical delivery revenue corresponding to the target delivery period.
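One simple way to fold the later revenue into the last period's training label is sketched below. The weighted sum and the `long_term_weight` parameter are assumptions; the source only says that the historical delivery revenue is obtained "based on" the first and second delivery revenues.

```python
def final_period_revenue(
    first_revenue: float,           # revenue within the last initial-stage period
    second_revenue: float,          # revenue in the preset time window afterwards
    long_term_weight: float = 1.0,  # hypothetical weight on the later revenue
) -> float:
    """Historical delivery revenue used to label the final-period sample."""
    return first_revenue + long_term_weight * second_revenue


label = final_period_revenue(10.0, 40.0, long_term_weight=0.5)
```

Including the later revenue in the label is what lets the trained model account for long-term revenue rather than only the cold-start window.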
According to a third aspect of the embodiments of the present disclosure, a delivery information processing apparatus is provided, including:
a state feature information determination unit, configured to determine initial state feature information of target delivery information in a current delivery period, where the initial state feature information of the target delivery information in the current delivery period is obtained based on the initial state feature information of the target delivery information in a previous delivery period and delivery result information of the target delivery information in the previous delivery period, and the initial state feature information includes historical delivery result information of the target delivery information before the current delivery period and attribute information of the target delivery information;
a resource prediction model obtaining unit, configured to obtain a resource prediction model, where the resource prediction model includes a conditional variational autoencoder network and a prediction execution network;
a first prediction unit, configured to input the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction to obtain a first resource;
a second prediction unit, configured to input the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction to obtain a second resource; and
a target resource determination unit, configured to obtain a target resource corresponding to the target delivery information based on the first resource and the second resource, where the target resource is a predicted resource that enables the delivery revenue of the target delivery information in the current delivery period to satisfy a target delivery revenue.
In some embodiments, the state feature information determination unit includes:
a first obtaining unit, configured to obtain the initial state feature information of the target delivery information in the previous delivery period, where the initial state feature information in the previous delivery period includes historical delivery result information of the target delivery information before the previous delivery period; and
a first updating unit, configured to update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery period, and determine the initial state feature information of the target delivery information in the current delivery period.
In some embodiments, the initial state feature information in the previous delivery period further includes delivery setting information and category information of the target delivery information, where the delivery setting information is used to sort multiple items of target delivery information to be delivered;
the first updating unit includes:
a first generating unit, configured to generate the initial state feature information of the target delivery information in the current delivery period based on the delivery setting information, the category information, and the updated historical delivery results.
In some embodiments, the delivery information processing apparatus further includes:
a first calculation unit, configured to calculate a resource mean and a resource variance for the current delivery period based on the predicted resources of the items of information to be delivered in the current delivery period;
a second calculation unit, configured to calculate a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;
an actual resource determination unit, configured to determine the actual resource allocated to the target delivery information in the current delivery period based on the normalization coefficient and a preset resource amount; and
a first sorting unit, configured to sort the items of information to be delivered based on their actual resources to obtain a sorting result.
In some embodiments, the first sorting unit includes:
a second sorting unit, configured to sort the items of information to be delivered based on the delivery setting information of the items and the actual resources of the items to obtain the sorting result.
In some embodiments, the delivery information processing apparatus further includes:
an information delivery unit, configured to deliver information within the current delivery period based on the sorting result.
In some embodiments, the delivery information processing apparatus further includes:
a second obtaining unit, configured to obtain delivery result information of the target delivery information in the current delivery period, where the delivery result information includes conversion data and delivery consumption data; and
a weighted summation unit, configured to compute a weighted sum of the conversion data and the delivery consumption data to obtain the delivery revenue of the target delivery information in the current delivery period.
According to a fourth aspect of the embodiments of the present disclosure, a resource prediction model training apparatus is provided, including:
a sample data obtaining unit, configured to obtain sample data, where the sample data includes initial state feature information of sample delivery information in each historical delivery period and historical resources, and the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery period;
a first training unit, configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network;
a third prediction unit, configured to input the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction to obtain predicted resources corresponding to the historical resources;
a second training unit, configured to train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network to obtain a target prediction execution network, where a predicted resource output by the target prediction execution network is a resource that enables the delivery revenue of information to be delivered in a delivery period to satisfy a target delivery revenue; and
a resource prediction model determination unit, configured to obtain a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
In some embodiments, the first training unit includes:
an information input unit configured to input the initial state feature information and the historical resources into the preset conditional variational autoencoder network, to fit the data distribution information of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information, and to encode the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;
a third training unit configured to train the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network.
In some embodiments, the third training unit includes:
a first loss component determination unit configured to obtain a first loss component according to the probability distribution information and a standard normal distribution;
a second loss component determination unit configured to obtain a second loss component according to the historical resources and the encoding information corresponding to the historical resources;
a first loss function determination unit configured to obtain a first loss function based on the first loss component and the second loss component;
a first parameter adjustment unit configured to adjust the network parameters of the preset conditional variational autoencoder network based on the first loss function, to obtain the target conditional variational autoencoder network.
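As a rough illustration of how the two loss components above might be combined, the following sketch assumes that the probability distribution information is a diagonal Gaussian described by a mean `mu` and a log-variance `log_var`, that the first loss component is the KL divergence from that Gaussian to the standard normal distribution, and that the second loss component is a squared error between the historical resources and the network's reconstruction of them. Treating the "encoding information" as a reconstruction, the weight `beta`, and all function names are assumptions made for illustration and are not stated in the disclosure.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """First loss component (assumed form): KL(N(mu, exp(log_var)) || N(0, I)),
    summed over latent dimensions and averaged over the batch."""
    kl_per_sample = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    return float(np.mean(kl_per_sample))

def reconstruction_error(historical_resource, reconstructed_resource):
    """Second loss component (assumed form): squared error between the
    historical resources and their reconstruction, averaged over the batch."""
    diff = historical_resource - reconstructed_resource
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def first_loss_function(mu, log_var, historical_resource, reconstructed_resource, beta=1.0):
    """Combine both components; beta is a hypothetical weighting factor."""
    return (reconstruction_error(historical_resource, reconstructed_resource)
            + beta * kl_to_standard_normal(mu, log_var))
```

In such a setup the network parameters would be adjusted by minimizing `first_loss_function` with a gradient-based optimizer; the sketch only evaluates the loss.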
In some embodiments, the second training unit includes:
a first analysis information determination unit configured to input the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and to analyze, through the target prediction analysis network, the behavior of allocating the predicted resources based on the initial state feature information, to obtain first analysis information;
a second parameter adjustment unit configured to adjust the network parameters of the preset prediction execution network based on the first analysis information, to obtain the target prediction execution network.
In some embodiments, the sample data further includes the historical delivery revenue of the sample delivery information in each historical delivery period, and updated state feature information; the updated state feature information is obtained based on the initial state feature information and the delivery result information of the sample delivery information in the historical delivery period.
The resource prediction model training apparatus further includes:
a second analysis information determination unit configured to input the initial state feature information and the historical resources into a preset prediction analysis network, and to analyze, through the preset prediction analysis network, the behavior of allocating the historical resources based on the initial state feature information, to obtain second analysis information;
a resource sampling unit configured to sample historical resources based on the updated state feature information and the target conditional variational autoencoder network, to obtain a preset number of sampled resources;
a delivery revenue determination unit configured to determine, based on the updated state feature information, the delivery revenue corresponding to each sampled resource;
a target sampled resource determination unit configured to determine the sampled resource with the largest delivery revenue as the target sampled resource;
a third parameter adjustment unit configured to adjust the network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource, to obtain the target prediction analysis network.
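The units above resemble a temporal-difference update in which the best of several sampled resources at the updated state supplies the bootstrap value. The sketch below shows one way this could look; the discount factor `gamma`, the squared-error loss, and the function names are assumptions introduced for illustration, since the disclosure does not give the exact formula here.

```python
import numpy as np

def td_target(historical_revenue, sampled_next_values, gamma=0.99):
    """Historical delivery revenue plus the (assumed) discounted revenue of the
    target sampled resource, i.e. the sampled resource with the largest value."""
    return historical_revenue + gamma * float(np.max(sampled_next_values))

def analysis_network_loss(second_analysis_value, target):
    """Assumed squared error between the second analysis information
    (the network's estimate for the historical allocation) and the target."""
    return (second_analysis_value - target) ** 2
```

For example, with a historical revenue of 1.0, sampled values of 0.5, 2.0 and 1.0, and `gamma=0.5`, the target is 1.0 + 0.5 * 2.0 = 2.0.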
In some embodiments, the resource prediction model training apparatus further includes:
a third acquisition unit configured to acquire a first delivery revenue of delivered information within a target delivery period, and a second delivery revenue of the delivered information within a preset time period after the target delivery period; the target delivery period is the last delivery period in the initial delivery phase;
a historical delivery revenue determination unit configured to obtain the historical delivery revenue corresponding to the target delivery period based on the first delivery revenue and the second delivery revenue;
a sample generation unit configured to generate a sample for the delivered information in the target delivery period based on the historical delivery revenue corresponding to the target delivery period.
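A minimal sketch of the sample generation for the last delivery period is given below. The disclosure states only that the historical delivery revenue is obtained "based on" the first and second delivery revenues; the weighted sum, the `weight` parameter, and the `Sample` structure are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    start_state: List[float]      # initial state feature information
    historical_resource: float    # resource allocated in the period
    historical_revenue: float     # revenue credited to the period

def last_period_revenue(first_revenue, second_revenue, weight=1.0):
    """Combine the in-period revenue with the revenue observed in the preset
    time window after the period (assumed: weighted sum)."""
    return first_revenue + weight * second_revenue

def build_last_period_sample(start_state, resource, first_revenue, second_revenue):
    """Build the training sample for the last period of the initial delivery phase."""
    revenue = last_period_revenue(first_revenue, second_revenue)
    return Sample(list(start_state), resource, revenue)
```

Crediting later revenue to the last period lets the sample reflect returns that arrive after the initial delivery phase ends.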
According to a fifth aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the delivery information processing method or the resource prediction model training method described above.
According to a sixth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when the instructions in the computer-readable storage medium are executed by a processor of a server, the server is enabled to perform the delivery information processing method or the resource prediction model training method described above.
According to a seventh aspect of the embodiments of the present disclosure, a computer program product is provided, the computer program product including a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, so that the device performs the delivery information processing method or the resource prediction model training method described above.
The present disclosure first determines the initial state information of the target delivery information in the current delivery period, and then inputs the initial state information into the conditional variational autoencoder network in the resource prediction model for resource prediction, to obtain a first resource. The initial state information and the first resource are then input into the prediction execution network in the resource prediction model for resource prediction, to obtain a second resource. A target resource corresponding to the target delivery information is obtained based on the first resource and the second resource; the target resource is the predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet the target delivery revenue. In the present disclosure, the resources for the target delivery information are determined per delivery period, and different delivery periods correspond to different resources. That is, the resources to be allocated to the target delivery information in the current delivery period are predicted from the initial state information of the target delivery information in each delivery period and the resource prediction model, and the predicted resources are those that make the revenue of the target delivery information in the current delivery period meet the target revenue, which improves the rationality of resource allocation. Further, a cold-start result can be determined from the delivery revenue of the target delivery information over multiple delivery periods; this cold-start result satisfies the condition that the delivery revenue meets the target revenue, and delivery information that meets the delivery target is selected based on the cold-start result, which improves the efficiency of delivery information selection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Description of Drawings
The accompanying drawings are incorporated into and constitute a part of this specification, show embodiments consistent with the present disclosure, and together with the description serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
Fig. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment.
Fig. 2 is a flowchart of a delivery information processing method according to an exemplary embodiment.
Fig. 3 is a flowchart of a method for updating the initial state feature information of delivery information according to an exemplary embodiment.
Fig. 4 is a flowchart of a method for sorting delivery information based on predicted resources according to an exemplary embodiment.
Fig. 5 is a flowchart of a delivery revenue calculation method according to an exemplary embodiment.
Fig. 6 is a schematic structural diagram of a resource prediction model according to an exemplary embodiment.
Fig. 7 is a flowchart of a resource prediction model training method according to an exemplary embodiment.
Fig. 8 is a flowchart of a method for training a conditional variational autoencoder network according to an exemplary embodiment.
Fig. 9 is a flowchart of a method for adjusting the parameters of a conditional variational autoencoder network according to an exemplary embodiment.
Fig. 10 is a flowchart of a method for training a target prediction execution network according to an exemplary embodiment.
Fig. 11 is a flowchart of a method for training a target analysis network according to an exemplary embodiment.
Fig. 12 is a flowchart of a sample generation method according to an exemplary embodiment.
Fig. 13 is a block diagram of a delivery information processing apparatus according to an exemplary embodiment.
Fig. 14 is a block diagram of a resource prediction model training apparatus according to an exemplary embodiment.
Fig. 15 is a schematic structural diagram of an electronic device according to an exemplary embodiment.
Detailed Description
To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
Referring to FIG. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present disclosure, the implementation environment may include at least one first terminal 110 and a second terminal 120, and the first terminal 110 and the second terminal 120 may exchange data over a network.
In some embodiments, the second terminal 120 may deliver the delivery information in the delivery system. In response to receiving the delivery information, the first terminal 110 displays it, so that a user who sees the delivery information can click to browse it, convert after clicking, and so on. Based on the user's operations on the delivery information at the first terminal 110, the second terminal 120 collects and analyzes the click data, conversion data, and other data of the delivery information.
The first terminal 110 may communicate with the second terminal 120 in a browser/server (B/S) mode or a client/server (C/S) mode. The first terminal 110 may include a physical device such as a smartphone, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, a vehicle-mounted terminal, or a server, and may also include software running on the physical device, such as an application. The operating system running on the first terminal 110 in the embodiments of the present disclosure may include, but is not limited to, Android, iOS, Linux, and Windows.
The second terminal 120 and the first terminal 110 may establish a wired or wireless communication connection. The second terminal 120 may include an independently operating server, a distributed server, or a server cluster composed of multiple servers, where a server may be a cloud server.
The life cycle of delivery information can generally be divided into several stages, such as an exploration period, a growth period, a maturity period, and a decline period. A delivery period in the present disclosure may be one period of the cold-start phase, and the cold-start phase corresponds to the exploration period of the life cycle. During the exploration period, new delivery information is uploaded and delivered successively. After new delivery information has been delivered for some time and has accumulated a certain number of conversions, delivery information with a good number of conversions can pass through the exploration period and enter the growth period, whereas delivery information with a poor number of conversions fails the cold start and will no longer be delivered.
To avoid the unreasonable resource allocation and delivery information selection results found in the related art, an embodiment of the present disclosure provides a delivery information processing method. Referring to FIG. 2, the delivery information processing method may include steps S210 to S250.
S210. Determine the initial state feature information of the target delivery information in the current delivery period; the initial state feature information of the target delivery information in the current delivery period is obtained based on the initial state feature information of the target delivery information in the previous delivery period and the delivery result information of the target delivery information in the previous delivery period; the initial state feature information includes the historical delivery result information of the target delivery information before the current delivery period, and the attribute information of the target delivery information.
A delivery period may be one period of the initial delivery phase. An initial delivery phase may include multiple delivery periods, and the delivery periods generally have the same duration. In some embodiments, the initial delivery phase may be the cold-start phase of the delivery information. For example, the cold-start phase lasts 7 days and each hour is one delivery period.
In some embodiments, because the state of delivery information changes as the information is delivered, the initial state feature information of the target delivery information may first be determined at the start of each delivery period. The initial state feature information of the current delivery period is obtained based on the initial state feature information of the target delivery information in the previous delivery period and the delivery result information of the target delivery information in the previous delivery period. The initial state feature information includes the historical delivery result information of the target delivery information before the current delivery period, and the attribute information of the target delivery information.
S220. Acquire a resource prediction model; the resource prediction model includes a conditional variational autoencoder network and a prediction execution network.
In some embodiments, at the start of a delivery period, the resource prediction model can predict, with the goal of meeting the target delivery revenue, the resources that should be allocated to the target delivery information in the current delivery period, so that the target delivery information can enter the subsequent processing steps of information delivery based on the predicted resources.
S230. Input the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction, to obtain a first resource.
S240. Input the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction, to obtain a second resource.
S250. Obtain the target resource corresponding to the target delivery information based on the first resource and the second resource; the target resource is the predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet the target delivery revenue.
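Steps S230 to S250 can be sketched as a small pipeline. In the sketch below, `cvae_decode` and `prediction_execution` are hypothetical callables standing in for the two trained networks; sampling the latent variable from a standard normal distribution and combining the two resources by addition are assumptions, since the disclosure states only that the target resource is obtained "based on" the first and second resources.

```python
import numpy as np

def predict_target_resource(start_state, cvae_decode, prediction_execution,
                            latent_dim=2, rng=None):
    """Sketch of S230-S250 for one piece of target delivery information."""
    rng = np.random.default_rng() if rng is None else rng
    # S230: the CVAE proposes a first resource conditioned on the start state
    # (assumed: decode a latent vector drawn from the standard normal prior).
    z = rng.standard_normal(latent_dim)
    first_resource = cvae_decode(start_state, z)
    # S240: the prediction execution network refines the proposal.
    second_resource = prediction_execution(start_state, first_resource)
    # S250: combine both into the target resource (assumed: simple addition).
    return first_resource + second_resource
```

With toy stand-ins such as `cvae_decode=lambda s, z: 2.0` and `prediction_execution=lambda s, a: 0.5`, the function returns 2.5.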
In some embodiments, the target resource may refer to a resource that assists the delivery of delivery information, that is, a resource that enables the target delivery information to be delivered as soon as possible during information delivery; the larger the amount of target resources, the sooner the information is delivered.
In some embodiments, the model may be trained with the goal of maximizing the delivery revenue of the delivery information, to obtain the resource prediction model. That is, the resource prediction model can predict resources with the goal of maximizing the delivery revenue of the current delivery period and/or a future time period, so that the target resource is the resource that maximizes the delivery revenue of the target delivery information in the current delivery period and/or the future time period. The future time period here may refer to one or more future delivery periods, or to the time period after the cold-start phase.
In some embodiments, the resource prediction model may be an offline reinforcement learning model. The purpose of allocating resources to new delivery information is to select delivery information with greater potential as soon as possible while allocating resources to different delivery information so that its long-term delivery revenue is maximized, and the optimization goal of reinforcement learning is precisely to maximize overall revenue. In addition, reinforcement learning addresses sequential decision problems, and in the cold-start process of information delivery the resources of the current delivery information for the next delivery period can be determined in each delivery period, which can also be regarded as a sequential decision problem. Reinforcement learning can therefore be used for resource prediction so as to maximize the delivery revenue. Moreover, training the offline reinforcement learning model on accumulated historical data avoids the impact on the training results of data fluctuations caused by direct online exploration.
In the present disclosure, the resources for the target delivery information are determined per delivery period, and different delivery periods correspond to different resources. That is, the resources to be allocated to the target delivery information in the current delivery period are predicted from the initial state information of the target delivery information in each delivery period and the resource prediction model, and the predicted resources are those that make the revenue of the target delivery information in the current delivery period meet the target revenue, which improves the rationality of resource allocation. Further, a cold-start result can be determined from the delivery revenue of the target delivery information over multiple delivery periods; this cold-start result satisfies the condition that the delivery revenue meets the target revenue, and delivery information that meets the delivery target is selected based on the cold-start result, which improves the efficiency of delivery information selection.
In some embodiments, referring to FIG. 3, which shows a method for updating the initial state feature information of delivery information, the method may include steps S310 to S320.
S310. Acquire the initial state feature information of the target delivery information in the previous delivery period; the initial state feature information in the previous delivery period includes the historical delivery result information of the target delivery information before the previous delivery period.
S320. Update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery period, and determine the initial state feature information of the target delivery information in the current delivery period.
The historical delivery result information may include the conversion information of the target delivery information before the start of the current delivery period, and the delivery setting information may include the bidding information of the target delivery information. Because the state of delivery information changes as the information is delivered, the initial state feature information of the target delivery information may first be determined at the start of each delivery period: the historical delivery result information in the initial state information of the previous delivery period is updated with the delivery result information of the target delivery information in the previous period, which yields the initial state feature information of the target delivery information in the current delivery period. At the start of each delivery period, the initial state feature information can thus be adaptively updated based on the initial state information and the delivery result information of the target delivery information in the previous delivery period, which improves how accurately the initial state feature information characterizes the current state of the target delivery information.
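The update in steps S310 to S320 can be sketched as follows. The sketch assumes that the historical delivery result information is a set of cumulative counters (for example, conversions) and that the update simply adds the previous period's results; the dictionary layout and the keys `history`, `bid`, and `category` are hypothetical.

```python
def update_start_state(prev_state, period_result):
    """Return the start state of the current period from the previous period's
    start state and the previous period's delivery results (assumed: counters
    are accumulated; all other fields are carried over unchanged)."""
    updated = dict(prev_state)               # shallow copy of top-level fields
    history = dict(prev_state["history"])    # copy so the input is not mutated
    for key, value in period_result.items():
        history[key] = history.get(key, 0) + value
    updated["history"] = history
    return updated
```

For instance, a previous state with 3 conversions and a period result of 2 conversions and 10 clicks yields a history of 5 conversions and 10 clicks, while the bid and category fields stay the same.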
In some embodiments, the initial state feature information in the previous delivery period further includes delivery setting information and category information of the target delivery information; the delivery setting information is used to sort multiple items of information to be delivered. When determining the initial state feature information, the initial state feature information of the target delivery information in the current delivery period may therefore be generated based on the delivery setting information, the category information, and the updated historical delivery results.
The category information characterizes the category features of the target delivery information, such as a domain category, an information category, and a creative category. The domain category may include an e-commerce category, a game category, an education category, and so on; the information category may include a video category, a picture category, an image-and-text category, and so on; the creative category may include a poster category, a layout category, and so on. The historical delivery result information and the delivery setting information are continuous features, and the category information is a discrete feature. When generating the state feature information, the values corresponding to the historical delivery result information and the delivery setting information may be normalized, and the category information may be one-hot encoded to generate corresponding encoding vectors. Generating the state feature information from the normalized and encoded features facilitates subsequent data processing and improves data processing efficiency.
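The feature construction described above can be illustrated with a short sketch. Min-max scaling is an assumed choice of normalization (the disclosure says only that the values are normalized), and the vocabulary `DOMAIN_CATEGORIES` and all function names are hypothetical.

```python
import numpy as np

DOMAIN_CATEGORIES = ["e-commerce", "game", "education"]  # hypothetical vocabulary

def min_max_normalize(values, lower, upper):
    """Scale continuous features into [0, 1] given per-feature bounds."""
    values = np.asarray(values, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (values - lower) / (upper - lower)

def one_hot(label, vocabulary):
    """Encode a discrete category as a one-hot vector."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(label)] = 1.0
    return vec

def build_state_vector(continuous, lower, upper, category, vocabulary=DOMAIN_CATEGORIES):
    """Concatenate normalized continuous features with the one-hot category."""
    return np.concatenate([min_max_normalize(continuous, lower, upper),
                           one_hot(category, vocabulary)])
```

Here `continuous` would hold values such as a conversion count and a bid, and the resulting vector is what would be passed to the networks as state feature information.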
In addition, the present disclosure describes the initial state feature information of the target delivery information from different angles through multi-dimensional feature information, which improves the ability to characterize the target delivery information and thereby the accuracy of subsequent data processing based on the state feature information.
In some embodiments, referring to FIG. 4, which shows a method for sorting delivery information based on predicted resources, the method may include steps S410 to S440.
S410. Calculate the resource mean and the resource variance in the current delivery period based on the predicted resources of each item of information to be delivered in the current delivery period.
S420. Calculate the normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource.
S430. Determine, based on the normalization coefficient and a preset resource amount, the actual resources allocated to the target delivery information in the current delivery period.
S440. Sort the items of information to be delivered based on their actual resources, to obtain a sorting result.
本公开中通过资源预测模型预测得到的资源是在投放周期内应该获得的资源量,但是在每个投放周期内对各投放信息的资源量是有限的,从而直接采用预测得到的预测资源会带来资源超预算或者预算不足的情况;为了使得资源分配结果与当前总的资源量相匹配,可对预测得到的资源进行归一化处理,可得到与每项投放信息对应的归一化系数,如式(1)所示:In this disclosure, the resources predicted by the resource prediction model are the amount of resources that should be obtained in the delivery cycle, but the amount of resources for each delivery information in each delivery cycle is limited, so directly using the predicted resources obtained by prediction will bring In order to make the resource allocation result match the current total resource amount, the predicted resources can be normalized, and the normalization coefficient corresponding to each delivery information can be obtained. As shown in formula (1):
Figure PCTCN2022096373-appb-000001
where a_i is the target resource corresponding to the target delivery information, avg(a) is the resource mean of the items of delivery information in the current cold-start cycle, and std(a) is the resource variance of the items of delivery information in the current delivery cycle. The normalization coefficient corresponding to the target delivery information is then applied to the existing resource allocation strategy to obtain the actual resources corresponding to the target delivery information. The present disclosure normalizes the predicted resources under the constraint of the total resource amount. Because the average of all a_i' is 1, the sum of the actual resources allocated to the items of delivery information can be kept in line with the total resource amount, which avoids exceeding the resource budget.
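Formula (1) survives here only as an image reference, so its exact form cannot be read from this text. The sketch below uses one form that is consistent with the stated inputs (a_i, avg(a), std(a)) and with the stated property that the coefficients average to 1, namely a_i' = (a_i - avg(a)) / std(a) + 1. That form is an assumption, as are the function names and the sample values.

```python
from statistics import fmean, pstdev


def normalization_coefficients(predicted):
    """Assumed form of formula (1): a_i' = (a_i - avg(a)) / std(a) + 1."""
    avg = fmean(predicted)
    std = pstdev(predicted)
    if std == 0:
        # All predictions are equal: every item gets the neutral coefficient 1.
        return [1.0 for _ in predicted]
    return [(a - avg) / std + 1.0 for a in predicted]


def actual_resources(predicted, preset_amount):
    """Scale a preset per-item resource amount by each item's coefficient."""
    return [c * preset_amount for c in normalization_coefficients(predicted)]


# Hypothetical predicted resources for five items in one delivery cycle.
predicted = [1.0, 1.0, 1.0, 1.0, 6.0]
coeffs = normalization_coefficients(predicted)
allocated = actual_resources(predicted, preset_amount=100.0)
```

Under this assumed form the allocated amounts sum to the preset amount times the number of items, which is the budget-matching behaviour the text describes. Note that the same form can give a coefficient below zero for an item far below the mean; the text does not say how such a case is handled.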
In some embodiments, once the actual resources to be allocated to the target delivery information are determined, the items of target delivery information to be delivered may be sorted based on those actual resources to obtain a sorting result, which gives the order of the multiple items of information to be delivered. Further, the items of information to be delivered may be sorted based on both their delivery setting information and their actual resources to obtain the sorting result. Information delivery in the current delivery cycle is then carried out according to the sorting result; that is, which target information is delivered in the current delivery cycle is determined from the sorting result, for example by selecting the top N items of target delivery information.
In some embodiments, the items of target delivery information are sorted by ranking score from high to low, and the ranking score is calculated as shown in formula (2):
rank_benefits = ecpm + bonus + ueq     (2)
where ecpm (estimated cost per mille) is the estimated cost per thousand impressions, which can be obtained from the delivery setting information and the click-through rate; bonus is the target resource; and ueq (user experience quantity) is the user experience score. The target delivery information ranked in the top N positions of the sorting result may be selected for delivery.
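Formula (2) and the top-N selection can be illustrated with a short sketch. The candidate records and their values are hypothetical; only the additive score and the descending sort come from the text.

```python
def rank_benefits(ecpm, bonus, ueq):
    """Ranking score from formula (2)."""
    return ecpm + bonus + ueq


def top_n(candidates, n):
    """Sort candidates by ranking score, highest first, and keep the first n."""
    ranked = sorted(
        candidates,
        key=lambda c: rank_benefits(c["ecpm"], c["bonus"], c["ueq"]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical candidates; "bonus" stands for the allocated target resource.
candidates = [
    {"id": "A", "ecpm": 3.0, "bonus": 0.5, "ueq": 0.2},
    {"id": "B", "ecpm": 2.0, "bonus": 2.5, "ueq": 0.1},
    {"id": "C", "ecpm": 1.0, "bonus": 0.0, "ueq": 0.3},
]
selected = [c["id"] for c in top_n(candidates, n=2)]
```

In this toy data item B overtakes item A only because of its larger bonus, which is the effect a cold-start resource is meant to have on ranking.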
In some embodiments, referring to FIG. 5, a method for calculating delivery revenue is shown, which may include steps S510 to S520.
S510. Obtain delivery result information of the target delivery information in the current delivery cycle; the delivery result information includes delivery conversion data and delivery consumption data.
S520. Perform a weighted summation of the delivery conversion data and the delivery consumption data to obtain the delivery revenue of the target delivery information in the current delivery cycle.
Information is delivered in each delivery cycle based on the predicted resources. When the current delivery cycle ends, the delivery revenue in the current delivery cycle can be determined. This revenue can be regarded as the revenue obtained by allocating the target resource to the target delivery information while it is in the current state, where the current state is represented by the state feature information of the current delivery cycle.
The delivery conversion data may be the conversion rate, and the delivery consumption data may be the delivery bid at the time of delivery. These two items serve as the delivery result information, and a weight is determined for each. For example, the weight of the delivery conversion data may be 1 and the weight of the delivery consumption data may be 0.05, so that the delivery conversion data is primary and the delivery consumption data is secondary in determining the delivery revenue in the current delivery cycle. Determining the delivery revenue of the target delivery information in the current delivery cycle as a weighted sum of the delivery conversion data and the delivery consumption data improves the accuracy and convenience of determining the delivery revenue.
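The weighted summation in step S520 can be written out directly. The default weights of 1 and 0.05 are the example values given in the text; the function name and the sample inputs are illustrative.

```python
def delivery_revenue(conversion, consumption, w_conversion=1.0, w_consumption=0.05):
    """Weighted sum of delivery conversion data and delivery consumption data."""
    return w_conversion * conversion + w_consumption * consumption


# Hypothetical cycle: conversion rate 0.04 and delivery bid 2.0.
reward = delivery_revenue(conversion=0.04, consumption=2.0)
```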
The delivery revenue of the target delivery information in the cold-start phase is determined from its delivery revenue over multiple delivery cycles, and the corresponding cold-start result is then determined from the revenue in the cold-start phase. In some embodiments, if the delivery revenue in the cold-start phase is greater than or equal to a preset cold-start revenue threshold, the target delivery information is determined to have passed the cold-start phase and enters the mature stage; if the delivery revenue in the cold-start phase is less than the preset cold-start revenue threshold, the target delivery information is determined not to have passed the cold-start phase and will no longer be delivered. The target delivery information that passes the cold-start phase is thus selected for continued delivery, and all of it is information that was expected during the cold-start phase to yield large long-term delivery revenue.
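The threshold check can be sketched as below. The text does not say how the per-cycle revenues are combined into the cold-start revenue, so summing them is an assumption made only for this illustration.

```python
def cold_start_passed(cycle_revenues, threshold):
    """Return True when the aggregated cold-start revenue reaches the threshold.

    Aggregation by summation is an assumption; the text only states that the
    cold-start revenue is determined from the revenue of multiple cycles.
    """
    return sum(cycle_revenues) >= threshold
```

An item for which this returns True would enter the mature stage; otherwise it would stop being delivered.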
In some embodiments, referring to FIG. 6, a schematic structural diagram of the resource prediction model is shown. The model may include a conditional variational autoencoder network, a prediction execution network, and a prediction analysis network. The state corresponds to the state feature information, the action corresponds to the allocated cold-start resource, and the reward r corresponds to the delivery revenue. The prediction execution network predicts a suitable action for the current state, and the prediction analysis network evaluates how good the currently predicted action is given the current state and action.
In some embodiments, the conditional variational autoencoder network (Conditional VAE) G_ω includes an encoder and a decoder. The encoder encodes the state and the action so that the encoding result is close to a standard normal distribution. The decoder inverts the encoder, so that a sample from the standard normal distribution, once passed through the decoder, is close to the actual distribution of actions and states. The input to the conditional variational autoencoder network is the current state and action. These features pass through a 2-layer MLP (multi-layer perceptron) to produce a set of means and variances; a sample is drawn from that set of means and variances and passed through another 2-layer MLP to produce a reconstruction of the encoder input.
The input to the prediction execution network (actor) has two parts: the current state, and the output of the decoder of the conditional variational autoencoder network. The output of the network is a' = G_ω(s) + w * ε_φ(G_ω(s), s). In the embodiments of the present disclosure, w may be 0.001. If w is set too large, the algorithm tends not to converge; if it is set too small, the influence of the output of the prediction execution network on the final result is limited.
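The actor output a' = G_ω(s) + w * ε_φ(G_ω(s), s) can be illustrated with plain callables standing in for the two networks. The toy functions below are hypothetical placeholders, not trained models; only the combination rule and the example value w = 0.001 come from the text.

```python
def actor_output(state, generate, perturb, w=0.001):
    """Return a' = G(s) + w * eps(G(s), s) for a scalar action."""
    base_action = generate(state)
    return base_action + w * perturb(base_action, state)


# Hypothetical stand-ins for the trained decoder G and the perturbation network eps.
def toy_generate(state):
    return sum(state) / len(state)


def toy_perturb(action, state):
    return max(state) - action


action = actor_output([1.0, 3.0], toy_generate, toy_perturb)
```

With a small w the result stays close to the decoder's proposal, which matches the remark that too small a value limits the actor's influence on the final action.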
The input to the prediction analysis network (critic) includes the current state and the current action. The target that the prediction analysis network needs to fit is the overall revenue under that action, that is, the revenue of the current action plus the future revenue:
Figure PCTCN2022096373-appb-000002
Here r is the reward. The future revenue is the maximum value of Q_θ(s') in the next state s'.
In some embodiments, referring to FIG. 7, a method for training the resource prediction model is shown, which may include steps S710 to S750.
S710. Acquire sample data; the sample data includes the initial state feature information of sample delivery information in each historical delivery cycle, and historical resources; the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery cycle.
S720. Train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources to obtain a target conditional variational autoencoder network.
S730. Input the encoding information produced by the target conditional variational autoencoder network for the historical resources, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources.
S740. Train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network; the predicted resources produced by the target prediction execution network are resources that make the delivery revenue of the information to be delivered in a delivery cycle meet a target delivery revenue.
In some embodiments, while training the preset prediction execution network, the target prediction analysis network may be used to evaluate the prediction results of the current prediction execution network and produce an evaluation score. When the evaluation score given by the target prediction analysis network to the current prediction execution network is greater than or equal to a preset score, training of the preset prediction execution network is considered to have converged; training ends, and the current prediction execution network is taken as the target prediction execution network.
In addition, when the target prediction analysis network evaluates the current prediction execution network, the score is determined by the degree to which the resources predicted by the current prediction execution network make the delivery revenue of the information to be delivered in the delivery cycle meet the target delivery revenue. The closer that delivery revenue is to the target delivery revenue, the higher the evaluation score.
S750. Obtain the resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
In the embodiments of the present disclosure, the conditional variational autoencoder network, the prediction execution network, and the prediction analysis network may be trained alternately; that is, whenever one network is being trained, the other two are kept unchanged. The conditional variational autoencoder network may be trained first. Once it has been trained to a preset degree, for example after N rounds, alternating training of the conditional variational autoencoder network, the prediction execution network, and the prediction analysis network begins. In addition, in the reinforcement learning model, to train the prediction analysis network better, the prediction analysis network and the prediction execution network may be trained at a frequency ratio of M:1, with M ≥ 2, so that the prediction analysis network converges faster.
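The alternating schedule can be sketched as a generator that names the single network to update at each step. The text fixes only three things: a warm-up of N rounds for the conditional variational autoencoder network, one network trained at a time, and a critic-to-actor ratio of M:1 with M ≥ 2. The order within each alternation cycle and all names below are assumptions.

```python
def training_schedule(total_steps, warmup_rounds, critic_per_actor):
    """Yield the name of the single network to update at each step."""
    if critic_per_actor < 2:
        raise ValueError("the text requires M >= 2")
    # Assumed cycle order: one CVAE update, M critic updates, one actor update.
    cycle = ["cvae"] + ["critic"] * critic_per_actor + ["actor"]
    for step in range(total_steps):
        if step < warmup_rounds:
            yield "cvae"
        else:
            yield cycle[(step - warmup_rounds) % len(cycle)]


schedule = list(training_schedule(total_steps=7, warmup_rounds=3, critic_per_actor=2))
```

A training loop would read this schedule and, at each step, update only the named network while leaving the parameters of the other two untouched.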
The resource prediction model includes the decoder part of the conditional variational autoencoder network and the prediction execution network, so the resource prediction model used for resource prediction can be obtained from the trained conditional variational autoencoder network and the trained prediction execution network.
In some embodiments, a reinforcement learning training sample mainly takes the form (s, a, r, s'), where s is the current state of the agent and the environment, a is the action taken in that state, r is the reward given by the environment after action a is taken, and s' is the next state reached by the agent and the environment after action a.
In the embodiments of the present disclosure, because an offline reinforcement learning model is used, all sample data used for model training is historical data. The sample delivery information may be historical delivery information that has already gone through the cold-start phase and whose cold-start result has been determined. The data of this historical delivery information in each delivery cycle and its final cold-start result are known, so sample data can be constructed from this known data.
Corresponding to the sample form of the offline reinforcement learning model, each delivery cycle is taken as a sample unit, giving one sample per delivery cycle. Each sample contains the initial state feature information, the resource, and the delivery revenue of the current delivery cycle, together with the initial state feature information of the next delivery cycle. Constructing samples in the reinforcement learning sample form makes them directly usable by the reinforcement learning model, which improves both the suitability of the constructed samples and the efficiency of sample construction.
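The per-cycle sample construction can be sketched as follows, assuming each historical cycle is available as a record with hypothetical keys `state`, `resource`, and `revenue`. The last cycle has no following state in the record list, so this sketch simply leaves it out; how its sample is completed is not covered here.

```python
def build_transitions(cycles):
    """Turn an ordered list of per-cycle records into (s, a, r, s') tuples."""
    transitions = []
    for current, following in zip(cycles, cycles[1:]):
        transitions.append(
            (current["state"], current["resource"], current["revenue"], following["state"])
        )
    return transitions


# Hypothetical history of one item over three delivery cycles.
history = [
    {"state": [0.1, 0.0], "resource": 1.0, "revenue": 0.2},
    {"state": [0.3, 0.1], "resource": 2.0, "revenue": 0.5},
    {"state": [0.6, 0.2], "resource": 0.5, "revenue": 0.4},
]
samples = build_transitions(history)
```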
In some embodiments, referring to FIG. 8, a method for training the conditional variational autoencoder network is shown, which may include steps S810 to S820.
S810. Input the initial state feature information and the historical resources into the preset conditional variational autoencoder network; fit the data distribution of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information; and encode the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources.
S820. Train the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network.
During model training, the input of the conditional variational autoencoder network may include the initial state feature information and the historical resources, and the output may include the probability distribution information and the encoding information for the historical resources.
In some embodiments, the conditional variational autoencoder network may include a first network and a second network connected in series. Given the initial state feature information and the historical resources as input, the first network outputs the probability distribution information and the second network outputs the encoding information for the historical resources.
In some embodiments, the conditional variational autoencoder network may be a single independent encoding network. Given the initial state feature information and the historical resources as input, the output of this network includes both the probability distribution information and the encoding information for the historical resources.
In some embodiments, referring to FIG. 9, a method for adjusting the parameters of the conditional variational autoencoder network is shown, which may include steps S910 to S940.
S910. Obtain a first loss component according to the probability distribution information and the standard normal distribution.
S920. Obtain a second loss component according to the historical resources and the encoding information corresponding to the historical resources.
S930. Obtain a first loss function based on the first loss component and the second loss component.
S940. Adjust the network parameters of the preset conditional variational autoencoder network based on the first loss function to obtain the target conditional variational autoencoder network.
As can be seen from FIG. 6, the output of the encoder of the conditional variational autoencoder network is the probability distribution information, and the output of the decoder is the reconstruction of the input action. The loss function of the model can be determined from these two outputs, as shown in formula (3):
loss1 = x - x' + KL(N(μ,σ), N(0,1))     (3)
where x is the action input to the encoder of the conditional variational autoencoder network, and x' is the reconstruction of that input action output by the decoder; N(μ,σ) is the probability distribution information output by the encoder, and N(0,1) is the standard normal distribution. The parameters of the conditional variational autoencoder network can be adjusted based on this loss function to obtain the trained conditional variational autoencoder network.
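The two loss components of formula (3) can be computed as in the sketch below. Two choices here are assumptions rather than statements from the text: the reconstruction term written as x - x' is taken to be a squared error, and σ is treated as a per-dimension standard deviation. The KL term uses the standard closed form for a diagonal Gaussian against N(0, 1).

```python
import math


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions."""
    return sum(
        0.5 * (s * s + m * m - 1.0) - math.log(s)
        for m, s in zip(mu, sigma)
    )


def cvae_loss(x, x_reconstructed, mu, sigma):
    """Reconstruction error plus the KL term, following formula (3).

    Squared error is an assumed reading of the x - x' term.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, sigma)
```

The loss is zero only when the reconstruction is exact and the encoder output already equals the standard normal distribution, so minimizing it pulls on both components at once.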
In some embodiments, referring to FIG. 10, a method for training the target prediction execution network is shown, which may include steps S1010 to S1020.
S1010. Input the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and analyze, through the target prediction analysis network, the behavior of allocating the predicted resources based on the initial state feature information, to obtain first analysis information.
S1020. Adjust the network parameters of the preset prediction execution network based on the first analysis information to obtain the target prediction execution network.
While the prediction execution network is being trained, the conditional variational autoencoder network and the prediction analysis network are kept unchanged and are used directly for data processing. The input of the prediction execution network depends on the output of the decoder of the conditional variational autoencoder network. To train the prediction execution network, the state and action of the current sample are input into the current conditional variational autoencoder network to obtain the reconstruction of the input action; this reconstruction, together with the state of the current sample, is input into the prediction execution network to obtain the output action (that is, the cold-start resource output). The output action of the prediction execution network and the state of the current sample are then input into the prediction analysis network, which gives the evaluation score Q-value (that is, the revenue return) for taking the output action in that state. The parameters of the prediction execution network are adjusted based on the evaluation score so that the prediction analysis network scores its output action higher; by repeatedly adjusting these parameters, the trained prediction execution network is obtained.
In some embodiments, the sample data further includes the historical delivery revenue of the sample delivery information in each historical delivery cycle, and updated state feature information. Further, referring to FIG. 11, a method for training the target prediction analysis network is shown, which may include steps S1110 to S1150.
S1110. Input the initial state feature information and the historical resources into a preset prediction analysis network, and analyze, through the preset prediction analysis network, the allocation of the historical resources based on the initial state feature information, to obtain second analysis information.
S1120. Sample historical resources based on the updated state feature information and the target conditional variational autoencoder network to obtain a preset number of sampled resources.
S1130. Based on the updated state feature information, determine the delivery revenue corresponding to each sampled resource.
S1140. Determine the sampled resource with the largest delivery revenue as the target sampled resource.
S1150. Adjust the network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource, to obtain the target prediction analysis network.
While the prediction analysis network is being trained, the conditional variational autoencoder network and the prediction execution network are kept unchanged, and the training objective is to maximize the delivery revenue. The output of the prediction analysis network may be the revenue corresponding to the input state and action, that is, the revenue obtainable by taking the action in that state. In the embodiments of the present disclosure, the fitting target of the prediction analysis network is the overall revenue under the current action, which may include the revenue of the current action in the current cold-start cycle and the revenue in the next cold-start cycle. The sum of the current revenue and the future revenue, where the future revenue may be the maximum delivery revenue in the next state and the next state may be the state corresponding to the next cold-start cycle, that is,
Figure PCTCN2022096373-appb-000003
is taken as the target revenue. The evaluation score (that is, the revenue return) obtained by inputting the current action and the current state is compared with the target revenue, and the parameters of the prediction analysis network are updated according to the comparison result to obtain the trained prediction analysis network.
Here, r in the target revenue is the delivery revenue r of the current sample. For the maximum delivery revenue in the next state, actions can be sampled using a bootstrap technique: a preset number of actions are sampled based on the next state s', the delivery revenue corresponding to each sampled action is found from the sample data, and the largest of these is determined. This maximum delivery revenue serves as the future revenue in the target revenue.
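The target revenue described above can be sketched with two callables: one that samples candidate actions for the next state and one that returns the delivery revenue for a state and action. Both are hypothetical stand-ins. The target formula itself is only available as an image reference in this text, so the discount factor `gamma` is an assumption and defaults to 1.0, which reduces the target to current revenue plus the maximum future revenue.

```python
def critic_target(reward, next_state, sample_actions, future_revenue,
                  num_samples=10, gamma=1.0):
    """Return r + gamma * max over sampled actions of the revenue in s'."""
    candidates = sample_actions(next_state, num_samples)
    best_future = max(future_revenue(next_state, a) for a in candidates)
    return reward + gamma * best_future


# Hypothetical stand-ins for action sampling and revenue lookup.
def toy_sample_actions(next_state, num_samples):
    return [0.0, 0.5, 1.0][:num_samples]


def toy_future_revenue(next_state, action):
    return next_state * action


target = critic_target(1.0, 2.0, toy_sample_actions, toy_future_revenue, num_samples=3)
```

The critic's own score for the current state and action would then be compared with this target, for example through a squared difference, to drive its parameter update.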
In some embodiments, referring to FIG. 12, a sample generation method is shown, which may include steps S1210 to S1230.
S1210. Obtain a first delivery income of delivered information within a target delivery cycle, and a second delivery income of the delivered information within a preset time period after the target delivery cycle; the target delivery cycle is the last delivery cycle in the initial delivery phase.
S1220. Based on the first delivery income and the second delivery income, obtain a historical delivery income corresponding to the target delivery cycle.
S1230. Based on the historical delivery income corresponding to the target delivery cycle, generate a sample of the delivered information for the target delivery cycle.
In some embodiments, the target delivery information may be delivered information that has successfully passed the cold-start phase. For the last delivery cycle of the initial delivery phase, the corresponding sample delivery income may therefore include the delivery income within the last delivery cycle as well as the delivery income after the initial delivery phase; for example, the delivery income of the last delivery cycle may include the income within that cycle plus the income over the following three hours. Because the delivery income obtained after the last delivery cycle results from the target resource allocated in the last delivery cycle, taking this later income into account when determining the delivery income of the last delivery cycle improves the accuracy of the sample data.
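Steps S1210 to S1230 can be illustrated with a minimal sketch. The `Sample` record and its field names are hypothetical, and combining the two incomes by simple addition is an assumption; the text only says the historical delivery income is obtained "based on" both.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    state: Tuple[float, ...]  # initial state features of the cycle
    resource: float           # resource allocated in the cycle
    income: float             # historical delivery income of the cycle


def last_cycle_sample(
    state: Sequence[float],
    resource: float,
    first_income: float,   # income inside the last cold-start cycle (S1210)
    second_income: float,  # income in the preset window afterwards (S1210)
) -> Sample:
    # S1220: combine both incomes (assumed here to be a plain sum).
    historical_income = first_income + second_income
    # S1230: build the training sample for the last cycle.
    return Sample(tuple(state), resource, historical_income)
```

For the three-hour example in the text, `second_income` would be the income observed during the three hours after the last cold-start cycle.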
FIG. 13 is a block diagram of a delivery information processing apparatus according to an exemplary embodiment. The apparatus includes a state feature information determining unit 1310, a resource prediction model acquiring unit 1320, a first prediction unit 1330, a second prediction unit 1340, and a target resource determining unit 1350. The state feature information determining unit 1310 is configured to determine initial state feature information of target delivery information in a current delivery cycle. The initial state feature information of the target delivery information in the current delivery cycle is obtained based on the initial state feature information of the target delivery information in the previous delivery cycle and the delivery result information of the target delivery information in the previous delivery cycle; the initial state feature information includes historical delivery result information of the target delivery information before the current delivery cycle, and attribute information of the target delivery information.
The resource prediction model acquiring unit 1320 is configured to acquire a resource prediction model. The resource prediction model includes a conditional variational autoencoder network and a predictive execution network.
The first prediction unit 1330 is configured to input the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational autoencoder network for resource prediction, to obtain a first resource.
The second prediction unit 1340 is configured to input the initial state feature information of the target delivery information in the current delivery cycle, together with the first resource, into the predictive execution network for resource prediction, to obtain a second resource.
The target resource determining unit 1350 is configured to obtain, based on the first resource and the second resource, a target resource corresponding to the target delivery information; the target resource is a predicted resource that makes the delivery income of the target delivery information in the current delivery cycle satisfy a target delivery income.
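The data flow through units 1330 to 1350 can be sketched as below. The two networks are passed in as plain callables (`cvae_decode` and `execution_net` are hypothetical names), and adding the second resource to the first as a correction term is an assumption, since this passage does not say how the two are combined.

```python
from typing import Callable, Sequence

State = Sequence[float]


def predict_target_resource(
    state: State,
    cvae_decode: Callable[[State], float],
    execution_net: Callable[[State, float], float],
) -> float:
    # First prediction: the conditional VAE proposes a resource for the state.
    first_resource = cvae_decode(state)
    # Second prediction: the execution network sees the state and the proposal.
    second_resource = execution_net(state, first_resource)
    # Assumption: the target resource is the proposal plus the second output.
    return first_resource + second_resource
```

Passing the networks as callables keeps the sketch independent of any particular deep learning framework.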
In some embodiments, the state feature information determining unit 1310 includes:
a first acquiring unit, configured to acquire the initial state feature information of the target delivery information in the previous delivery cycle; the initial state feature information in the previous delivery cycle includes historical delivery result information of the target delivery information before the previous delivery cycle;
a first updating unit, configured to update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, and determine the initial state feature information of the target delivery information in the current delivery cycle.
In some embodiments, the initial state feature information in the previous delivery cycle further includes delivery setting information and category information of the target delivery information; the delivery setting information is used to sort multiple items of target delivery information to be delivered;
the first updating unit includes: a first generating unit, configured to generate the initial state feature information of the target delivery information in the current delivery cycle based on the delivery setting information, the category information, and the updated historical delivery result information.
In some embodiments, the delivery information processing apparatus further includes:
a first calculation unit, configured to calculate a resource mean and a resource variance for the current delivery cycle based on the predicted resources of the items of information to be delivered in the current delivery cycle;
a second calculation unit, configured to calculate a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;
an actual resource determining unit, configured to determine, based on the normalization coefficient and a preset resource amount, the actual resource allocated to the target delivery information in the current delivery cycle;
a first sorting unit, configured to sort the items of information to be delivered based on their actual resources, to obtain a sorting result.
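One possible reading of these four units is sketched below. The passage does not give the normalization formula, so the choices here are assumptions: the coefficient is a z-score (using the population variance) squashed by a logistic function so that it stays between 0 and 1, and the actual resource is that coefficient times the preset resource amount. The function and variable names are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def allocate_and_rank(
    predicted: Dict[str, float], preset_amount: float
) -> Tuple[Dict[str, float], List[str]]:
    values = list(predicted.values())
    # Resource mean and variance over all items in the current cycle.
    mu = mean(values)
    std = math.sqrt(pvariance(values, mu)) or 1.0  # avoid dividing by zero
    actual = {}
    for item, resource in predicted.items():
        z = (resource - mu) / std
        # Assumed normalization coefficient: logistic of the z-score.
        coefficient = 1.0 / (1.0 + math.exp(-z))
        actual[item] = coefficient * preset_amount
    # Items with more actual resource are ranked first.
    ranking = sorted(actual, key=actual.get, reverse=True)
    return actual, ranking
```

Normalizing against the batch statistics keeps each allocation bounded by the preset amount even when the raw predictions are on very different scales.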
In some embodiments, the first sorting unit includes: a second sorting unit, configured to sort the items of information to be delivered based on the delivery setting information of the items of information to be delivered and the actual resources of the items of information to be delivered, to obtain the sorting result.
In some embodiments, the delivery information processing apparatus further includes: an information delivery unit, configured to deliver information within the current delivery cycle based on the sorting result.
In some embodiments, the delivery information processing apparatus further includes:
a second acquiring unit, configured to acquire delivery result information of the target delivery information within the current delivery cycle; the delivery result information includes delivery conversion data and delivery consumption data;
a weighted summation unit, configured to perform a weighted summation of the delivery conversion data and the delivery consumption data, to obtain the delivery income of the target delivery information within the current delivery cycle.
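The weighted summation can be written in one line. The weight names and their default values below are assumptions for illustration; the passage does not state what weights are used.

```python
def delivery_income(
    conversions: float,
    consumption: float,
    w_conversion: float = 1.0,   # hypothetical weight, not from the source
    w_consumption: float = 1.0,  # hypothetical weight, not from the source
) -> float:
    """Weighted sum of conversion data and consumption data for one cycle."""
    return w_conversion * conversions + w_consumption * consumption
```

For example, 3 conversions weighted by 2.0 and a consumption of 10.0 weighted by 0.5 give an income of 11.0.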
Referring to FIG. 14, a block diagram of a resource prediction model training apparatus is shown, including a sample data acquiring unit 1410, a first training unit 1420, a third prediction unit 1430, a second training unit 1440, and a resource prediction model determining unit 1450.
The sample data acquiring unit 1410 is configured to acquire sample data. The sample data includes initial state feature information of sample delivery information in each historical delivery cycle, and historical resources; the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery cycle.
The first training unit 1420 is configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network.
The third prediction unit 1430 is configured to input the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset predictive execution network for resource prediction, to obtain predicted resources corresponding to the historical resources.
The second training unit 1440 is configured to train the preset predictive execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target predictive analysis network, to obtain a target predictive execution network. The predicted resource output by the target predictive execution network is a resource that makes the delivery income of the information to be delivered in a delivery cycle satisfy the target delivery income.
The resource prediction model determining unit 1450 is configured to obtain the resource prediction model based on the target conditional variational autoencoder network and the target predictive execution network.
In some embodiments, the first training unit 1420 includes:
an information input unit, configured to input the initial state feature information and the historical resources into the preset conditional variational autoencoder network, fit the data distribution of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information, and encode the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;
a third training unit, configured to train the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network.
In some embodiments, the third training unit includes:
a first loss component determining unit, configured to obtain a first loss component according to the probability distribution information and a standard normal distribution;
a second loss component determining unit, configured to obtain a second loss component according to the historical resources and the encoding information corresponding to the historical resources;
a first loss function determining unit, configured to obtain a first loss function based on the first loss component and the second loss component;
a first parameter adjustment unit, configured to adjust the network parameters of the preset conditional variational autoencoder network based on the first loss function, to obtain the target conditional variational autoencoder network.
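The two loss components resemble the usual variational autoencoder objective, and a minimal sketch under that reading follows. It assumes the probability distribution information is a diagonal Gaussian given by a mean vector and a log-variance vector, that the first component is the KL divergence to a standard normal, and that the second component is a squared error between a historical resource and its reconstruction from the encoding. The weight `kl_weight` is a hypothetical parameter.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """First component: KL( N(mu, exp(log_var)) || N(0, 1) ), summed over dims."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def reconstruction_loss(historical: float, reconstructed: float) -> float:
    """Second component: squared error between the resource and its decoding."""
    return (historical - reconstructed) ** 2


def cvae_loss(
    mu: Sequence[float],
    log_var: Sequence[float],
    historical: float,
    reconstructed: float,
    kl_weight: float = 1.0,  # assumption: relative weighting is not specified
) -> float:
    """First loss function: combination of the two components."""
    return reconstruction_loss(historical, reconstructed) + kl_weight * (
        kl_to_standard_normal(mu, log_var)
    )
```

In practice these terms would be computed on tensors inside a training framework so that gradients can flow to the network parameters; plain floats are used here only to show the arithmetic.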
In some embodiments, the second training unit 1440 includes:
a first analysis information determining unit, configured to input the initial state feature information and the predicted resources corresponding to the historical resources into the target predictive analysis network, and analyze, through the target predictive analysis network, the behavior of allocating the predicted resources based on the initial state feature information, to obtain first analysis information;
a second parameter adjustment unit, configured to adjust the network parameters of the preset predictive execution network based on the first analysis information, to obtain the target predictive execution network.
In some embodiments, the sample data further includes the historical delivery income of the sample delivery information in each historical delivery cycle, and updated state feature information; the updated state feature information is obtained based on the initial state feature information and the delivery result information of the sample delivery information in the historical delivery cycle;
the resource prediction model training apparatus further includes:
a second analysis information determining unit, configured to input the initial state feature information and the historical resources into a preset predictive analysis network, and analyze, through the preset predictive analysis network, the behavior of allocating the historical resources based on the initial state feature information, to obtain second analysis information;
a resource sampling unit, configured to sample historical resources based on the updated state feature information and the target conditional variational autoencoder network, to obtain a preset number of sampled resources;
a delivery income determining unit, configured to determine, based on the updated state feature information, the delivery income corresponding to each sampled resource;
a target sampled resource determining unit, configured to determine the sampled resource with the largest delivery income as the target sampled resource;
a third parameter adjustment unit, configured to adjust the network parameters of the preset predictive analysis network based on the second analysis information, the historical delivery income, and the delivery income corresponding to the target sampled resource, to obtain the target predictive analysis network.
In some embodiments, the resource prediction model training apparatus further includes:
a third acquiring unit, configured to acquire a first delivery income of delivered information within a target delivery cycle, and a second delivery income of the delivered information within a preset time period after the target delivery cycle; the target delivery cycle is the last delivery cycle in the initial delivery phase;
a historical delivery income determining unit, configured to obtain, based on the first delivery income and the second delivery income, a historical delivery income corresponding to the target delivery cycle;
a sample generating unit, configured to generate, based on the historical delivery income corresponding to the target delivery cycle, a sample of the delivered information for the target delivery cycle.
With regard to the apparatus in the foregoing embodiments, the manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
In an exemplary embodiment, a computer-readable storage medium including instructions is also provided. In some embodiments, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like; when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform any of the methods described above.
In an exemplary embodiment, a computer program product is also provided. The computer program product includes a computer program stored in a readable storage medium; at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the device to perform any of the methods described above.
This embodiment further provides a device, whose structure is shown in FIG. 15. The device 1500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1522 (for example, one or more processors), a memory 1532, and one or more storage media 1530 (for example, one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage media 1530 may provide transient or persistent storage. A program stored in the storage medium 1530 may include one or more modules (not shown), each of which may include a series of instruction operations on the device. Further, the central processing unit 1522 may be configured to communicate with the storage medium 1530 and to execute, on the device 1500, the series of instruction operations in the storage medium 1530. The device 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on. Any of the methods described in this embodiment can be implemented on the device shown in FIG. 15.
All embodiments of the present disclosure may be implemented independently or in combination with other embodiments, all of which fall within the scope of protection claimed by the present disclosure.

Claims (29)

  1. A delivery information processing method, comprising:
    determining initial state feature information of target delivery information in a current delivery cycle; wherein the initial state feature information of the target delivery information in the current delivery cycle is obtained based on initial state feature information of the target delivery information in a previous delivery cycle and delivery result information of the target delivery information in the previous delivery cycle; and the initial state feature information includes historical delivery result information of the target delivery information before the current delivery cycle, and attribute information of the target delivery information;
    acquiring a resource prediction model; wherein the resource prediction model includes a conditional variational autoencoder network and a predictive execution network;
    inputting the initial state feature information of the target delivery information in the current delivery cycle into the conditional variational autoencoder network for resource prediction, to obtain a first resource;
    inputting the initial state feature information of the target delivery information in the current delivery cycle, together with the first resource, into the predictive execution network for resource prediction, to obtain a second resource;
    obtaining, based on the first resource and the second resource, a target resource corresponding to the target delivery information; wherein the target resource is a predicted resource that makes the delivery income of the target delivery information in the current delivery cycle satisfy a target delivery income.
  2. The method according to claim 1, wherein the determining initial state feature information of target delivery information in a current delivery cycle comprises:
    acquiring the initial state feature information of the target delivery information in the previous delivery cycle; wherein the initial state feature information in the previous delivery cycle includes historical delivery result information of the target delivery information before the previous delivery cycle;
    updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, and determining the initial state feature information of the target delivery information in the current delivery cycle.
  3. The method according to claim 2, wherein the initial state feature information in the previous delivery cycle further includes delivery setting information and category information of the target delivery information; the delivery setting information is used to sort multiple items of information to be delivered;
    the updating the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery cycle, and determining the initial state feature information of the target delivery information in the current delivery cycle, comprises:
    generating the initial state feature information of the target delivery information in the current delivery cycle based on the delivery setting information, the category information, and the updated historical delivery result information.
  4. The method according to claim 3, further comprising:
    calculating a resource mean and a resource variance for the current delivery cycle based on the predicted resources of the items of information to be delivered in the current delivery cycle;
    calculating a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;
    determining, based on the normalization coefficient and a preset resource amount, the actual resource allocated to the target delivery information in the current delivery cycle;
    sorting the items of information to be delivered based on the actual resources of the items of information to be delivered, to obtain a sorting result.
  5. The method according to claim 4, wherein the sorting the items of information to be delivered based on the actual resources of the items of information to be delivered, to obtain a sorting result, comprises:
    sorting the items of information to be delivered based on the delivery setting information of the items of information to be delivered and the actual resources of the items of information to be delivered, to obtain the sorting result.
  6. The method according to claim 5, further comprising:
    delivering information within the current delivery cycle based on the sorting result.
  7. The method according to claim 4, further comprising:
    acquiring delivery result information of the target delivery information within the current delivery cycle; wherein the delivery result information includes delivery conversion data and delivery consumption data;
    performing a weighted summation of the delivery conversion data and the delivery consumption data, to obtain the delivery income of the target delivery information within the current delivery cycle.
  8. A resource prediction model training method, comprising:
    acquiring sample data; wherein the sample data includes initial state feature information of sample delivery information in each historical delivery cycle, and historical resources; and the initial state feature information characterizes the historical delivery features of the sample delivery information before the start of each historical delivery cycle;
    training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network;
    inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset predictive execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;
    training the preset predictive execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target predictive analysis network, to obtain a target predictive execution network; wherein the predicted resource output by the target predictive execution network is a resource that makes the delivery income of the information to be delivered in a delivery cycle satisfy a target delivery income;
    obtaining a resource prediction model based on the target conditional variational autoencoder network and the target predictive execution network.
  9. The method according to claim 8, wherein the training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network, comprises:
    inputting the initial state feature information and the historical resources into the preset conditional variational autoencoder network, fitting the data distribution of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information, and encoding the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources;
    training the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network.
  10. The method according to claim 9, wherein the training the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network, comprises:
    obtaining a first loss component according to the probability distribution information and a standard normal distribution;
    obtaining a second loss component according to the historical resources and the encoding information corresponding to the historical resources;
    obtaining a first loss function based on the first loss component and the second loss component;
    adjusting the network parameters of the preset conditional variational autoencoder network based on the first loss function, to obtain the target conditional variational autoencoder network.
  11. The method according to claim 8, wherein training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and the target prediction analysis network, to obtain the target prediction execution network, comprises:
    inputting the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and analyzing, through the target prediction analysis network, the behavior of allocating the predicted resources based on the initial state feature information, to obtain first analysis information; and
    adjusting network parameters of the preset prediction execution network based on the first analysis information, to obtain the target prediction execution network.
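As a hedged illustration of claim 11, the sketch below treats the first analysis information as a scalar score returned by the analysis network and adjusts a single parameter of a toy execution network by gradient ascent on that score. The one-parameter network, the finite-difference gradient, the learning rate, and `toy_critic` are all hypothetical stand-ins chosen to keep the example self-contained; the claims do not specify an update rule.

```python
def actor_step(weight, state, critic, lr=0.1, eps=1e-4):
    """One gradient-ascent step on the critic's score of the predicted resource."""

    def score(w):
        predicted_resource = w * state  # hypothetical one-parameter execution network
        return critic(state, predicted_resource)  # stands in for the first analysis information

    # Central finite difference approximates d(score)/d(weight).
    grad = (score(weight + eps) - score(weight - eps)) / (2 * eps)
    return weight + lr * grad


def toy_critic(state, resource):
    """Hypothetical analysis network: scores highest when resource == 2 * state."""
    return -(resource - 2.0 * state) ** 2
```

Repeating `actor_step` with `toy_critic` and a state of 1.0 drives the weight toward 2.0, the value that the toy analysis network scores highest.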
  12. The method according to claim 8, wherein the sample data further comprises a historical delivery revenue of the sample delivery information in each historical delivery period, and updated state feature information; the updated state feature information is obtained based on the initial state feature information and delivery result information of the sample delivery information in the historical delivery period;
    the method further comprises:
    inputting the initial state feature information and the historical resources into a preset prediction analysis network, and analyzing, through the preset prediction analysis network, the behavior of allocating the historical resources based on the initial state feature information, to obtain second analysis information;
    sampling historical resources based on the updated state feature information and the target conditional variational autoencoder network, to obtain a preset number of sampled resources;
    determining, based on the updated state feature information, a delivery revenue corresponding to each sampled resource;
    determining the sampled resource with the largest delivery revenue as a target sampled resource; and
    adjusting network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource, to obtain the target prediction analysis network.
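The sampling and parameter-adjustment steps of claim 12 may be illustrated, purely as a sketch, by the Python below. It assumes that the second analysis information is a scalar value estimate and that the three quantities are combined in a temporal-difference style target with a discount factor. The discount factor, the squared-error loss, and the callables `sample_resource` and `estimate_revenue` are assumptions introduced for illustration only.

```python
def best_sampled_revenue(updated_state, sample_resource, estimate_revenue, num_samples=10):
    """Draw a preset number of resources and keep the one with the largest revenue."""
    candidates = [sample_resource(updated_state) for _ in range(num_samples)]
    target_resource = max(candidates, key=lambda r: estimate_revenue(updated_state, r))
    return target_resource, estimate_revenue(updated_state, target_resource)


def critic_loss(second_analysis, historical_revenue, best_next_revenue, discount=0.9):
    """Squared gap between the critic's estimate and an assumed TD-style target."""
    td_target = historical_revenue + discount * best_next_revenue
    return (second_analysis - td_target) ** 2
```

Here `sample_resource` stands in for decoding a draw from the target conditional variational autoencoder network, so the maximum is taken only over resources that resemble those seen in the historical data.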
  13. The method according to claim 8, further comprising:
    obtaining a first delivery revenue of delivered information within a target delivery period, and a second delivery revenue of the delivered information within a preset time period after the target delivery period, wherein the target delivery period is the last delivery period in an initial delivery stage;
    obtaining a historical delivery revenue corresponding to the target delivery period based on the first delivery revenue and the second delivery revenue; and
    generating a sample of the delivered information for the target delivery period based on the historical delivery revenue corresponding to the target delivery period.
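One possible reading of claim 13 is sketched below. The claim states only that the historical delivery revenue is obtained from the first and second delivery revenues. The weighted sum, the `follow_up_weight` parameter, and the dictionary layout of the sample are assumptions made for this example.

```python
def last_period_revenue(first_revenue, second_revenue, follow_up_weight=1.0):
    """Revenue credited to the last initial-stage period, including its follow-up window."""
    return first_revenue + follow_up_weight * second_revenue


def build_sample(start_state, historical_resource, first_revenue, second_revenue,
                 follow_up_weight=1.0):
    """Assemble a training sample for the target delivery period (hypothetical layout)."""
    return {
        "state": start_state,
        "resource": historical_resource,
        "revenue": last_period_revenue(first_revenue, second_revenue, follow_up_weight),
    }
```

Crediting the follow-up window to the last period lets that sample reflect revenue that materializes only after the initial delivery stage has ended.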
  14. A delivery information processing apparatus, comprising:
    a state feature information determination unit configured to determine initial state feature information of target delivery information in a current delivery period, wherein the initial state feature information of the target delivery information in the current delivery period is obtained based on initial state feature information of the target delivery information in a previous delivery period and delivery result information of the target delivery information in the previous delivery period, and the initial state feature information comprises historical delivery result information of the target delivery information before the current delivery period and attribute information of the target delivery information;
    a resource prediction model acquisition unit configured to acquire a resource prediction model, the resource prediction model comprising a conditional variational autoencoder network and a prediction execution network;
    a first prediction unit configured to input the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction, to obtain a first resource;
    a second prediction unit configured to input the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction, to obtain a second resource; and
    a target resource determination unit configured to obtain, based on the first resource and the second resource, a target resource corresponding to the target delivery information, wherein the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet a target delivery revenue.
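The data flow among the units of claim 14 may be sketched as follows. The two networks are represented by plain callables, the state is a dictionary, and the first and second resources are combined by addition with a lower bound. The additive combination, the `min_resource` bound, and every identifier are illustrative assumptions, since the claim requires only that the target resource be obtained based on both resources.

```python
def next_start_state(previous_state, period_result):
    """Append the previous period's delivery result to the history; attributes carry over."""
    history = previous_state["history"] + [period_result]
    return {"history": history, "attributes": previous_state["attributes"]}


def predict_target_resource(start_state, cvae_predict, execution_predict, min_resource=0.0):
    """Two-stage prediction: a first resource, then a second one conditioned on it."""
    first_resource = cvae_predict(start_state)
    second_resource = execution_predict(start_state, first_resource)
    # Assumption: the second resource acts as an additive correction to the first.
    return max(min_resource, first_resource + second_resource)
```

For example, with a `cvae_predict` that returns 2.0 and an `execution_predict` that returns 0.5, the sketch yields a target resource of 2.5.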
  15. The apparatus according to claim 14, wherein the state feature information determination unit comprises:
    a first acquisition unit configured to acquire the initial state feature information of the target delivery information in the previous delivery period, wherein the initial state feature information in the previous delivery period comprises historical delivery result information of the target delivery information before the previous delivery period; and
    a first update unit configured to update the historical delivery result information based on the delivery result information of the target delivery information in the previous delivery period, and to determine the initial state feature information of the target delivery information in the current delivery period.
  16. The apparatus according to claim 15, wherein the initial state feature information in the previous delivery period further comprises delivery setting information and category information of the target delivery information, the delivery setting information being used to rank multiple items of target delivery information to be delivered;
    the first update unit comprises:
    a first generation unit configured to generate the initial state feature information of the target delivery information in the current delivery period based on the delivery setting information, the category information, and the updated historical delivery result information.
  17. The apparatus according to claim 16, further comprising:
    a first calculation unit configured to calculate a resource mean and a resource variance for the current delivery period based on the predicted resources of the items of information to be delivered in the current delivery period;
    a second calculation unit configured to calculate a normalization coefficient corresponding to the target resource according to the resource mean, the resource variance, and the target resource;
    an actual resource determination unit configured to determine, based on the normalization coefficient and a preset resource amount, an actual resource allocated to the target delivery information in the current delivery period; and
    a first ranking unit configured to rank the items of information to be delivered based on their actual resources, to obtain a ranking result.
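The calculation chain of claim 17 (mean and variance, normalization coefficient, actual resource, ranking) might look like the following sketch. The claim does not give formulas, so the sketch assumes a z-score as the normalization coefficient and scales the preset resource amount by one plus that coefficient, floored at zero. Both choices, and the function name `allocate_and_rank`, are hypothetical.

```python
from statistics import fmean, pvariance


def allocate_and_rank(predicted, preset_amount):
    """predicted maps an item id to its predicted resource for the current period."""
    values = list(predicted.values())
    mean = fmean(values)
    variance = pvariance(values, mu=mean)
    std = variance ** 0.5 or 1.0  # fall back to 1.0 when all predictions are equal

    actual = {}
    for item, resource in predicted.items():
        coefficient = (resource - mean) / std  # assumed z-score normalization
        actual[item] = max(0.0, preset_amount * (1.0 + coefficient))

    # Items with more actual resource rank first.
    ranking = sorted(actual, key=actual.get, reverse=True)
    return actual, ranking
```

Under these assumptions an item whose predicted resource equals the period mean receives exactly the preset resource amount, while items above or below the mean receive proportionally more or less.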
  18. The apparatus according to claim 17, wherein the first ranking unit comprises:
    a second ranking unit configured to rank the items of information to be delivered based on the delivery setting information of the items and the actual resources of the items, to obtain the ranking result.
  19. The apparatus according to claim 18, further comprising:
    an information delivery unit configured to deliver information within the current delivery period based on the ranking result.
  20. The apparatus according to claim 17, further comprising:
    a second acquisition unit configured to acquire delivery result information of the target delivery information within the current delivery period, the delivery result information comprising conversion data and delivery consumption data; and
    a weighted summation unit configured to perform a weighted summation of the delivery conversion data and the delivery consumption data, to obtain the delivery revenue of the target delivery information in the current delivery period.
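The weighted summation of claim 20 can be illustrated with a very small sketch. The claim does not specify the weights, so the two weight parameters and their default values below are placeholders only.

```python
def delivery_revenue(conversion_data, consumption_data,
                     conversion_weight=1.0, consumption_weight=1.0):
    """Weighted sum of conversion and consumption data for one delivery period."""
    return conversion_weight * conversion_data + consumption_weight * consumption_data
```

With 4 conversions weighted at 2.0 and a consumption of 10.0 weighted at 0.1, for instance, the sketch gives a delivery revenue of 9.0.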
  21. A resource prediction model training apparatus, comprising:
    a sample data acquisition unit configured to acquire sample data, the sample data comprising initial state feature information of sample delivery information in each historical delivery period, and historical resources, wherein the initial state feature information characterizes historical delivery features of the sample delivery information before the start of each historical delivery period;
    a first training unit configured to train a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network;
    a third prediction unit configured to input the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;
    a second training unit configured to train the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network, wherein a predicted resource output by the target prediction execution network is a resource that makes the delivery revenue of information to be delivered in a delivery period meet a target delivery revenue; and
    a resource prediction model determination unit configured to obtain a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
  22. The apparatus according to claim 21, wherein the first training unit comprises:
    an information input unit configured to input the initial state feature information and the historical resources into the preset conditional variational autoencoder network, to fit the data distribution information of the initial state feature information and the historical resources through the preset conditional variational autoencoder network to obtain probability distribution information, and to encode the historical resources through the preset conditional variational autoencoder network to obtain encoding information corresponding to the historical resources; and
    a third training unit configured to train the preset conditional variational autoencoder network based on the probability distribution information, the historical resources, and the encoding information corresponding to the historical resources, to obtain the target conditional variational autoencoder network.
  23. The apparatus according to claim 22, wherein the third training unit comprises:
    a first loss component determination unit configured to obtain a first loss component according to the probability distribution information and a standard normal distribution;
    a second loss component determination unit configured to obtain a second loss component according to the historical resources and the encoding information corresponding to the historical resources;
    a first loss function determination unit configured to obtain a first loss function based on the first loss component and the second loss component; and
    a first parameter adjustment unit configured to adjust network parameters of the preset conditional variational autoencoder network based on the first loss function, to obtain the target conditional variational autoencoder network.
  24. The apparatus according to claim 21, wherein the second training unit comprises:
    a first analysis information determination unit configured to input the initial state feature information and the predicted resources corresponding to the historical resources into the target prediction analysis network, and to analyze, through the target prediction analysis network, the behavior of allocating the predicted resources based on the initial state feature information, to obtain first analysis information; and
    a second parameter adjustment unit configured to adjust network parameters of the preset prediction execution network based on the first analysis information, to obtain the target prediction execution network.
  25. The apparatus according to claim 21, wherein the sample data further comprises a historical delivery revenue of the sample delivery information in each historical delivery period, and updated state feature information; the updated state feature information is obtained based on the initial state feature information and delivery result information of the sample delivery information in the historical delivery period;
    the apparatus further comprises:
    a second analysis information determination unit configured to input the initial state feature information and the historical resources into a preset prediction analysis network, and to analyze, through the preset prediction analysis network, the behavior of allocating the historical resources based on the initial state feature information, to obtain second analysis information;
    a resource sampling unit configured to sample historical resources based on the updated state feature information and the target conditional variational autoencoder network, to obtain a preset number of sampled resources;
    a delivery revenue determination unit configured to determine, based on the updated state feature information, a delivery revenue corresponding to each sampled resource;
    a target sampled resource determination unit configured to determine the sampled resource with the largest delivery revenue as a target sampled resource; and
    a third parameter adjustment unit configured to adjust network parameters of the preset prediction analysis network based on the second analysis information, the historical delivery revenue, and the delivery revenue corresponding to the target sampled resource, to obtain the target prediction analysis network.
  26. The apparatus according to claim 21, further comprising:
    a third acquisition unit configured to acquire a first delivery revenue of delivered information within a target delivery period, and a second delivery revenue of the delivered information within a preset time period after the target delivery period, wherein the target delivery period is the last delivery period in an initial delivery stage;
    a historical delivery revenue determination unit configured to obtain a historical delivery revenue corresponding to the target delivery period based on the first delivery revenue and the second delivery revenue; and
    a sample generation unit configured to generate a sample of the delivered information for the target delivery period based on the historical delivery revenue corresponding to the target delivery period.
  27. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following steps:
    determining initial state feature information of target delivery information in a current delivery period, wherein the initial state feature information of the target delivery information in the current delivery period is obtained based on initial state feature information of the target delivery information in a previous delivery period and delivery result information of the target delivery information in the previous delivery period, and the initial state feature information comprises historical delivery result information of the target delivery information before the current delivery period and attribute information of the target delivery information;
    acquiring a resource prediction model, the resource prediction model comprising a conditional variational autoencoder network and a prediction execution network;
    inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction, to obtain a first resource;
    inputting the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction, to obtain a second resource; and
    obtaining, based on the first resource and the second resource, a target resource corresponding to the target delivery information, wherein the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet a target delivery revenue;
    or to implement the following steps:
    acquiring sample data, the sample data comprising initial state feature information of sample delivery information in each historical delivery period, and historical resources, wherein the initial state feature information characterizes historical delivery features of the sample delivery information before the start of each historical delivery period;
    training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network;
    inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;
    training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network, wherein a predicted resource output by the target prediction execution network is a resource that makes the delivery revenue of information to be delivered in a delivery period meet a target delivery revenue; and
    obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
  28. A computer-readable storage medium, characterized in that instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the following steps:
    determining initial state feature information of target delivery information in a current delivery period, wherein the initial state feature information of the target delivery information in the current delivery period is obtained based on initial state feature information of the target delivery information in a previous delivery period and delivery result information of the target delivery information in the previous delivery period, and the initial state feature information comprises historical delivery result information of the target delivery information before the current delivery period and attribute information of the target delivery information;
    acquiring a resource prediction model, the resource prediction model comprising a conditional variational autoencoder network and a prediction execution network;
    inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction, to obtain a first resource;
    inputting the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction, to obtain a second resource; and
    obtaining, based on the first resource and the second resource, a target resource corresponding to the target delivery information, wherein the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet a target delivery revenue;
    or to perform the following steps:
    acquiring sample data, the sample data comprising initial state feature information of sample delivery information in each historical delivery period, and historical resources, wherein the initial state feature information characterizes historical delivery features of the sample delivery information before the start of each historical delivery period;
    training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network;
    inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;
    training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network, wherein a predicted resource output by the target prediction execution network is a resource that makes the delivery revenue of information to be delivered in a delivery period meet a target delivery revenue; and
    obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
  29. A computer program product, comprising a computer program stored in a readable storage medium, wherein at least one processor of a computer device reads the computer program from the readable storage medium and executes it, causing the device to perform the following steps:
    determining initial state feature information of target delivery information in a current delivery period, wherein the initial state feature information of the target delivery information in the current delivery period is obtained based on initial state feature information of the target delivery information in a previous delivery period and delivery result information of the target delivery information in the previous delivery period, and the initial state feature information comprises historical delivery result information of the target delivery information before the current delivery period and attribute information of the target delivery information;
    acquiring a resource prediction model, the resource prediction model comprising a conditional variational autoencoder network and a prediction execution network;
    inputting the initial state feature information of the target delivery information in the current delivery period into the conditional variational autoencoder network for resource prediction, to obtain a first resource;
    inputting the initial state feature information of the target delivery information in the current delivery period and the first resource into the prediction execution network for resource prediction, to obtain a second resource; and
    obtaining, based on the first resource and the second resource, a target resource corresponding to the target delivery information, wherein the target resource is a predicted resource that makes the delivery revenue of the target delivery information in the current delivery period meet a target delivery revenue;
    or to perform the following steps:
    acquiring sample data, the sample data comprising initial state feature information of sample delivery information in each historical delivery period, and historical resources, wherein the initial state feature information characterizes historical delivery features of the sample delivery information before the start of each historical delivery period;
    training a preset conditional variational autoencoder network based on the initial state feature information and the historical resources, to obtain a target conditional variational autoencoder network;
    inputting the encoding information of the historical resources produced by the target conditional variational autoencoder network, together with the initial state feature information, into a preset prediction execution network for resource prediction, to obtain predicted resources corresponding to the historical resources;
    training the preset prediction execution network based on the historical resources, the predicted resources corresponding to the historical resources, and a target prediction analysis network, to obtain a target prediction execution network, wherein a predicted resource output by the target prediction execution network is a resource that makes the delivery revenue of information to be delivered in a delivery period meet a target delivery revenue; and
    obtaining a resource prediction model based on the target conditional variational autoencoder network and the target prediction execution network.
PCT/CN2022/096373 2021-12-15 2022-05-31 Delivery information processing method, and resource prediction model training method and apparatus WO2023109025A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111529876.1A CN113918826B (en) 2021-12-15 2021-12-15 Processing method of release information, and training method and device of resource prediction model
CN202111529876.1 2021-12-15

Publications (1)

Publication Number Publication Date
WO2023109025A1 true WO2023109025A1 (en) 2023-06-22

Family

ID=79248943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096373 WO2023109025A1 (en) 2021-12-15 2022-05-31 Delivery information processing method, and resource prediction model training method and apparatus

Country Status (2)

Country Link
CN (1) CN113918826B (en)
WO (1) WO2023109025A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918826B (en) * 2021-12-15 2022-03-25 北京达佳互联信息技术有限公司 Processing method of release information, and training method and device of resource prediction model
CN114786031B (en) * 2022-06-17 2022-10-14 北京达佳互联信息技术有限公司 Resource delivery method, device, equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110414710A (en) * 2019-06-20 2019-11-05 平安科技(深圳)有限公司 Trend forecasting method and device based on artificial intelligence
CN112232854A (en) * 2020-09-25 2021-01-15 北京三快在线科技有限公司 Service processing method, device, equipment and storage medium
US20210027379A1 (en) * 2019-07-26 2021-01-28 International Business Machines Corporation Generative network based probabilistic portfolio management
CN112580889A (en) * 2020-12-25 2021-03-30 北京嘀嘀无限科技发展有限公司 Service resource pre-estimation method and device, electronic equipment and storage medium
CN113627979A (en) * 2021-07-30 2021-11-09 北京达佳互联信息技术有限公司 Resource delivery data processing method, device, server, system and medium
CN113918826A (en) * 2021-12-15 2022-01-11 北京达佳互联信息技术有限公司 Processing method of release information, and training method and device of resource prediction model

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US11574148B2 (en) * 2018-11-05 2023-02-07 Royal Bank Of Canada System and method for deep reinforcement learning
US11127032B2 (en) * 2018-11-19 2021-09-21 Eventbrite, Inc. Optimizing and predicting campaign attributes
CN112055235B (en) * 2020-08-25 2022-03-25 北京达佳互联信息技术有限公司 Method and device for pushing display object, electronic equipment and storage medium
CN113570395A (en) * 2021-01-22 2021-10-29 腾讯科技(深圳)有限公司 Information processing method and device, computer readable medium and electronic equipment
CN113095885B (en) * 2021-04-22 2024-04-12 加和(北京)信息科技有限公司 Information delivery data processing method and device
CN113344650B (en) * 2021-08-05 2021-12-07 北京达佳互联信息技术有限公司 Method and device for determining quantity of resources, computer equipment and medium

Also Published As

Publication number Publication date
CN113918826A (en) 2022-01-11
CN113918826B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
WO2023109025A1 (en) Delivery information processing method, and resource prediction model training method and apparatus
WO2021135562A1 (en) Feature validity evaluation method and apparatus, and electronic device and storage medium
US11288709B2 (en) Training and utilizing multi-phase learning models to provide digital content to client devices in a real-time digital bidding environment
CN108182633B (en) Loan data processing method, loan data processing device, loan data processing program, and computer device and storage medium
JP2022033695A (en) Method, device for generating model, electronic apparatus, storage medium and computer program product
CN112905897B (en) Similar user determination method, vector conversion model, device, medium and equipment
CN109389424B (en) Flow distribution method and device, electronic equipment and storage medium
CN110796513A (en) Multitask learning method and device, electronic equipment and storage medium
CN111124676A (en) Resource allocation method and device, readable storage medium and electronic equipment
CN112015990A (en) Method and device for determining network resources to be recommended, computer equipment and medium
CN113919923B (en) Live broadcast recommendation model training method, live broadcast recommendation method and related equipment
CN113344647B (en) Information recommendation method and device
Jin et al. An intelligent scheduling algorithm for resource management of cloud platform
CN113015010A (en) Push parameter determination method, device, equipment and computer readable storage medium
WO2020211616A1 (en) Method and device for processing user interaction information
CN113836388A (en) Information recommendation method and device, server and storage medium
CN115796937A (en) Big data complex relevance electric power supply and demand trend analysis method and device
CN115185606A (en) Method, device, equipment and storage medium for obtaining service configuration parameters
WO2021240715A1 (en) Mood prediction method, mood prediction device, and program
CN114529008A (en) Information recommendation method, object identification method and device
CN113760550A (en) Resource allocation method and resource allocation device
CN113010774A (en) Click rate prediction method based on dynamic deep attention model
CN114764469A (en) Content recommendation method and device, computer equipment and storage medium
CN111860870A (en) Training method, device, equipment and medium for interactive behavior determination model
CN113365095B (en) Live broadcast resource recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22905791; Country of ref document: EP; Kind code of ref document: A1