JP7319443B1 - Information processing system, computer program, and information processing method

Info

Publication number: JP7319443B1
Application number: JP2022174867A
Authority: JP
Inventors: 一輝柴田; 雄介熊谷; 龍道本
Original assignee: Hakuhodo DY Holdings Inc
Current assignee: Hakuhodo DY Holdings Inc
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2023-08-01
Anticipated expiration: 2042-10-31

Abstract

Kind Code: A1
[Problem] To provide a technology that enables highly effective advertisement placement. [Solution] A first placement amount and a second placement amount are determined, and advertisement placement on a first medium based on the first placement amount and advertisement placement on a second medium based on the second placement amount are instructed (S250). A first performance, which is the track record of exposure of reserved advertisements, is determined (S260). A second performance, which is the track record of exposure of non-reserved advertisements, is determined (S270). An exposure schedule of unexposed advertisements awaiting exposure is determined (S290). For each of a plurality of time points, the first placement amount and the second placement amount are determined based on the possible remaining placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance (S240). [Selected drawing] FIG. 4

Description

本開示は、情報処理システム及び方法に関する。 The present disclosure relates to information processing systems and methods.

Conventionally, techniques for integrating broadcast and network advertising are known, in particular techniques that automatically synchronize the timing of advertisements and optimize their display mode to obtain a high advertising effect (see, for example, Patent Document 1).

Patent Document 1: International Publication No. WO 2008/081596

To achieve effective advertising, the allocation of the advertising budget among selectable media is also important. Examples of advertising include reserved advertising and programmatic advertising.
A reserved advertisement is a type of advertisement that requires prior reservation for distribution, and it is distributed according to a schedule determined at the time of reservation. Distribution here includes not only digital distribution but also broadcasting. Examples of reserved advertisements include television commercials, which are advertisements delivered through television broadcasting.

運用型広告は、予約型の広告とは異なり、配信条件をリアルタイムで変更できるタイプの広告である。運用型広告の例には、通信ネットワーク、特にはインターネットを通じた広告が含まれる。運用型広告は、事前予約なしに広告主から指定されたタイミングで配信され得る。 Programmatic advertising is a type of advertising whose distribution conditions can be changed in real time, unlike reserved advertising. Examples of programmatic advertising include advertising through communication networks, particularly the Internet. Programmatic advertisements can be distributed at timings specified by advertisers without prior reservations.

予約型広告では、広告効果の高い広告枠が存在する一方で、事前に出稿を決めておく必要がある。一方、運用型広告では、リアルタイム配信が可能な一方で、広告効果が予約型広告よりも低い場合がある。このように予約型広告及び運用型広告には、メリット及びデメリットが存在する。 In reserved advertising, while there are highly effective advertising slots, it is necessary to decide in advance where to place the advertisement. On the other hand, programmatic advertising can be delivered in real time, but the advertising effect may be lower than reserved advertising. As described above, there are merits and demerits in reserved advertisements and programmatic advertisements.

Therefore, according to one aspect of the present disclosure, it is desirable to provide a technology that, under the condition that the total possible placement amount is fixed, can place reserved advertisements and non-reserved advertisements in an allocation with a high advertising effect by determining the placement amounts of reserved and non-reserved advertisements multiple times.

According to one aspect of the present disclosure, an information processing system is provided. The information processing system includes a placement instruction unit, a first performance determination unit, a second performance determination unit, and a schedule determination unit. The placement instruction unit is configured to determine a first placement amount and a second placement amount, and to instruct advertisement placement on a first medium based on the first placement amount and advertisement placement on a second medium based on the second placement amount. The first placement amount is the placement amount for reserved advertisements exposed through the first medium. The second placement amount is the placement amount for non-reserved advertisements exposed through the second medium.

The first performance determination unit is configured to determine a first performance, which is the track record of exposure of the reserved advertisements exposed through the first medium. The second performance determination unit is configured to determine a second performance, which is the track record of exposure of the non-reserved advertisements exposed through the second medium. The schedule determination unit is configured to determine an exposure schedule of unexposed advertisements, that is, reserved advertisements that are awaiting exposure.

According to one aspect of the present disclosure, the total possible placement amount for reserved advertisements and non-reserved advertisements is determined in advance. The placement instruction unit is configured to determine, for each of a plurality of time points, the first placement amount and the second placement amount based on the possible remaining placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance.

With a system that determines the placement amounts of reserved and non-reserved advertisements based on the exposure performance of those advertisements and also on the exposure schedule of unexposed advertisements, it is possible, under the condition that the total possible placement amount is fixed, to place reserved and non-reserved advertisements in an allocation with a high advertising effect by determining their placement amounts multiple times.

本開示の一側面によれば、可能な出稿量の総量のうち、第一の量が、予約型広告に対する出稿量として予め定められ得る。総量のうち、第二の量が、予約型広告及び非予約型広告に共用の出稿量として定められ得る。出稿指示部には、第二の量の予約型広告及び非予約型広告に対する配分の決定権が与えられる。 According to one aspect of the present disclosure, the first amount of the total possible amount of advertisements can be predetermined as the amount of advertisements for reservation-type advertisements. A second amount of the total amount may be defined as a shared placement amount for scheduled and non-reserved advertisements. The placement instruction unit is given the power to determine the allocation of the second amount of reserved advertisements and non-reserved advertisements.

According to one aspect of the present disclosure, the placement of reserved advertisements corresponding to at least a part of the first amount may be completed at an initial time point that precedes the plurality of time points. In this case, the exposure schedule described above may include the exposure schedule of unexposed advertisements that are reserved advertisements already placed at the initial time point and still awaiting exposure.

出稿指示部は、最初の時点で出稿済の予約型広告を含む未露出広告の露出スケジュールを加味して、時点毎に、第一の出稿量及び第二の出稿量を決定し得る。 The placement instruction unit can determine the first placement amount and the second placement amount for each time point, taking into account the exposure schedule of the unexposed advertisements including the reserved advertisements that have already been placed at the first time point.

With this information processing system, the user can, knowing the minimum placement amount for reserved advertisements and having completed the initial placement of reserved advertisements, have the information processing system calculate placement amounts of reserved and non-reserved advertisements that yield a high advertising effect. That is, while controlling the minimum placement amount for reserved advertisements, the information processing system can instruct highly effective placement of reserved and non-reserved advertisements by determining their placement amounts multiple times.

According to one aspect of the present disclosure, the first placement amount may be the monetary amount spent on placing reserved advertisements. The second placement amount may be the monetary amount spent on placing non-reserved advertisements. The total possible placement amount may be the placement budget for advertisements including reserved and non-reserved advertisements. Therefore, with the information processing system according to one aspect of the present disclosure, the user can place highly effective reserved and non-reserved advertisements within a limited budget.

本開示の一側面によれば、出稿指示部は、動的最適化アルゴリズムに従って、時点毎に、第一の出稿量及び第二の出稿量を決定し得る。本開示の一側面によれば、動的最適化アルゴリズムは、強化学習、コンテキスチュアルバンデットアルゴリズム、及びカルマンフィルタの少なくとも一つを含み得る。 According to one aspect of the present disclosure, the ad placement instruction unit can determine the first ad placement amount and the second ad placement amount for each time point according to the dynamic optimization algorithm. According to one aspect of the present disclosure, dynamic optimization algorithms may include at least one of reinforcement learning, contextual bandit algorithms, and Kalman filters.

本開示の一側面によれば、出稿指示部は、強化学習又はコンテキスチュアルバンデットアルゴリズムに従って、時点毎に、第一の出稿量及び第二の出稿量を決定し得る。こうしたアルゴリズムを通じた出稿量の決定によれば、ユーザは、広告効果の高い予約型広告及び非予約型広告の出稿を実現することが可能である。 According to one aspect of the present disclosure, the placement instruction unit can determine the first placement amount and the second placement amount for each time point according to reinforcement learning or a contextual bandit algorithm. Determining the placement amount through such an algorithm enables users to place highly effective reserved ads and non-reserved ads.

According to one aspect of the present disclosure, the placement instruction unit may determine, for each time point, the first placement amount and the second placement amount as an action regarding advertisement placement at that time point, based on the state or context at that time point. The state or context may be defined using the possible remaining placement amount, the exposure schedule of unexposed advertisements, the first performance, and the second performance.

For each time point, the placement instruction unit may determine, based on the first performance and the second performance, the advertising effect of the reserved and non-reserved advertisements newly exposed as a result of advertisement placement up to that time point, as the reward for the action.

出稿指示部は、時点毎に、対応する時点での広告出稿と、対応する時点での第一の実績及び第二の実績と、を加味して、状態又はコンテキストを更新し得る。出稿指示部は、時点毎に、報酬に基づいて、行動の選択に関するポリシーを更新し得る。 The advertisement placement instruction unit may update the state or context in consideration of the advertisement placement at the corresponding time and the first performance and the second performance at the corresponding time for each time. The placement instruction unit can update the policy regarding action selection based on the reward at each point in time.
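The per-time-point bookkeeping described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `State` and `step`, the representation of the exposure schedule as a tuple of integer time slots, and the use of cumulative scalar performances are all assumptions made for the example. The reward is taken here as the increase in the two performances since the previous time point, and the policy update itself is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class State:
    remaining_budget: float            # possible remaining placement amount
    pending_schedule: Tuple[int, ...]  # time slots of reserved ads still awaiting exposure
    reserved_perf: float               # first performance (cumulative, hypothetical unit)
    non_reserved_perf: float           # second performance (cumulative, hypothetical unit)


def step(state: State, t: int, reserved_spend: float, non_reserved_spend: float,
         new_slots: Iterable[int], reserved_perf: float,
         non_reserved_perf: float) -> Tuple[State, float]:
    """Apply the action taken at time t and return (next_state, reward)."""
    # Reward: effect of advertisements newly exposed since the previous time point.
    reward = ((reserved_perf - state.reserved_perf)
              + (non_reserved_perf - state.non_reserved_perf))
    # Add newly booked slots and drop slots that have already aired by time t.
    pending = tuple(sorted(
        s for s in state.pending_schedule + tuple(new_slots) if s > t))
    next_state = State(
        remaining_budget=state.remaining_budget - reserved_spend - non_reserved_spend,
        pending_schedule=pending,
        reserved_perf=reserved_perf,
        non_reserved_perf=non_reserved_perf,
    )
    return next_state, reward
```

A learning algorithm would call `step` once per time point and feed the returned reward into its policy update.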

According to one aspect of the present disclosure, the placement instruction unit may be configured to determine, by reinforcement learning in which a state, a reward, and an action are defined, the first placement amount and the second placement amount at each of the plurality of time points as the action.

本開示の一側面によれば、状態は、対応する時点での可能な残り出稿量、第一の実績、第二の実績、及び、未露出広告のスケジュールを用いて定義され得る。報酬は、第一の実績及び第二の実績から判別される対応する時点での広告効果を用いて定義され得る。 According to one aspect of the present disclosure, a state may be defined with a possible remaining placement volume, a first performance, a second performance, and a schedule of unexposed ads at the corresponding time. A reward may be defined using the advertising effectiveness at the corresponding time determined from the first performance and the second performance.

According to one aspect of the present disclosure, the placement instruction unit may be configured to determine, by a contextual bandit algorithm in which a context, a reward, and an action are defined, the first placement amount and the second placement amount at each of the plurality of time points as the action.

コンテキストは、対応する時点での可能な残り出稿量、第一の実績、第二の実績、及び、未露出広告のスケジュールを用いて定義され得る。報酬は、第一の実績及び第二の実績から判別される対応する時点での広告効果を用いて定義され得る。 A context may be defined with the remaining placement volume, first performance, second performance, and unexposed ad schedule at the corresponding time. A reward may be defined using the advertising effectiveness at the corresponding time determined from the first performance and the second performance.
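As one hedged illustration of the contextual bandit variant, the sketch below uses an epsilon-greedy rule over a small discrete set of candidate allocations. The class name `EpsilonGreedyBandit`, the discretized (hashable) context, and the candidate action list are hypothetical; the disclosure does not specify which bandit algorithm is used.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Keeps a running mean reward per (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)   # e.g. (reserved_spend, non_reserved_spend) pairs
        self.epsilon = epsilon         # probability of exploring a random action
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental update of the mean reward for this (context, action) pair.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

The context would be built from the remaining placement amount, the two performances, and the unexposed-advertisement schedule, bucketed into a tuple so that it can serve as a dictionary key.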

According to one aspect of the present disclosure, the placement instruction unit may be configured to determine, for each time point, the first placement amount and the second placement amount at that time point using a state-space model such as a Kalman filter.

状態空間モデルは、状態量と、観測量と、入力量との間の関係を定義するモデルであり得る。状態空間モデルは、未露出広告の露出スケジュールの情報を含むモデルであり得る。入力量は、第一の出稿量と第二の出稿量とを用いて定義される広告出稿に関する量であり得る。 A state-space model can be a model that defines the relationship between state quantities, observable quantities, and input quantities. The state-space model can be a model that contains information about the exposure schedule of unexposed advertisements. The input amount may be an amount related to advertisement placement defined using a first placement amount and a second placement amount.

状態量は、広告出稿により変化する状態量であり得る。状態量は、可能な残り出稿量と、第一の実績と、第二の実績と、を用いて定義され得る。観測量は、第一の実績及び第二の実績に基づいて判別される出稿された予約型広告及び非予約型広告の広告効果を定義する量であり得る。 The state quantity may be a state quantity that changes according to the placement of advertisements. The state quantity can be defined using the possible remaining ad placement amount, the first performance, and the second performance. The observable quantity may be a quantity that defines the advertising effectiveness of the posted scheduled and non-reserved advertisements determined based on the first performance and the second performance.
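For reference, a scalar linear state-space model with a control input can be filtered as below. This is a textbook one-dimensional Kalman filter, shown only to illustrate how state, observed, and input quantities relate. The coefficients `a`, `b`, `c`, the noise variances `q` and `r`, and the reduction of the state to a single number are assumptions for the example, not values or structure taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    a: float        # state transition coefficient
    b: float        # effect of the input quantity (placement amount) on the state
    c: float        # maps the state to the observed advertising effect
    q: float        # process noise variance
    r: float        # observation noise variance
    x: float = 0.0  # current state estimate
    p: float = 1.0  # variance of the state estimate

    def predict(self, u: float) -> None:
        """Propagate the estimate one time point ahead under input u."""
        self.x = self.a * self.x + self.b * u
        self.p = self.a * self.p * self.a + self.q

    def update(self, y: float) -> None:
        """Correct the estimate with the observed quantity y."""
        k = self.p * self.c / (self.c * self.p * self.c + self.r)  # Kalman gain
        self.x = self.x + k * (y - self.c * self.x)
        self.p = (1.0 - k * self.c) * self.p
```

In use, `predict` would be called with the placement amount chosen at a time point, and `update` with the advertising effect observed afterwards.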

According to one aspect of the present disclosure, a computer program may be provided for causing a computer to function as the placement instruction unit, the first performance determination unit, the second performance determination unit, and the schedule determination unit of the information processing system described above. The computer program may be provided recorded on a computer-readable recording medium.

According to one aspect of the present disclosure, the following information processing method may be provided. The information processing method may include determining a first placement amount and a second placement amount, and instructing advertisement placement on a first medium based on the first placement amount and advertisement placement on a second medium based on the second placement amount. The first placement amount may be the placement amount for reserved advertisements exposed through the first medium. The second placement amount may be the placement amount for non-reserved advertisements exposed through the second medium.

The information processing method may include determining a first performance, which is the track record of exposure of the reserved advertisements exposed through the first medium. The information processing method may include determining a second performance, which is the track record of exposure of the non-reserved advertisements exposed through the second medium.

The information processing method may include determining an exposure schedule of unexposed advertisements, that is, reserved advertisements that are awaiting exposure. The total possible placement amount for reserved advertisements and non-reserved advertisements may be determined in advance.

Instructing advertisement placement may include determining, for each of a plurality of time points, the first placement amount and the second placement amount based on the possible remaining placement amount, the exposure schedule of unexposed advertisements, the first performance, and the second performance.

With such a method, as with the information processing system, highly effective placement of reserved and non-reserved advertisements can be achieved, under the condition that the total possible placement amount is fixed, by determining the placement amounts of reserved and non-reserved advertisements multiple times.

本開示の一側面によれば、情報処理方法は、コンピュータにより実行されてもよい。本開示の一側面によれば、強化学習、コンテキスチュアルバンデットアルゴリズム、及びカルマンフィルタ等の動的最適化アルゴリズムに従って、時点毎に、第一の出稿量及び第二の出稿量が決定されてもよい。その他、上述の情報処理システムに対応する情報処理方法が提供されてもよい。 According to one aspect of the present disclosure, the information processing method may be performed by a computer. According to one aspect of the present disclosure, the first ad spend and the second ad spend may be determined for each time point according to dynamic optimization algorithms such as reinforcement learning, contextual bandit algorithms, and Kalman filters. In addition, an information processing method corresponding to the information processing system described above may be provided.

FIG. 1 is a block diagram showing the configuration of the information processing system.
FIG. 2 is a flowchart showing the first placement-related processing executed by the processor in the first embodiment.
FIG. 3 is an explanatory diagram regarding the use of the reserve budget.
FIG. 4 is a flowchart showing the second placement-related processing executed by the processor in the first embodiment.
FIG. 5 is a flowchart showing the first placement-related processing executed by the processor in the second embodiment.
FIG. 6 is a flowchart showing the second placement-related processing executed by the processor in the second embodiment.

Exemplary embodiments of the present disclosure are described below with reference to the drawings.
[First embodiment]
The information processing system 1 of this embodiment is configured by installing a computer program specific to this embodiment on a general-purpose computer system. The information processing system 1 shown in FIG. 1 includes a processor 11, a memory 12, a storage 13, a display 15, an input device 17, a media reader/writer 18, and a communication device 19.

プロセッサ１１は、ストレージ１３が記憶するコンピュータプログラムに従う処理を実行するように構成される。メモリ１２は、ＲＡＭを含む。メモリ１２は、プロセッサ１１がコンピュータプログラムに従う処理を実行する際に、作業領域として使用される。メモリ１２は、ストレージ１３から読み出されたコンピュータプログラム及びデータを一時記憶する。 Processor 11 is configured to execute processing according to a computer program stored in storage 13 . Memory 12 includes RAM. Memory 12 is used as a work area when processor 11 executes processing according to a computer program. The memory 12 temporarily stores computer programs and data read from the storage 13 .

The storage 13 stores computer programs and various data. The computer programs stored in the storage 13 include a computer program that causes the processor 11 to determine, step by step and based on a dynamic optimization algorithm, the placement amounts for reserved advertisements and non-reserved advertisements so as to maximize the advertising effect within a specified advertising budget. Examples of the storage 13 include hard disk drives and solid-state drives.

ディスプレイ１５は、ユーザに向けて各種情報を表示するように構成される。ディスプレイ１５は、例えば液晶ディスプレイである。入力デバイス１７は、ユーザからの操作信号をプロセッサ１１に入力するように構成される。入力デバイス１７は、ユーザが操作可能なキーボード及びポインティングデバイスを備える。 The display 15 is configured to display various information to the user. The display 15 is, for example, a liquid crystal display. The input device 17 is configured to input operation signals from the user to the processor 11 . The input device 17 includes a user-operable keyboard and pointing device.

メディアリーダ／ライタ１８は、メモリカードなどの記録メディアに記録された情報を読取可能、及び、記録メディアに新規情報を書込可能に構成される。通信デバイス１９は、プロセッサ１１により制御されて、ローカルエリアネットワーク内の、及び／又は、広域ネットワーク内の外部システムと通信するように構成される。 The media reader/writer 18 is configured to be able to read information recorded on a recording medium such as a memory card and write new information to the recording medium. Communication device 19 is controlled by processor 11 and configured to communicate with external systems within a local area network and/or within a wide area network.

プロセッサ１１は、入力デバイス１７を通じてユーザから入力される指令に従って、ストレージ１３が記憶するコンピュータプログラムに基づく出稿関連処理を実行する。出稿関連処理は、第一の出稿関連処理（図２参照）と、第二の出稿関連処理（図４参照）とを含む。プロセッサ１１は、第一の出稿関連処理の実行後、入力デバイス１７を通じてユーザから入力される更なる指令に従って、第二の出稿関連処理を実行する。 The processor 11 executes advertisement-related processing based on computer programs stored in the storage 13 according to commands input by the user through the input device 17 . The advertisement related processing includes a first advertisement related processing (see FIG. 2) and a second advertisement related processing (see FIG. 4). After executing the first advertisement-related process, the processor 11 executes a second advertisement-related process according to a further command input by the user through the input device 17 .

第一の出稿関連処理では、指定された条件で、複数の媒体に対する広告の最低出稿予算が決定され、出稿計画が出力される。第二の出稿関連処理では、出稿計画で定められた最低出稿予算に基づいて、複数の媒体への出稿量が決定され、広告出稿が指示される。 In the first placement-related process, a minimum placement budget for advertisements for a plurality of media is determined under specified conditions, and a placement plan is output. In the second placement-related processing, the amount of placements on a plurality of media is determined based on the minimum placement budget determined in the placement plan, and the placement of advertisements is instructed.

The plurality of media include a plurality of media that distribute reserved advertisements (hereinafter "reserved media") and a plurality of media that distribute non-reserved advertisements (hereinafter "non-reserved media"). Distribution of an advertisement here means running the advertisement, and should be understood broadly to include not only distribution over the Internet but also broadcasting of advertisements through television broadcasting, radio broadcasting, and the like. Advertisements are exposed to consumers through media. An advertisement may relate to, for example, a product sold by the advertiser or a service provided by the advertiser.

一例としての非予約型広告は、出稿とほぼ同時に配信される運用型広告であって、インターネットを通じて配信されるデジタル広告である。一例としての予約型広告は、テレビジョン放送を通じて、テレビコマーシャル（ＣＭ）の形態で配信される広告である。テレビＣＭは、スポットＣＭを含む。テレビＣＭとしての予約型広告は、広告出稿時に定められたスケジュールに従って、所定の放送枠で放送される。以下では、予約型広告としてテレビＣＭを想定し、非予約型広告としてデジタル広告を想定した例を説明する。 Non-reserved advertisements, for example, are programmatic advertisements that are distributed almost simultaneously with the placement of advertisements, and are digital advertisements that are distributed over the Internet. A reserved advertisement as an example is an advertisement distributed in the form of a television commercial (CM) through television broadcasting. TV commercials include spot commercials. A reservation-type advertisement as a TV commercial is broadcast in a predetermined broadcast frame according to a schedule determined at the time the advertisement is placed. Below, an example will be described in which a television commercial is assumed as a reserved advertisement and a digital advertisement is assumed as a non-reserved advertisement.

In the first step of the placement-related processing, the user provides the following information as execution conditions.
・Advertising budget
・Campaign period
・Estimation model of advertising effect
・Target information
・Penalty
When the processor 11 starts the first placement-related processing (see FIG. 2), it acquires this information in S110 through the storage 13 or through the input device 17.

広告予算は、キャンペーンで使用可能な予約型広告及び非予約型広告を含む広告の出稿予算の総額である。キャンペーン期間は、広告活動を行う期間に対応する。広告効果の推定モデルは、広告の露出実績に関する指標から、広告効果を推定するための数理モデルである。 The advertising budget is the total advertising budget for advertisements including reserved advertisements and non-reserved advertisements that can be used in campaigns. The campaign period corresponds to the period during which advertising activities are performed. The advertising effectiveness estimation model is a mathematical model for estimating advertising effectiveness from an index related to advertising exposure results.

The advertising effect may be quantified as reach, that is, the number of target consumers the advertisement has reached. Alternatively, the advertising effect may be quantified as a conversion rate, or expressed as a numerical measure of the magnitude of brand lift. The advertising effect may also be expressed numerically as a score that combines multiple indicators such as reach, conversion rate, and brand lift.
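One simple way to combine several indicators into a single score is a weighted sum. The sketch below is only an assumption about how such a score might be formed: the function name `composite_effect`, the default weights, and the requirement that each indicator be pre-normalized to the range 0 to 1 are not taken from the disclosure.

```python
def composite_effect(reach, conversion_rate, brand_lift,
                     weights=(0.5, 0.25, 0.25)):
    """Weighted sum of three indicators, each assumed normalized to [0, 1]."""
    w_reach, w_conv, w_lift = weights
    return w_reach * reach + w_conv * conversion_rate + w_lift * brand_lift
```

Changing the weights shifts the emphasis between reach, conversion, and brand lift without changing the rest of the pipeline.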

ターゲット情報は、広告目標の消費者セグメントを指定する情報である。広告効果は、ターゲット情報で指定された消費者セグメントに対する効果として推定される。消費者セグメントは、例えば、性別及び年齢層により指定される。 Target information is information that specifies a consumer segment for advertising targets. Advertising effectiveness is estimated as the effectiveness for the consumer segment specified by the targeting information. Consumer segments are specified, for example, by gender and age group.

The penalty is a parameter for discouraging the placement-related processing from instructing advertisement placement that exceeds the specified advertising budget, which is the total possible placement amount. The lower the advertiser's tolerance for over-budget placement, the larger the value specified for the penalty, so as to suppress instructions for over-budget placement. As this explanation shows, the total "possible" placement amount means the total amount up to which placement is at least possible; note that the expression does not mean that placement exceeding the "possible" placement amount is prohibited.
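The role of the penalty can be illustrated with a soft budget constraint. The exact functional form is not stated in this passage, so the linear overspend term below, along with the name `penalized_reward`, is an assumption made for illustration.

```python
def penalized_reward(effect, spent, budget, penalty):
    """Advertising effect minus a penalty proportional to any overspend."""
    # Overspend is zero while cumulative spend stays within the budget.
    overspend = max(0.0, spent - budget)
    return effect - penalty * overspend
```

With a larger `penalty`, an optimizer that maximizes this quantity is pushed more strongly toward staying within the budget, while small overruns remain possible rather than forbidden.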

In subsequent S120, the processor 11 selects, according to a reinforcement learning algorithm, an action regarding advertisement placement at time $t = 0$, before the start of the campaign period. Specifically, based on a given initial state $s(0)$, the processor 11 determines, as the action at time $t = 0$, the minimum placement budgets $b_1^R(0), b_2^R(0), \ldots, b_m^R(0), \ldots, b_M^R(0), b_1^P(0), b_2^P(0), \ldots, b_n^P(0), \ldots, b_N^P(0)$ for the plurality of reserved media and the plurality of non-reserved media. The placement budget here is the budget for purchasing advertising slots.

b_m^R(0) is the placement budget for the m-th reserved medium among the plurality of reserved media. M is the number of reserved media when they run from the first to the M-th reserved medium. M is an integer of 1 or more, and m is an integer from 1 to M.

b_n^P(0) is the placement budget for the n-th non-reserved medium among the plurality of non-reserved media. N is the number of non-reserved media when they run from the first to the N-th non-reserved medium. N is an integer of 1 or more, and n is an integer from 1 to N.

The initial state s(0) is defined by the advertising budget. The reinforcement learning algorithm determines the minimum placement budgets b_1^R(0), b_2^R(0), …, b_M^R(0), b_1^P(0), b_2^P(0), …, b_N^P(0) for the reserved and non-reserved media based on the designated advertising budget and on a pre-learned policy for action selection.

In the subsequent S130, the processor 11 outputs, as a placement plan, the per-medium minimum placement budgets b_1^R(0), b_2^R(0), …, b_M^R(0), b_1^P(0), b_2^P(0), …, b_N^P(0) determined in S120, together with the advertising budget and the reserve budget.

The advertising budget is as described above. The reserve budget is the advertising budget minus the sum of the minimum placement budgets b_1^R(0), b_2^R(0), …, b_M^R(0), b_1^P(0), b_2^P(0), …, b_N^P(0). Writing B for the advertising budget and Br for the reserve budget, the reserve budget is

Br = B - (b_1^R(0) + b_2^R(0) + … + b_M^R(0) + b_1^P(0) + b_2^P(0) + … + b_N^P(0)).

That is, the reserve budget is the part of the advertising budget that is not allocated to any reserved or non-reserved medium.
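The reserve budget formula above can be sketched in a few lines of Python. The function name, argument names, and numbers are illustrative assumptions and do not come from the disclosure.

```python
def reserve_budget(total_budget, reserved_minimums, non_reserved_minimums):
    """Return Br = B - (sum of b_m^R(0) + sum of b_n^P(0))."""
    allocated = sum(reserved_minimums) + sum(non_reserved_minimums)
    return total_budget - allocated


# Illustrative values only; the unit is arbitrary.
br = reserve_budget(70, [25, 15], [12, 8])
print(br)  # 10
```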

In S130, the processor 11 can output the placement plan by showing it on the display 15. The processor 11 can also output the placement plan to the storage 13 as a data file that the user can view, and can store in the storage 13 the placement plan information needed for the second placement-related processing. The processor 11 then ends the first placement-related processing.

The placement plan is presented to the advertiser before the campaign period starts. If the advertiser adopts the plan, advertising slots worth the minimum placement budgets b_1^R(0), b_2^R(0), …, b_M^R(0) under the plan are purchased for the reserved media before the campaign period starts, and the placement work for each reserved medium is completed. This work is done manually by staff of the advertising agency, or automatically through the relay system 31 that connects to the media.

Thereafter, the delivery schedule of the placed reserved advertisements is recorded in the storage 13. The processor 11 can obtain the delivery schedule automatically through the relay system 31. Alternatively, staff of the advertising agency can obtain the delivery schedule from the media company and enter it into the information processing system 1 through the input device 17 or the media reader/writer 18.

After the first placement-related processing ends, the processor 11 starts the second placement-related processing (see FIG. 4) when the user enters, through the input device 17, an instruction to execute processing based on the placement plan. In the second placement-related processing, ad placement on the non-reserved media and additional ad placement on the reserved media are determined based on the advertising budget B, the per-medium minimum placement budgets b_1^R(0), b_2^R(0), …, b_M^R(0), b_1^P(0), b_2^P(0), …, b_N^P(0), and the reserve budget Br under the placement plan.

At the start of the campaign period, which is when the second placement-related processing starts, ad placement on the reserved media based on the minimum placement budgets b_1^R(0), b_2^R(0), …, b_M^R(0) has already been completed, as described above. Ad placement on the non-reserved media based on the minimum placement budgets b_1^P(0), b_2^P(0), …, b_N^P(0), by contrast, has not yet been carried out; it takes place after the campaign period starts.

In this embodiment, as shown in FIG. 3, the reserve budget Br is used in stages during the campaign period to place reserved or non-reserved advertisements. In FIG. 3, before the campaign period starts, the placement budget for reserved media is "40", the placement budget for non-reserved media is "20", and the reserve budget is "10". The values are examples, and their unit is arbitrary.

Partway through the campaign period, "4" of the reserve budget "10" is allocated to reserved media and "2" to non-reserved media, and additional ads are placed on the respective media. At a later point, "2" of the remaining reserve budget "4" is allocated to reserved media and the other "2" to non-reserved media, and additional ads are again placed on the respective media. The reserve budget is thereby used up. The explanation based on FIG. 3 is given only to show that the reserve budget is consumed in stages; the numerical values are merely illustrative and carry no meaning of their own.
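The staged consumption in the FIG. 3 example can be traced with a short Python sketch. The helper name and the list-of-pairs representation are assumptions made for illustration only.

```python
def consume_reserve(reserve, stages):
    """Apply staged (to_reserved, to_non_reserved) allocations to the reserve.

    Returns the reserve budget remaining after each stage.
    """
    remaining = []
    for to_reserved, to_non_reserved in stages:
        reserve -= to_reserved + to_non_reserved
        remaining.append(reserve)
    return remaining


# FIG. 3 example: reserve 10, then (4, 2), then (2, 2).
print(consume_reserve(10, [(4, 2), (2, 2)]))  # [4, 0]
```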

This staged allocation of the reserve budget is carried out according to the reinforcement learning algorithm so as to maximize advertising effectiveness. In the second placement-related processing (see FIG. 4), the processor 11 reads from the storage 13 the delivery schedule of the reserved advertisements already placed under the placement plan (S210).

Further, in order to determine the placement amount for each medium according to the reinforcement learning algorithm, the processor 11 sets the state s(1) at time t=1, the first day and starting point of the campaign period, as follows (S230).

s(1) = (1/T, B(1)/B, {0}_{1≤m≤M}, {0}_{1≤n≤N}, {G_{m,τ}(1)}_{1≤m≤M, 1≤τ≤T}, {W_n(1)}_{1≤n≤N})

The state variable consists of six terms. The first term "1/T" indicates the elapsed fraction of the campaign period on its first day; in general, t/T indicates the elapsed fraction of the campaign period at time t.

In the following, time t means the t-th day of the campaign period. That is, t is a discrete time expressing points within the campaign period in units of days. T represents the length of the campaign period, specifically its number of days.

The second term "B(1)/B" indicates the ratio of the budget balance B(1) at time t=1 to the advertising budget B. In general, B(t)/B is the ratio of the budget balance B(t) on day t of the campaign period to the advertising budget B. The budget balance B(1) is calculated by the following equation.

B(1) = B - (b_1^R(0) + b_2^R(0) + … + b_M^R(0) + b_1^P(0) + b_2^P(0) + … + b_N^P(0))

The budget balance B(1) thus corresponds to the reserve budget Br.

The third term represents the cost performance of ad delivery on each reserved medium, from the first to the M-th. Because cost performance is unknown on day 1 of the campaign period, that is, at time t=1, the third term at time t=1 indicates a cost performance of zero for every reserved medium. That is, {0}_{1≤m≤M} is a one-dimensional array with M elements, each of value zero.

The fourth term represents the cost performance of ad delivery on each non-reserved medium, from the first to the N-th. Because cost performance is unknown on day 1 of the campaign period, the fourth term at time t=1 indicates a cost performance of zero for every non-reserved medium. That is, {0}_{1≤n≤N} is a one-dimensional array with N elements, each of value zero. However, for both the third and fourth terms, cost performance measured in similar past ad placements may be used instead of zero.

The fifth term is the array of the GRP (Gross Rating Point) for each day τ of the campaign period (1≤τ≤T) on each reserved medium, from the first to the M-th, as estimated at time t=1.

G_{m,τ}(1) is the GRP (gross rating points) estimated on day 1 of the campaign period: the total estimated audience rating of the placed but not yet delivered reserved advertisements that, according to the delivery schedule, will be delivered on the corresponding day τ on the corresponding reserved medium m. That is, the fifth term is the array {G_{m,τ}(1)}_{1≤m≤M, 1≤τ≤T} of G_{m,τ}(1) over every combination of m and τ with 1≤m≤M and 1≤τ≤T.

The processor 11 can generate the array {G_{m,τ}(1)}_{1≤m≤M, 1≤τ≤T} according to the delivery schedule obtained in S210. The estimated GRP can be calculated from past audience ratings for the same time slots. The audience rating records needed for the calculation can be stored in the storage 13 in advance.

The sixth term is the array {W_n(1)}_{1≤n≤N} of placement budget balances for the non-reserved media at time t=1. The balance W_n(1) is the balance of the placement budget for the n-th non-reserved medium at time t=1, and is set to the minimum placement budget b_n^P(0) calculated for that medium in the first placement-related processing. That is, {W_1(1), W_2(1), …, W_N(1)} = {b_1^P(0), b_2^P(0), …, b_N^P(0)}.
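The six terms of s(1) described above can be assembled as in the following Python sketch. The tuple layout, function name, and sample values are assumptions for illustration; the disclosure does not prescribe a data structure.

```python
def initial_state(T, B, Br, grp_schedule, non_reserved_minimums):
    """Build s(1) as a tuple of the six terms.

    grp_schedule[m-1][tau-1] is the estimated GRP G_{m,tau}(1);
    non_reserved_minimums[n-1] is b_n^P(0).
    """
    M = len(grp_schedule)
    N = len(non_reserved_minimums)
    return (
        1 / T,                                # term 1: elapsed fraction on day 1
        Br / B,                               # term 2: B(1)/B, where B(1) = Br
        [0.0] * M,                            # term 3: reserved cost performance, unknown yet
        [0.0] * N,                            # term 4: non-reserved cost performance, unknown yet
        [list(row) for row in grp_schedule],  # term 5: scheduled GRP per medium and day
        list(non_reserved_minimums),          # term 6: W_n(1) = b_n^P(0)
    )


s1 = initial_state(
    T=5, B=70, Br=10,
    grp_schedule=[[0, 3.0, 0, 0, 2.5], [1.0, 0, 0, 0, 0]],
    non_reserved_minimums=[12, 8],
)
```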

In the subsequent S240, for the current time t=1, the processor 11 uses the reinforcement learning algorithm to determine, based on the state s(t) at this time t, the placement amount for each reserved medium and each non-reserved medium at time t, as the action regarding ad placement at time t.

That is, the processor 11 determines the placement amounts (b_1^R(t), b_2^R(t), …, b_M^R(t), b_1^P(t), b_2^P(t), …, b_N^P(t)) based on the state s(t) and the policy for action selection. Here, b_m^R(t) is the placement amount for the m-th reserved medium at time t, and b_n^P(t) is the placement amount for the n-th non-reserved medium at time t. Concretely, the placement amount for a medium is the amount of money spent on that medium, in other words the purchase amount of its advertising slots.

In the reinforcement learning algorithm, the policy is updated by the reward r(t) described later. In this embodiment, a plurality of placement amounts of different sizes are defined as the action options. The placement amount is determined by selecting one of these options as the action chosen under the policy.

In the subsequent S250, the processor 11 instructs ad placement on each medium based on the placement amounts (b_1^R(t), b_2^R(t), …, b_M^R(t), b_1^P(t), b_2^P(t), …, b_N^P(t)).

In S250, the processor 11 can, for example, instruct the relay system 31 to place the ads. The relay system 31 can automatically place ads with the system of each medium according to the instruction. The processor 11 may instead instruct the user to place ads on each medium by showing the placement amounts on the display 15. The user can then carry out the placement work for each medium at least partly by hand according to the displayed content.

In the subsequent S260, the processor 11 determines, as the exposure record, the average audience rating P_m^R(t) of the reserved advertisements delivered through each reserved medium, from the first to the M-th. The processor 11 further calculates the exposure amount I_m^R(t) = c_m^R(t) × P_m^R(t) of each reserved medium at time t, based on the placement cost c_m^R(t) = b_m^R(t) for that medium and the average audience rating P_m^R(t) of the reserved advertisements on that medium.

I_m^R(t) represents the exposure amount at time t of the reserved advertisements delivered through the m-th reserved medium. P_m^R(t) represents the average audience rating on the m-th reserved medium at time t, and c_m^R(t) represents the placement cost, that is, the amount spent, for the advertisements delivered through the m-th reserved medium at time t. The processor 11 can determine the average audience rating P_m^R(t) of each reserved medium by obtaining it, for example, through the communication device 19 from the measurement system 35, which measures viewing behavior.
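As a minimal sketch of the S260 calculation, the exposure amount per reserved medium is the elementwise product of cost and average rating. The function name and sample values below are illustrative assumptions.

```python
def reserved_exposure(costs, avg_ratings):
    """I_m^R(t) = c_m^R(t) * P_m^R(t) for each reserved medium m."""
    if len(costs) != len(avg_ratings):
        raise ValueError("costs and avg_ratings must have one entry per medium")
    return [c * p for c, p in zip(costs, avg_ratings)]


# Two reserved media: spend 4 and 2, average ratings 5.0 and 8.0 (arbitrary units).
print(reserved_exposure([4, 2], [5.0, 8.0]))  # [20.0, 16.0]
```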

In the subsequent S270, for each non-reserved medium from the first to the N-th, the processor 11 determines, as the exposure record, the CPM (Cost per Mille) P_n^P(t) of the non-reserved advertisements delivered through that medium at time t. Based on the placement cost c_n^P(t) for each non-reserved medium and the CPM P_n^P(t) of the non-reserved advertisements on that medium, the processor 11 calculates the impressions I_n^P(t) = c_n^P(t) / P_n^P(t) of each non-reserved medium at time t.

I_n^P(t) represents the impressions of the n-th non-reserved medium at time t. P_n^P(t) represents the CPM of the non-reserved advertisements on the n-th non-reserved medium at time t, and c_n^P(t) represents the placement cost for the n-th non-reserved medium at time t. c_n^P(t) is the amount actually spent on the placement based on b_n^P(t).
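The S270 calculation can be sketched the same way. The formula follows the text (cost divided by CPM); returning zero when the CPM is not positive is an assumption added here to avoid division by zero, and the names and values are illustrative.

```python
def non_reserved_impressions(costs, cpms):
    """I_n^P(t) = c_n^P(t) / P_n^P(t) for each non-reserved medium n.

    Assumption: a medium with a non-positive CPM contributes 0 impressions.
    """
    return [c / p if p > 0 else 0.0 for c, p in zip(costs, cpms)]


print(non_reserved_impressions([2, 3], [0.5, 1.5]))  # [4.0, 2.0]
```

Because CPM is a cost per thousand impressions, the quotient counts impressions in thousands when cost and CPM share a currency unit.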

In the subsequent S280, the processor 11 calculates the reward r(t) at time t. Specifically, the processor 11 inputs into an estimation model of advertising effectiveness the exposure amounts {I_m^R(s)}_{1≤m≤M, 1≤s≤t} of reserved advertisements through the reserved media up to time t, and the impressions {I_n^P(s)}_{1≤n≤N, 1≤s≤t}, which are the exposure amounts of non-reserved advertisements through the non-reserved media up to time t, and obtains from the model the advertising effectiveness Z(t) at time t. The processor 11 calculates, as the reward r(t), the difference Z(t)-Z(t-1) between the advertising effectiveness Z(t) at time t and the advertising effectiveness Z(t-1) at the previous time.

r(t) = Z(t) - Z(t-1)

However, when the budget balance B(t) is less than zero, that is, when the budget has been exceeded, the reward r(t) is defined as the negative reward -C. The value C is positive and is the penalty described above.

r(t) = -C
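The two reward cases can be combined into one function, sketched below in Python. The function and argument names are illustrative assumptions; the values Z(t) and Z(t-1) are taken as given outputs of the effectiveness estimation model.

```python
def reward(z_now, z_prev, budget_balance, penalty):
    """r(t) = Z(t) - Z(t-1), or -C when the budget balance B(t) is negative."""
    if budget_balance < 0:
        return -penalty
    return z_now - z_prev


print(reward(12, 9, budget_balance=3, penalty=100))   # 3
print(reward(12, 9, budget_balance=-1, penalty=100))  # -100
```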
In the subsequent S290, when the next time t arrives, the processor 11 updates the state s(t) to the state at the newly arrived current time t. That is, the processor 11 sets the state s(t) as follows. To set the state s(t), it obtains the latest delivery schedule as of time t.

s(t) = (t/T, B(t)/B, {(1/(t-1)) Σ_{τ=1}^{t-1} P_m^R(τ)}_{1≤m≤M}, {(1/(t-1)) Σ_{τ=1}^{t-1} P_n^P(τ)}_{1≤n≤N}, {G_{m,τ}(t)}_{1≤m≤M, 1≤τ≤T}, {W_n(t)}_{1≤n≤N})

The first and second terms of the state s(t) are as described above. If the current time t is day 2 of the campaign period (that is, t=2), the first term is updated to 2/T and the second term to B(2)/B. B(2) is the budget balance as of day 2 of the campaign period.

The third term represents the cost performance of ad delivery on each reserved medium at time t. The cost performance of the m-th reserved medium is quantified as the mean of the average audience ratings P_m^R(τ) on that medium over the times τ from the first day of the campaign period to time t-1, one step before the current time. The third term is an array whose elements are the means (1/(t-1)) Σ P_m^R(τ) for each medium from the first to the M-th reserved medium.

The fourth term represents the cost performance of ad delivery on each non-reserved medium at time t. The cost performance of the n-th non-reserved medium is quantified as the mean of the CPM P_n^P(τ) on that medium over the times τ from the first day of the campaign period to time t-1, one step before the current time. The fourth term is an array whose elements are the mean CPM values (1/(t-1)) Σ P_n^P(τ) for each medium from the first to the N-th non-reserved medium.
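The third and fourth terms are both per-medium means over days 1 to t-1, which the following Python sketch computes. The nested-list history and the function name are assumptions for illustration; an empty history returns zero, matching the day-1 convention described earlier.

```python
def cost_performance(history):
    """Mean of the per-day observations for each medium.

    history[k] holds the values observed on days 1..t-1 for medium k
    (P_m^R for reserved media, P_n^P for non-reserved media).
    """
    return [sum(h) / len(h) if h else 0.0 for h in history]


# Medium 1 observed 4.0 and 6.0 on days 1 and 2; medium 2 has no data yet.
print(cost_performance([[4.0, 6.0], []]))  # [5.0, 0.0]
```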

The fifth term is the array of the GRP (Gross Rating Point) for each day τ of the campaign period (1≤τ≤T) on each reserved medium, from the first to the M-th, as estimated at time t. That is, the fifth term is the array {G_{m,τ}(t)}_{1≤m≤M, 1≤τ≤T} of G_{m,τ}(t) over every combination of m and τ in the ranges 1≤m≤M and 1≤τ≤T.

G_{m,τ}(t) is the GRP estimated on day t of the campaign period: the total estimated audience rating of the placed but not yet delivered reserved advertisements that, according to the delivery schedule, will be delivered on the corresponding day τ on the corresponding reserved medium m.

The calculation of G_{m,τ}(t) also takes into account the audience ratings, under the delivery schedule, of reserved advertisements additionally placed after the campaign period started. Note, on the other hand, that audience ratings of advertisements already delivered are not used in calculating the GRP. Accordingly, G_{m,τ}(t) is zero for every day τ earlier than the current time t. A delivered reserved advertisement is reflected in the reward r(t) at the time t of its delivery, as an advertising effect based on the exposure amount I_m^R(t) as described above.
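A sketch of how the fifth term might be built from a delivery schedule follows. The schedule format (1-indexed medium, day, and estimated rating triples) and the function name are assumptions; treating ads scheduled for day t itself as still undelivered is also an assumption, since the text only states that days before t are zero.

```python
def grp_array(schedule, t, T, M):
    """Build {G_{m,tau}(t)} as an M x T nested list.

    schedule: iterable of (m, tau, estimated_rating) for placed reserved ads.
    Entries with tau earlier than t are already delivered and stay at zero.
    """
    G = [[0.0] * T for _ in range(M)]
    for m, tau, rating in schedule:
        if tau >= t:
            G[m - 1][tau - 1] += rating
    return G


schedule = [(1, 1, 3.0), (1, 3, 2.0), (1, 3, 1.5), (2, 2, 4.0)]
print(grp_array(schedule, t=2, T=3, M=2))  # [[0.0, 0.0, 3.5], [0.0, 4.0, 0.0]]
```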

The sixth term is the array {W_n(t)} of placement budget balances for the non-reserved media at time t. The balance W_n(t) is the balance of the placement budget for the n-th non-reserved medium at time t. The balance W_n(t) is updated according to the following equation.

As described above, c_n^P(t-1) is the placement cost of the advertisements delivered through the n-th non-reserved medium at time t-1. In addition, the budget balance B(t) at time t is calculated by the following equation.

Here, the function E(c_n^P(t-1)) outputs the value 1 when the placement cost c_n^P(t-1) for the n-th non-reserved medium at time t-1 is greater than the budget balance W_n(t-1) of that medium at time t-1, and outputs the value 0 otherwise.

That is, the budget balance B(t) at time t corresponds to the budget balance B(t-1) at time t-1 minus the total reserve budget consumed at time t-1 for placing reserved and non-reserved advertisements.
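The following Python sketch shows one way the two balances could be updated. It is built on assumptions, because the update equations are not reproduced in this text: it assumes that W_n(t) is W_n(t-1) minus the cost c_n^P(t-1), floored at zero, that every reserved placement after the campaign starts draws on the reserve, and that a non-reserved placement draws on the reserve only by the amount its cost exceeds W_n(t-1), which is where E acts as a switch. All names are hypothetical.

```python
def exceeds_balance(cost, balance):
    """E(c): 1 if the cost exceeds the per-medium balance, otherwise 0."""
    return 1 if cost > balance else 0


def update_budgets(B_prev, W_prev, reserved_costs, non_reserved_costs):
    """Return (B(t), W(t)) from the values at t-1 under the assumptions above."""
    overflow = sum(
        exceeds_balance(c, w) * (c - w)
        for c, w in zip(non_reserved_costs, W_prev)
    )
    B_now = B_prev - sum(reserved_costs) - overflow
    W_now = [max(w - c, 0) for c, w in zip(non_reserved_costs, W_prev)]
    return B_now, W_now


print(update_budgets(10, [12, 8], reserved_costs=[4, 0], non_reserved_costs=[5, 10]))
# (4, [7, 0])
```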

The processor 11 then returns to S240 and, for the current time t=2 and later, uses the reinforcement learning algorithm to determine the placement amount for each reserved medium and each non-reserved medium at time t, based on the state s(t) of the current time t set in S290.

Further, in S250, the processor 11 instructs ad placement on each medium based on the placement amounts (b_1^R(t), b_2^R(t), …, b_M^R(t), b_1^P(t), b_2^P(t), …, b_N^P(t)). In S260, the processor 11 determines the average audience rating P_m^R(t) of each reserved medium and calculates the exposure amount I_m^R(t) = c_m^R(t) × P_m^R(t) at time t of the reserved advertisements delivered through each reserved medium.

In the subsequent S270, the processor 11 determines, for each non-reserved medium, the CPM P_n^P(t) at time t and calculates the impressions I_n^P(t) = c_n^P(t) / P_n^P(t) of each non-reserved medium at time t. In the subsequent S280, the processor 11 calculates the reward r(t). In the subsequent S290, it updates the state s(t).

In this way, in S240 to S290, the processor 11 repeats action selection based on the state s(t) (that is, determination of the placement amounts), calculation of the reward r(t) resulting from the action, and updating of the state s(t). By doing so, it determines, according to the reinforcement learning algorithm and while drawing on the reserve budget Br, the placement amounts for the reserved and non-reserved media at each time so as to maximize the advertising effectiveness that constitutes the reward r(t). The policy for action selection is adjusted based on the reward r(t) according to the reinforcement learning algorithm.
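The control flow of the S240 to S290 loop can be outlined as below. This is a sketch under stated assumptions, not the disclosed implementation: `policy`, `environment`, and `effect_model` are hypothetical callables standing in for action selection, for placement plus measurement plus state update, and for the effectiveness estimation model, and the learning update of the policy is omitted.

```python
def run_campaign(T, state, policy, environment, effect_model, penalty):
    """Repeat S240-S290 for t = 1..T and return the list of rewards r(t)."""
    rewards, exposures, z_prev = [], [], 0
    for t in range(1, T + 1):
        action = policy(state)                                    # S240
        exposure, balance, state = environment(t, state, action)  # S250-S270, S290
        exposures.append(exposure)
        z_now = effect_model(exposures)                           # S280: Z(t)
        rewards.append(-penalty if balance < 0 else z_now - z_prev)
        z_prev = z_now
    return rewards


# Toy run: the state is the remaining budget, each day spends 1 and yields
# exposure 1, and the effect is the cumulative exposure.
rewards = run_campaign(
    T=3, state=2,
    policy=lambda s: 1,
    environment=lambda t, s, a: (a, s - a, s - a),
    effect_model=sum,
    penalty=5,
)
print(rewards)  # [1, 1, -5]
```

In the toy run the budget is exhausted after day 2, so the third day's over-budget placement receives the penalty instead of the effectiveness gain.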

When the end condition is satisfied because the campaign period has ended (Yes in S300), the processor 11 ends the second placement-related processing.

According to the information processing system 1 of this embodiment described above, placement on a plurality of media including reserved and non-reserved media is determined according to the reinforcement learning algorithm so as to maximize advertising effectiveness within a predetermined advertising budget.

With reserved advertisements, there is a delay between placement and the resulting advertising effect. Because of this delay, a plain reinforcement learning algorithm cannot select appropriate actions; this embodiment solves the problem by adding GRP information on undelivered advertisements to the state s(t).

In this embodiment, information on the time lag between placement and delivery of reserved advertisements is held in the state s(t) in the form of the GRP of undelivered advertisements. The placement amounts for a plurality of media including reserved and non-reserved media can therefore be determined so as to maximize advertising effectiveness while limiting the influence of the delay.

Accordingly, this embodiment can provide an information processing system 1 that is useful for ad placement across different media, including reserved and non-reserved media.

The above describes an example in which the reserved advertisements are TV commercials and the non-reserved advertisements are digitally delivered programmatic advertisements; however, the reserved advertisements may instead be radio commercials or digitally delivered advertisements.

In addition, to avoid a situation in which the minimum placement budgets b_1^P(0), b_2^P(0), …, b_N^P(0) for non-reserved advertisements become zero at time t=0, before the campaign period starts, a negative reward r1(t) may be added to the reward r(t).

With this reward r1(t), the smaller b_n^P(0) is, the larger the negative reward r1(t) that is calculated. As a result, the action-selection policy comes to determine placement amounts that favor placing ads before the campaign period starts or early in the campaign period.

The processor 11 can obtain the tuning parameter α of this reward r1(t) through the storage 13 or through the input device 17 in the first step (S110) of the placement-related processing. α is a positive real number.

[Second embodiment]
Next, details of the information processing system 1 of the second embodiment are described with reference to FIGS. 5 and 6. The information processing system 1 of the second embodiment differs from the first embodiment in that it uses a contextual bandit algorithm instead of a reinforcement learning algorithm for the staged determination of placement amounts; in other respects it is configured basically in the same way as the information processing system 1 of the first embodiment. The hardware configuration of the information processing system 1 is the same as in the first embodiment. Accordingly, the following selectively describes the processing related to the contextual bandit algorithm and omits other description.

As is known, a reinforcement learning algorithm selects actions so as to maximize reward, based on defined states, actions, and rewards. In the first embodiment, the action is the determination of the ad placement amount for each medium.

In contrast, a contextual bandit algorithm selects actions so as to maximize reward, based on defined contexts, actions, and rewards. In the second embodiment, actions and rewards are defined in the same way as in the first embodiment. That is, the action is the determination of the ad placement amount for each medium, and the reward is the advertising effect obtained by delivering advertisements through each medium.

In this embodiment, based on a command from the user, the processor 11 executes, as the placement-related processing, first placement-related processing (see FIG. 5) and second placement-related processing (see FIG. 6), both of which use the contextual bandit algorithm.

Variables are defined as follows for the explanation of the processing. The number of days in the campaign period is T. The number of reserved advertisement media is M, and the M reserved media include the m-th reserved medium, where m takes an integer value in the range 1, 2, ..., M. The number of non-reserved advertisement media is N, and the N non-reserved media include the n-th non-reserved medium, where n takes an integer value in the range 1, 2, ..., N. The action space A is an (M+N)-dimensional discrete action space in which each dimension has a plurality of elements corresponding to the options for the ad placement amount.
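The discrete action space A described above can be sketched as the Cartesian product of the per-medium options. This is a minimal illustration only: the function name `build_action_space` and the option values are hypothetical and do not come from the specification.

```python
from itertools import product


def build_action_space(options_reserved, options_non_reserved):
    """Enumerate every joint action (b_1^R..b_M^R, b_1^P..b_N^P).

    Each argument is a list with one entry per medium; each entry is the
    list of placement-amount options for that medium (assumed values).
    """
    per_dimension = list(options_reserved) + list(options_non_reserved)
    return list(product(*per_dimension))


# Hypothetical example: M = 2 reserved media, N = 1 non-reserved medium.
reserved = [[0, 100], [0, 50, 100]]
non_reserved = [[0, 10, 20]]
actions = build_action_space(reserved, non_reserved)
```

Each element of `actions` is one (M+N)-tuple. The number of joint actions grows multiplicatively with the number of media, so a practical system would likely need a coarse option grid.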

At time t, the average audience rating P_m^R(t) is observed for the m-th reserved medium. At time t, the CPM P_n^P(t) is observed for the n-th non-reserved medium. G_{m,τ}(t) is the GRP, as estimated at time t, of the reserved advertisements delivered through the m-th reserved medium on day τ of the campaign period.
The context x_t at time t>1 and the context x_0 at time t=0 are represented by the following equations.

In the first placement-related processing (see FIG. 5), the processor 11 first acquires information on the specified conditions in the same manner as in S110 (S310). In the subsequent S320, the processor 11 selects the action regarding advertisement placement at time t=0, before the start of the campaign period, according to the contextual bandit algorithm.

Specifically, based on the given context x_0, which takes the place of the state s(0), the processor 11 calculates the UCB (Upper Confidence Bound) score of each action selectable at time t=0, and selects the action a*(0) = (b_1^R(0), b_2^R(0), ..., b_m^R(0), ..., b_M^R(0), b_1^P(0), b_2^P(0), ..., b_n^P(0), ..., b_N^P(0)) that maximizes the UCB score.
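The UCB-based selection can be illustrated with a short sketch. The specification does not say which UCB formulation is used, so the linear reward model below (in the style of LinUCB) is an assumption, as are the names `LinearUCB`, `features`, and `select_action` and the exploration weight `alpha`.

```python
import math


class LinearUCB:
    """Linear UCB over joint (context, action) features (assumed model)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # A_inv starts as the identity, i.e. ridge regularisation of 1.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def score(self, phi):
        """UCB score = estimated reward + alpha * confidence width."""
        a_inv_phi = self._matvec(phi)
        theta = self._matvec(self.b)  # theta = A^-1 b
        mean = sum(t * p for t, p in zip(theta, phi))
        width = math.sqrt(max(sum(p * q for p, q in zip(phi, a_inv_phi)), 0.0))
        return mean + self.alpha * width

    def update(self, phi, reward):
        """Rank-one update of A^-1 (Sherman-Morrison) and of b."""
        a_inv_phi = self._matvec(phi)
        denom = 1.0 + sum(p * q for p, q in zip(phi, a_inv_phi))
        n = len(phi)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= a_inv_phi[i] * a_inv_phi[j] / denom
        for i in range(n):
            self.b[i] += reward * phi[i]


def features(context, action):
    """Concatenate a bias term, the context x_t, and the candidate action."""
    return [1.0] + list(context) + list(action)


def select_action(model, context, actions):
    """Return the action a*(t) with the highest UCB score."""
    return max(actions, key=lambda a: model.score(features(context, a)))
```

With an untrained model the score is driven entirely by the confidence width, so unexplored actions are tried first; as rewards accumulate, the estimated reward term dominates.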

This determines the minimum placement budgets b_1^R(0), b_2^R(0), ..., b_m^R(0), ..., b_M^R(0), b_1^P(0), b_2^P(0), ..., b_n^P(0), ..., b_N^P(0) for the plurality of reserved media and the plurality of non-reserved media.

In the subsequent S330, the processor 11 outputs, as the placement plan, the minimum placement budgets b_1^R(0), b_2^R(0), ..., b_M^R(0), b_1^P(0), b_2^P(0), ..., b_N^P(0) for each medium determined in S320, in the same manner as in S130. The processor 11 then ends the first placement-related processing.

After the first placement-related processing ends, the processor 11 starts the second placement-related processing shown in FIG. 6 when the user inputs, through the input device 17, an instruction to execute processing based on the placement plan. In the second placement-related processing, the processor 11 reads from the storage 13 the delivery schedule of the reserved advertisements placed in advance according to the placement plan (S410).

Furthermore, in order to determine the ad placement amount for each medium according to the contextual bandit algorithm, the processor 11 sets the context x_1 and the budget balance B(1) at time t=1, which is the first day and the starting point of the campaign period, as follows (S430).

The processor 11 can generate the array {G_{m,τ}(1)}_{1≤m≤M, 1≤τ≤T} from the delivery schedule acquired in S410, as in the first embodiment.

In the subsequent S440, for the current time t=1, the processor 11 calculates the UCB score of each selectable action based on the context x_t at this time t, and selects the action a*(t) = (b_1^R(t), b_2^R(t), ..., b_m^R(t), ..., b_M^R(t), b_1^P(t), b_2^P(t), ..., b_n^P(t), ..., b_N^P(t)) that maximizes the UCB score.

The processor 11 thereby determines, as the action regarding advertisement placement at time t, the ad placement amounts (b_1^R(t), b_2^R(t), ..., b_M^R(t), b_1^P(t), b_2^P(t), ..., b_N^P(t)) for the reserved media and the non-reserved media at time t, using the contextual bandit algorithm. b_m^R(t) is the ad placement amount of reserved advertisements for the m-th reserved medium at time t. b_n^P(t) is the ad placement amount of non-reserved advertisements for the n-th non-reserved medium at time t.

In the subsequent S450, the processor 11 instructs advertisement placement on each medium based on the ad placement amounts (b_1^R(t), b_2^R(t), ..., b_M^R(t), b_1^P(t), b_2^P(t), ..., b_N^P(t)).

In the subsequent S460, the processor 11 determines, by acquiring the corresponding information, the average audience rating P_m^R(t) of the reserved advertisements delivered through each of the first to M-th reserved media, and calculates the exposure amount I_m^R(t) = c_m^R(t) × P_m^R(t) of each reserved medium at time t, in the same manner as in S260.

In the subsequent S470, for each of the first to N-th non-reserved media, the processor 11 determines, by acquiring the corresponding information, the CPM P_n^P(t) of the non-reserved advertisements delivered through that medium at time t, and calculates the impressions I_n^P(t) = c_n^P(t) / P_n^P(t) of each non-reserved medium at time t, in the same manner as in S270.
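The two formulas used in S460 and S470 can be written out as a small sketch. The quantities c_m^R(t) and c_n^P(t) are not defined in this passage, so the sketch simply treats them as given numbers; the function names are hypothetical, and each CPM value is assumed to be positive.

```python
def reserved_exposure(c_r, p_r):
    """I_m^R(t) = c_m^R(t) * P_m^R(t) for each reserved medium m."""
    return [c * p for c, p in zip(c_r, p_r)]


def non_reserved_impressions(c_p, p_p):
    """I_n^P(t) = c_n^P(t) / P_n^P(t) for each non-reserved medium n."""
    return [c / p for c, p in zip(c_p, p_p)]
```

For example, `reserved_exposure([3, 2], [5.0, 4.0])` multiplies each pair element-wise, and `non_reserved_impressions([100.0], [4.0])` divides the first list by the second.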

In the subsequent S480, the processor 11 calculates the reward r(t) at time t according to the formula r(t) = Z(t) - Z(t-1), in the same manner as in S280. When the budget balance B(t) is less than zero, the reward r(t) is the negative reward -C (r(t) = -C).
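The reward rule of S480 is simple enough to state directly in code. The sketch below assumes that Z(t) and Z(t-1) have already been computed as in the first embodiment; the function and parameter names are hypothetical.

```python
def reward(z_now, z_prev, budget_balance, penalty_c):
    """r(t) = Z(t) - Z(t-1), or -C when the budget balance B(t) is negative."""
    if budget_balance < 0:
        return -penalty_c
    return z_now - z_prev
```

The penalty branch takes precedence over the incremental effect, so an action that overspends the budget is penalized regardless of how much advertising effect it produced.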

In the subsequent S490, the processor 11 waits for the next time t to arrive and then updates the context x_t. That is, the processor 11 sets the context x_t as follows.

G_{m,τ}(t) is calculated in the same manner as in S290 of the first embodiment. The budget balance B(t) at time t is calculated as follows.

The processor 11 then returns the processing to S440 and, for the current time t=2 and later, determines the ad placement amounts for the reserved media and the non-reserved media at time t, as the action regarding advertisement placement at time t, using the contextual bandit algorithm based on the context x_t of the current time t set in S490. Further, in S450, it instructs advertisement placement on each medium based on the ad placement amounts (b_1^R(t), b_2^R(t), ..., b_M^R(t), b_1^P(t), b_2^P(t), ..., b_N^P(t)).

In this way, in S440 to S490, the processor 11 repeatedly selects an action based on the context x_t (that is, determines the ad placement amounts), calculates the reward r(t) for the action, and updates the context x_t. In accordance with the contextual bandit algorithm, it thereby determines the ad placement amounts for the reserved media and the non-reserved media at each time point so as to maximize the advertising effect, which is the reward r(t).
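The repetition of S440 to S490 can be sketched as a plain loop. This is an illustration under stated assumptions, not the processing of FIG. 6 itself: the callbacks `choose_action`, `place_and_observe`, and `update_policy` are hypothetical placeholders, and the budget update (subtracting the sum of the placed amounts) is assumed because the budget formula is not reproduced in this passage.

```python
def run_campaign(num_days, budget, initial_context, choose_action,
                 place_and_observe, update_policy, penalty_c):
    """Repeat S440-S490 for t = 1..num_days and return a per-day log."""
    context, balance, z_prev, log = initial_context, budget, 0.0, []
    for t in range(1, num_days + 1):
        # S440: select the placement amounts for every medium.
        action = choose_action(context)
        # S450-S470: place the advertisements and observe the results.
        z_now, next_context = place_and_observe(t, action)
        # Assumed budget update: B(t) decreases by the total placed amount.
        balance -= sum(action)
        # S480: incremental effect, or the penalty -C on overspending.
        r = -penalty_c if balance < 0 else z_now - z_prev
        update_policy(context, action, r)
        log.append((t, action, r, balance))
        # S490: move to the next time point with the updated context.
        context, z_prev = next_context, z_now
    return log
```

The loop ends when the campaign period is over, which corresponds to the end condition checked in S500.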

When the end condition is satisfied because the campaign period has ended (Yes in S500), the processor 11 ends the second placement-related processing.

According to the information processing system 1 of this embodiment described above, placement on a plurality of media, including reserved media and non-reserved media, can be determined appropriately so as to maximize the advertising effect, in accordance with the contextual bandit algorithm and in consideration of the time lag from placement to delivery of reserved advertisements. The information processing system 1 is therefore useful for placing advertisements across different media including reserved media and non-reserved media.

[Other embodiments]
The present disclosure is not limited to the above embodiments and can take various forms.
The above describes examples of the information processing system 1 that realizes optimal advertisement placement across different media, including reserved media and non-reserved media, using a reinforcement learning algorithm (first embodiment) and a contextual bandit algorithm (second embodiment) as the dynamic optimization algorithm.

However, similar functions may be realized using another algorithm as the dynamic optimization algorithm. For example, the information processing system 1 may be configured to use a state space model, in particular a Kalman filter, to determine the ad placement amounts for the reserved media and the non-reserved media at each time point in the campaign period.

A Kalman filter is a kind of dynamic control technique composed of a state equation and an observation equation. The state equation and observation equation of a linear model are represented by the following equations, which define the relationship among the state quantity x_t, the observed quantity y_t, and the control input u_t.

The state equation and observation equation of a generalized model, including nonlinear models, are as follows.
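For reference, a state space model of this kind is commonly written in the following textbook form. This is a sketch under assumed notation: the matrices F, B, H, the process noise w_t, and the state function f are not taken from the specification, whose own equations may use different symbols or time indexing. The observation function h and the observation noise v_t follow the symbols used later in this description.

```latex
% Linear model (standard textbook form, assumed notation)
\begin{align}
  x_{t+1} &= F x_t + B u_t + w_t, \\
  y_t     &= H x_t + v_t.
\end{align}
% Generalized model, including nonlinear models (assumed notation)
\begin{align}
  x_{t+1} &= f(x_t, u_t, w_t), \\
  y_t     &= h(x_t, u_t, v_t).
\end{align}
```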

When the technical idea of the present disclosure is realized using a Kalman filter, the ad placement amount for each medium at time t can be assigned to the control input u_t. That is, the control input u_t can be associated with the action in the reinforcement learning algorithm.

The budget balance, CPM, audience rating, and the like at time t can be assigned to the state quantity x_t. That is, the state quantity x_t can be associated with the state s(t) in the reinforcement learning algorithm. The advertising effect at time t can be assigned to the observed quantity y_t. That is, the observed quantity y_t can be associated with the reward r(t) in the reinforcement learning algorithm.

When a Kalman filter is used, the delay from placement to delivery of reserved advertisements can be modeled by defining the observation equation y_t = h(x_t, u_0, u_t, v_t), on the premise that the control input u_0 at time t=0 affects the observed quantity y_t. That is, the delivery schedule can be incorporated into the observation equation by expressing the relationship between the control input u_t and the observed quantity y_t as a formula. This observation equation indicates that the control input u_0 for the placement of reserved advertisements at time t=0 affects the observed quantity y_t.

The technical idea of the present disclosure may also be realized using a PID control technique. As the reward index, in addition to advertisement contact indices such as reach and frequency, indices of consumer awareness may be used. Indices of consumer awareness may include indices of recognition, interest, concern, understanding, purchase intent, top-of-mind recall, and repeat purchase intent.

As the reward index, indices obtained on the advertiser side may be used, specifically the sales amount of the advertised product, the number of units sold, the number of accesses to the product introduction site, the number of uses of the advertised service, the number of installations of related application software, MAU (Monthly Active Users), and the number of inquiries related to the advertisement.

Indices of the state to be considered in selecting actions regarding advertisement placement may include indices of the unit cost required for reach or acquisition, specifically CPM, CPC (Cost Per Click), CPA (Cost Per Acquisition), ROI (Return On Investment), and the like.

In addition, indices of the state to be considered may include indices of advertising performance, specifically CTR (Click Through Rate), CVR (Conversion Rate), VTR (View Through Rate), TARP (Target Audience Rating Point), and the like.

The indices preferred as state indices differ depending on the index used as the reward for reinforcement learning. For example, when reach is adopted as the reward and the reserved advertisements are delivered by television broadcasting, examples of preferred state indices are the audience rating (or GRP) and the attention rate. When the reserved advertisements are delivered digitally, examples of preferred state indices are CPM and vCPM (viewable Cost Per Mille). The same applies to non-reserved advertisements.

When the number of sales is adopted as the reward for reinforcement learning and the reserved advertisements are delivered by television broadcasting, examples of preferred state indices are the audience rating (or GRP) and the attention rate. When the reserved advertisements are delivered digitally, examples of preferred state indices are CPM, the number of conversions, CPA, and CPC.

When the non-reserved advertisements are delivered by television broadcasting, examples of preferred state indices are the number of conversions and CPA. When the non-reserved advertisements are delivered digitally, examples of preferred state indices are the number of conversions, CPA, and CPC.

In the above embodiments, advertisement placement on a plurality of reserved media and a plurality of non-reserved media was taken as an example, but at least one of the reserved media and the non-reserved media targeted for placement may be only one medium. Alternatively, a shared ad placement amount may be determined for a plurality of reserved media, and similarly, a shared ad placement amount may be determined for a plurality of non-reserved media.

The above embodiments include the concept of additional placement of reserved advertisements, but may further include the concept of canceling the reservation of a reserved advertisement (in other words, canceling a placement). That is, the action options in the dynamic optimization algorithm may include the action of canceling the reservation of an already placed reserved advertisement.

This cancellation action corresponds to receiving a refund of the purchase amount of the advertising slot through cancellation of the reservation. The concept of reservation cancellation can therefore be introduced into the dynamic optimization algorithm by introducing negative ad placement amounts. In this way, the present disclosure can determine ad placement amounts in consideration of reservation cancellation. With the reinforcement learning algorithm, when a negative ad placement amount is determined, the advertisement delivery schedule held as the state can be reduced and the budget balance B(t) can be changed in the increasing direction.
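The effect of a negative ad placement amount on the state can be sketched as follows. This is a simplified illustration: the function `apply_placement`, the single `grp_per_unit` conversion factor, and the assumption of a full refund are all hypothetical and are not specified in the source.

```python
def apply_placement(schedule_grp, balance, medium, amount, grp_per_unit):
    """Apply a (possibly negative) placement amount to one reserved medium.

    schedule_grp: dict mapping medium -> GRP still awaiting delivery.
    amount:       placement amount; a negative value cancels reservations.
    grp_per_unit: assumed conversion from placement amount to GRP.
    Returns the updated schedule and the updated budget balance.
    """
    pending = schedule_grp.get(medium, 0.0)
    delta_grp = amount * grp_per_unit
    if pending + delta_grp < 0:
        raise ValueError("cannot cancel more than is awaiting delivery")
    updated = dict(schedule_grp)
    updated[medium] = pending + delta_grp
    # A negative amount increases the balance (refund of the purchase).
    return updated, balance - amount
```

A negative `amount` therefore shrinks the pending schedule and raises the budget balance, while a positive `amount` does the opposite.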

A function of one component in the above embodiments may be distributed among a plurality of components. Functions of a plurality of components may be integrated into one component. Part of the configuration of the above embodiments may be omitted. At least part of the configuration of one of the above embodiments may be added to, or substituted for, the configuration of another of the above embodiments. Every aspect included in the technical idea specified by the wording of the claims is an embodiment of the present disclosure.

[Technical concept disclosed in this specification]
It can be understood that the following technical ideas are disclosed in this specification.
[Item 1]
An information processing system comprising:
an ad placement instruction unit configured to determine a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and to instruct advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
a first performance determination unit configured to determine a first performance, which is the performance regarding exposure of the reserved advertisements exposed through the first medium;
a second performance determination unit configured to determine a second performance, which is the performance regarding exposure of the non-reserved advertisements exposed through the second medium; and
a schedule determination unit configured to determine an exposure schedule of unexposed advertisements that are awaiting exposure among the reserved advertisements,
wherein the total ad placement amount possible for the reserved advertisements and the non-reserved advertisements is predetermined, and
the ad placement instruction unit determines, for each of a plurality of time points, the first ad placement amount and the second ad placement amount based on the possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance.
[Item 2]
The information processing system according to item 1, wherein, of the total amount, a first amount is predetermined as the ad placement amount for the reserved advertisements, a second amount is set as an ad placement amount shared by the reserved advertisements and the non-reserved advertisements, and the ad placement instruction unit is given the authority to decide how the second amount is allocated between the reserved advertisements and the non-reserved advertisements,
advertisement placement of the reserved advertisements corresponding to at least part of the first amount has been completed at an initial time point that precedes the plurality of time points, and the exposure schedule includes the exposure schedule of unexposed advertisements awaiting exposure that are reserved advertisements already placed at the initial time point, and
the ad placement instruction unit determines, for each time point, the first ad placement amount and the second ad placement amount taking into account the exposure schedule of the unexposed advertisements including the reserved advertisements already placed at the initial time point.
[Item 3]
The information processing system according to item 1 or item 2, wherein the first ad placement amount is the monetary amount of placement of the reserved advertisements,
the second ad placement amount is the monetary amount of placement of the non-reserved advertisements, and
the total amount is the placement budget for advertisements including the reserved advertisements and the non-reserved advertisements.
[Item 4]
The information processing system according to any one of items 1 to 3, wherein the ad placement instruction unit determines the first ad placement amount and the second ad placement amount for each of the time points according to a dynamic optimization algorithm.
[Item 5]
The information processing system according to item 4, wherein the dynamic optimization algorithm includes at least one of reinforcement learning, a contextual bandit algorithm, and a Kalman filter.
[Item 6]
The information processing system according to any one of items 1 to 5, wherein the ad placement instruction unit is configured to, for each of the time points:
determine, based on the state or context at the corresponding time point, the first ad placement amount and the second ad placement amount as an action regarding advertisement placement at the corresponding time point;
determine, based on the first performance and the second performance, the advertising effect of the reserved advertisements and the non-reserved advertisements newly exposed as a result of advertisement placement up to the corresponding time point, as a reward for the action;
update the state or context taking into account the advertisement placement at the corresponding time point and the first performance and the second performance at the corresponding time point; and
update a policy for selecting the action based on the reward,
and wherein the state or context is defined using the possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance.
[Item 7]
The information processing system according to any one of items 1 to 5, wherein the ad placement instruction unit is configured to determine, for each of the time points, the first ad placement amount and the second ad placement amount at the corresponding time point as the action, by reinforcement learning in which a state, a reward, and an action are defined,
the state is defined using the possible remaining ad placement amount, the first performance, the second performance, and the schedule of the unexposed advertisements at the corresponding time point, and
the reward is defined using the advertising effect at the corresponding time point determined from the first performance and the second performance.
[Item 8]
The information processing system according to any one of items 1 to 5, wherein the ad placement instruction unit is configured to determine, for each of the time points, the first ad placement amount and the second ad placement amount at the corresponding time point as the action, by a contextual bandit algorithm in which a context, a reward, and an action are defined,
the context is defined using the possible remaining ad placement amount, the first performance, the second performance, and the schedule of the unexposed advertisements at the corresponding time point, and
the reward is defined using the advertising effect at the corresponding time point determined from the first performance and the second performance.
[Item 9]
The information processing system according to any one of items 1 to 5, wherein the ad placement instruction unit is configured to determine, for each of the time points, the first ad placement amount and the second ad placement amount at the corresponding time point using a state space model such as a Kalman filter,
the state space model is a model that includes information on the exposure schedule of the unexposed advertisements and defines the relationship among a state quantity, an observed quantity, and an input quantity,
the input quantity is a quantity regarding advertisement placement defined using the first ad placement amount and the second ad placement amount,
the state quantity is a quantity that changes with the advertisement placement and is defined using the possible remaining ad placement amount, the first performance, and the second performance, and
the observed quantity is a quantity that defines the advertising effect of the placed reserved advertisements and non-reserved advertisements, determined based on the first performance and the second performance.
[Item 10]
A computer program for causing a computer to implement the functions of the advertisement placement instruction unit, the first performance determination unit, the second performance determination unit, and the schedule determination unit in the information processing system according to any one of items 1 to 9.
[Item 11]
A computer-implemented information processing method comprising:
determining a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and instructing advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
determining a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
determining a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
determining an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined, and
instructing the advertisement placement includes determining, for each of a plurality of time points, the first ad placement amount and the second ad placement amount based on the possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance.

DESCRIPTION OF SYMBOLS 1: information processing system, 11: processor, 12: memory, 13: storage, 15: display, 17: input device, 18: media reader/writer, 19: communication device, 31: relay system, 35: measurement system.

Claims

An information processing system comprising:
an advertisement placement instruction unit configured to determine a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and to instruct advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
a first performance determination unit configured to determine a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
a second performance determination unit configured to determine a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
a schedule determination unit configured to determine an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
the advertisement placement instruction unit is configured to, for each of a plurality of time points, according to at least one of reinforcement learning and a contextual bandit algorithm:
determine, based on a state or context at the corresponding time point, the first ad placement amount and the second ad placement amount as an action related to advertisement placement at the corresponding time point;
determine, based on the first performance and the second performance, the advertising effect of the reserved advertisements and the non-reserved advertisements newly exposed through the advertisement placement up to the corresponding time point as a reward for the action;
update the state or context in consideration of the advertisement placement at the corresponding time point and the first performance and the second performance at the corresponding time point; and
update a policy for selecting the action based on the reward, and
the state or context is defined using a possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance.
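For illustration, the four operations recited above (action selection, reward determination, state update, and policy update) can be sketched with tabular Q-learning. This is a sketch under stated assumptions, not the claimed implementation: Q-learning is only one form of reinforcement learning, the state is assumed to be a small discrete tuple (remaining, pending, first_perf, second_perf), and the names ACTIONS, QLearningAllocator, and next_state are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set: (first, second) placement units chosen at one time point.
ACTIONS = [(0, 0), (1, 0), (0, 1), (1, 1)]


class QLearningAllocator:
    """Tabular Q-learning over states (remaining, pending, first_perf, second_perf)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action index) -> estimated value
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def feasible(self, state):
        """Indices of actions that fit within the possible remaining amount."""
        remaining = state[0]
        return [i for i, (first, second) in enumerate(ACTIONS) if first + second <= remaining]

    def select(self, state):
        """Epsilon-greedy policy: explore at random, otherwise take the best-valued action."""
        options = self.feasible(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=lambda i: self.q[(state, i)])

    def update(self, state, action, reward, new_state):
        """Policy update from the observed reward (one-step Q-learning target)."""
        best_next = max(self.q[(new_state, i)] for i in self.feasible(new_state))
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def next_state(state, action, pending, first_perf, second_perf):
    """State update: subtract the amount placed and record the latest schedule and performances."""
    first, second = ACTIONS[action]
    return (state[0] - first - second, pending, first_perf, second_perf)
```

In a loop over time points, a caller would call select, instruct the placement, measure the advertising effect as the reward, compute next_state from the newly determined schedule and performances, and call update.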

An information processing system comprising:
an advertisement placement instruction unit configured to determine a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and to instruct advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
a first performance determination unit configured to determine a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
a second performance determination unit configured to determine a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
a schedule determination unit configured to determine an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
the advertisement placement instruction unit is configured to determine, as an action, the first ad placement amount and the second ad placement amount at the corresponding time point for each of a plurality of time points by reinforcement learning in which a state, a reward, and an action are defined,
the state is defined using the possible remaining ad placement amount, the first performance, the second performance, and the exposure schedule of the unexposed advertisements at the corresponding time point, and
the reward is defined using the advertising effect at the corresponding time point determined from the first performance and the second performance.

An information processing system comprising:
an advertisement placement instruction unit configured to determine a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and to instruct advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
a first performance determination unit configured to determine a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
a second performance determination unit configured to determine a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
a schedule determination unit configured to determine an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
the advertisement placement instruction unit is configured to determine, as an action, the first ad placement amount and the second ad placement amount at the corresponding time point for each of a plurality of time points by a contextual bandit algorithm in which a context, a reward, and an action are defined,
the context is defined using the possible remaining ad placement amount, the first performance, the second performance, and the exposure schedule of the unexposed advertisements at the corresponding time point, and
the reward is defined using the advertising effect at the corresponding time point determined from the first performance and the second performance.

An information processing system comprising:
an advertisement placement instruction unit configured to determine a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and to instruct advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
a first performance determination unit configured to determine a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
a second performance determination unit configured to determine a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
a schedule determination unit configured to determine an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
the advertisement placement instruction unit is configured to determine, for each of a plurality of time points and using a state space model such as a Kalman filter, the first ad placement amount and the second ad placement amount at the corresponding time point based on the possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance,
the state space model includes information on the exposure schedule of the unexposed advertisements and defines relationships among a state quantity, an observable quantity, and an input quantity, the information on the exposure schedule being incorporated into the state space model by formulating the relationship between the input quantity and the observable quantity according to the exposure schedule,
the input quantity is a quantity related to advertisement placement defined using the first ad placement amount and the second ad placement amount,
the state quantity changes according to the advertisement placement and is defined using the possible remaining ad placement amount, the first performance, and the second performance, and
the observable quantity represents the advertising effect of the placed reserved advertisements and the placed non-reserved advertisements determined based on the first performance and the second performance.

The information processing system according to claim 1, wherein
of the total amount, a first amount is predetermined as an ad placement amount for the reserved advertisements, a second amount is set as an ad placement amount shared by the reserved advertisements and the non-reserved advertisements, and the advertisement placement instruction unit is given authority to decide how the second amount is allocated between the reserved advertisements and the non-reserved advertisements,
placement of the reserved advertisements corresponding to at least a portion of the first amount is completed at an initial time point earlier than the plurality of time points, and the exposure schedule includes an exposure schedule of unexposed advertisements waiting for exposure that are the reserved advertisements already placed at the initial time point, and
the advertisement placement instruction unit determines the first ad placement amount and the second ad placement amount while taking into account, as the state or the context, the exposure schedule of the unexposed advertisements including the reserved advertisements already placed at the initial time point.
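The bookkeeping implied by this claim, a first amount committed to reserved advertisements, a shared second amount, and a pending exposure schedule that already contains the initial reserved placements, can be sketched as follows. The class BudgetState, its field names, and the use of a dictionary keyed by time point are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetState:
    total: float          # predetermined total amount
    reserved_only: float  # first amount: earmarked for reserved advertisements
    shared_spent: float = 0.0
    # time point -> reserved amount placed but still waiting for exposure
    pending_schedule: dict = field(default_factory=dict)

    @property
    def shared(self):
        """Second amount: the part that may be split between both advertisement types."""
        return self.total - self.reserved_only

    @property
    def shared_remaining(self):
        return self.shared - self.shared_spent

    def place_initial(self, schedule):
        """Record reserved placements completed at the initial time point."""
        if sum(schedule.values()) > self.reserved_only:
            raise ValueError("initial placement exceeds the first amount")
        for t, amount in schedule.items():
            self.pending_schedule[t] = self.pending_schedule.get(t, 0.0) + amount

    def allocate(self, first, second, exposure_time):
        """Spend part of the shared amount; new reserved placements join the schedule."""
        if first < 0 or second < 0 or first + second > self.shared_remaining:
            raise ValueError("allocation exceeds the remaining shared amount")
        self.shared_spent += first + second
        if first > 0:
            self.pending_schedule[exposure_time] = (
                self.pending_schedule.get(exposure_time, 0.0) + first
            )
```

A state or context for the learning algorithm could then be built from shared_remaining and pending_schedule together with the first and second performances.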

The information processing system according to any one of claims 1 to 5, wherein
the first ad placement amount is the ad placement amount of the reserved advertisements,
the second ad placement amount is the ad placement amount of the non-reserved advertisements, and
the total amount is an advertising budget for advertisements including the reserved advertisements and the non-reserved advertisements.

A computer program for causing a computer to realize the functions of the advertisement placement instruction unit, the first performance determination unit, the second performance determination unit, and the schedule determination unit in the information processing system according to any one of claims 1 to 5.

A computer-implemented information processing method comprising:
determining a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and instructing advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
determining a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
determining a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
determining an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
instructing the advertisement placement includes, for each of a plurality of time points, according to at least one of reinforcement learning and a contextual bandit algorithm:
determining, based on a state or context at the corresponding time point, the first ad placement amount and the second ad placement amount as an action related to advertisement placement at the corresponding time point;
determining, based on the first performance and the second performance, the advertising effect of the reserved advertisements and the non-reserved advertisements newly exposed through the advertisement placement up to the corresponding time point as a reward for the action;
updating the state or context in consideration of the advertisement placement at the corresponding time point and the first performance and the second performance at the corresponding time point; and
updating a policy for selecting the action based on the reward, and
the state or context is defined using a possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance.

A computer-implemented information processing method comprising:
determining a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and instructing advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
determining a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
determining a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
determining an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
instructing the advertisement placement includes determining, as an action, the first ad placement amount and the second ad placement amount at the corresponding time point for each of a plurality of time points by reinforcement learning in which a state, a reward, and an action are defined,
the state is defined using the possible remaining ad placement amount, the first performance, the second performance, and the exposure schedule of the unexposed advertisements at the corresponding time point, and
the reward is defined using the advertising effect at the corresponding time point determined from the first performance and the second performance.

A computer-implemented information processing method comprising:
determining a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and instructing advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
determining a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
determining a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
determining an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
instructing the advertisement placement includes determining, as an action, the first ad placement amount and the second ad placement amount at the corresponding time point for each of a plurality of time points by a contextual bandit algorithm in which a context, a reward, and an action are defined,
the context is defined using the possible remaining ad placement amount, the first performance, the second performance, and the exposure schedule of the unexposed advertisements at the corresponding time point, and
the reward is defined using the advertising effect at the corresponding time point determined from the first performance and the second performance.

A computer-implemented information processing method comprising:
determining a first ad placement amount, which is an ad placement amount for reserved advertisements exposed through a first medium, and a second ad placement amount, which is an ad placement amount for non-reserved advertisements exposed through a second medium, and instructing advertisement placement on the first medium based on the first ad placement amount and advertisement placement on the second medium based on the second ad placement amount;
determining a first performance, which is a performance related to exposure of the reserved advertisements exposed through the first medium;
determining a second performance, which is a performance related to exposure of the non-reserved advertisements exposed through the second medium; and
determining an exposure schedule of unexposed advertisements, which are reserved advertisements waiting for exposure,
wherein a total amount of possible ad placement for the reserved advertisements and the non-reserved advertisements is predetermined,
instructing the advertisement placement includes determining, for each of a plurality of time points and using a state space model such as a Kalman filter, the first ad placement amount and the second ad placement amount at the corresponding time point based on the possible remaining ad placement amount, the exposure schedule of the unexposed advertisements, the first performance, and the second performance,
the state space model includes information on the exposure schedule of the unexposed advertisements and defines relationships among a state quantity, an observable quantity, and an input quantity, the information on the exposure schedule being incorporated into the state space model by formulating the relationship between the input quantity and the observable quantity according to the exposure schedule,
the input quantity is a quantity related to advertisement placement defined using the first ad placement amount and the second ad placement amount,
the state quantity changes according to the advertisement placement and is defined using the possible remaining ad placement amount, the first performance, and the second performance, and
the observable quantity represents the advertising effect of the placed reserved advertisements and the placed non-reserved advertisements determined based on the first performance and the second performance.