JP2022158400A

JP2022158400A - System for controlling execution system including multiple subsystems

Info

Publication number: JP2022158400A
Application number: JP2021063275A
Authority: JP
Inventors: 高斉松本; Kosei Matsumoto; 俊宏鯨井; Toshihiro Kujirai
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-04-02
Filing date: 2021-04-02
Publication date: 2022-10-17
Also published as: WO2022210958A1

Abstract

To improve overall control of a system including multiple subsystems. SOLUTION: A system is disclosed for controlling an execution system that includes multiple subsystems. A storage device stores data for constructing a first model for extracting feature quantities of each input graph. A processor acquires measurement data from each of the multiple subsystems and generates graphs representing the measurement data of each of the multiple subsystems based on the measurement data from each of the multiple subsystems. The processor uses the first model to extract feature quantities from the respective graphs and controls the execution system based on the feature quantities. SELECTED DRAWING: Figure 5

Description

The present invention relates to a system for controlling an execution system that includes multiple subsystems.

Many systems today consist of multiple heterogeneous subsystems, and system complexity keeps increasing. At the same time, there is a demand for more efficient operation and control of the overall system made up of these subsystems. Examples of such subsystems include power grids, telecommunication networks, rail and road networks, logistics networks, supply chains, financial transactions, and the flow of people in offices and shopping malls.

It has also been proposed to analyze or automatically control a system with an AI (Artificial Intelligence) model based on machine learning. Various machine learning methods have been proposed; for example, Non-Patent Document 1 discloses a classification method using a graph convolutional network.

Thomas N. Kipf, Max Welling, "Semi-Supervised Classification with Graph Convolutional Networks", ICLR 2017.

Heterogeneous subsystems that make up a system differ in their data domains: the types of data they handle, their data structures, the systems that acquire the data, their KPIs (Key Performance Indicators), and so on. It is therefore difficult to describe the relationships between heterogeneous subsystems and to optimize control of the system as a whole.

A representative example of the present disclosure is a system for controlling an execution system including a plurality of subsystems, the system including one or more processors and one or more storage devices that store programs executed by the one or more processors. The one or more storage devices store data for constructing a first model that extracts a feature amount from each input graph. The one or more processors acquire measurement data from each of the plurality of subsystems, generate, based on that measurement data, graphs representing the measurement data of each of the plurality of subsystems, extract feature amounts from each of the graphs using the first model, and control the execution system based on the feature amounts.

According to a representative embodiment of the present invention, control of an overall system that includes multiple subsystems can be improved. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 schematically shows a functional configuration example of an execution system including a plurality of subsystems and a control system that controls the execution system.
FIG. 2 schematically shows a hardware configuration example of the system control unit.
FIG. 3 schematically shows an example of purchasing behavior measurement by the measurement unit of a subsystem.
FIG. 4 schematically shows an example of congestion measurement by the measurement unit of a subsystem in a shopping mall.
FIG. 5 shows a flowchart of an example of processing executed by the system control unit.
FIG. 6 shows an example of a graph, generated from a subsystem's measurement data, representing users' purchasing behavior over a predetermined period.
FIG. 7 shows an example of a graph, generated from a subsystem's measurement data, representing the degree of congestion over a predetermined period.
FIG. 8 schematically shows a logical configuration example of the graph feature extraction unit and a partial logical configuration example of the state-action value update unit.

In the following, the description is divided into multiple sections or embodiments where necessary for convenience. Unless otherwise stated, these are not unrelated to one another: one is a modification, a detail, a supplementary explanation, or the like of part or all of another. Likewise, where the following refers to the number of elements or the like (including counts, numerical values, amounts, and ranges), the description is not limited to that specific number, which may be exceeded or undercut, unless otherwise stated or unless it is clearly limited to the specific number in principle.

A system according to one embodiment of this specification includes an execution system that performs actions on an environment and a control system that controls the execution system. The execution system includes multiple heterogeneous subsystems, each with a different KPI (Key Performance Indicator).

The control system represents each of the multiple subsystems as a graph. A graph is a data type consisting of nodes and edges that connect the nodes. Graphs make it possible to abstract heterogeneous subsystems with different data domains (data types, data structures, data acquisition systems, KPIs, and so on) into a common representation.

Using graphs, the control system optimizes control of the overall system in a system that combines subsystems from different data domains. Even in an execution system where the relationships between the subsystems are difficult to describe, the control system can balance the KPIs of the subsystems and optimize control of the execution system as a whole.

A system according to one embodiment of this specification is described below. The system includes an execution system that acts on an environment and a control system that controls the execution system. The following describes an example of controlling the flow of customers in a shopping mall. The execution system performs specific actions in the environment, that is, the shopping mall, to influence customer behavior.

The control disclosed herein can be applied to any system as long as the data obtained from the subsystems can be represented as graphs. The control system disclosed herein is applicable to controlling execution systems that include heterogeneous subsystems in environments other than shopping malls, such as power grids, telecommunication networks, rail and road networks, logistics networks, supply chains, and financial transactions.

As described above, the execution system includes multiple heterogeneous subsystems. Each subsystem has a different KPI and performs different actions. In the example described below, one subsystem operates in a shopping mall lined with multiple stores; to increase sales across the whole mall, it measures shoppers' purchasing behavior and encourages purchases through measures such as delivering coupons to a smartphone app.

Another subsystem aims to reduce congestion in the shopping mall: it measures how crowded the aisles are and encourages users to move so that congestion is relieved, for example by presenting the congestion status on signage.

The KPI of the first subsystem is the sales of the entire shopping mall, and the KPI of the other is the degree of congestion in the aisles. The two subsystems thus have different KPIs. A control system according to one embodiment of this specification performs the task of maximizing multiple different KPIs, such as increasing sales and reducing congestion, while balancing them across the subsystems. The control system learns the optimal control for this task.

FIG. 1 schematically shows a functional configuration example of an execution system including a plurality of subsystems and a control system that controls the execution system. The execution system to be controlled includes a plurality of subsystems; FIG. 1 shows two of them, subsystem 0 and subsystem 1, as an example. The execution system may include any number of subsystems.

Subsystem 0 includes a measurement unit 101 and a control unit 104. Subsystem 1 includes a measurement unit 102 and a control unit 105. The environment 106 is what each subsystem acts on. In the example described below, the environment 106 corresponds to a shopping mall. The system control unit 103 cooperates with each of the subsystems and controls the execution system as a whole.

The measurement units 101 and 102 collect, in the environment 106, the data needed to improve the KPIs of their corresponding subsystems 0 and 1. The control units 104 and 105 execute, in the environment 106, the actions determined to be necessary for improving the KPI of their corresponding subsystem, based on the data collected by the measurement units 101 and 102.

The KPI of subsystem 0 is the sales of the entire shopping mall; the higher the sales, the better the state is judged to be. The measurement unit 101 of subsystem 0 includes a POS (Point of Sale) system that uses point cards. Keyed to the unique ID assigned to each point card, the measurement unit 101 collects measurement data such as the date and time of each purchase, the store used, the purchase amount, and the user's gender and age. The measurement unit 101 also collects information on the stores in the shopping mall, such as store type, floor area, and total sales.

The control unit 104 of subsystem 0 includes the smartphones used by the shopping mall users and the apps (application programs) that run on them. A smartphone can display information sent from the system control unit 103, such as discount coupons and store locations.

The KPI of subsystem 1 is the degree of congestion in the shopping mall; the lower the congestion, the better the state is judged to be. The measurement unit 102 of subsystem 1 includes multiple TOF (Time of Flight) sensors installed in the shopping mall.

Each TOF sensor can collect, as measurement data, the position of each shopping mall user within its measurement range. The user positions obtained by the TOF sensors need to be expressed as positions within the shopping mall as a whole. The measurement unit 102 therefore holds the position and orientation of each TOF sensor and a map of the shopping mall (data describing the geometry and attributes of stores and aisles).

The control unit 105 includes digital signage and users' smartphones running a smartphone app. The digital signage and smartphones can display information sent from the system control unit 103. For example, information that encourages users to move can be delivered, such as the congestion status of each location in the shopping mall, discount coupons for specific stores, and information on events at event venues.

The system control unit 103 includes a graph processing unit 107, a measurement data reception unit 108, a state-action value update unit 111, an action selection unit 112, an action/control data conversion unit 113, a control data transmission unit 114, a learning management unit 115, an error generation unit 116, and a measurement data/reward conversion unit 117. The graph processing unit 107 includes a measurement data/graph conversion unit 109 and a graph feature extraction unit 110.

The measurement data reception unit 108 receives the measurement data obtained by the measurement unit of each subsystem. The measurement data/reward conversion unit 117 converts the measurement data obtained by the measurement data reception unit 108 into numerical values, called rewards, that correspond to the KPI of each subsystem.

The measurement data/graph conversion unit 109 converts the measurement data obtained by the measurement data reception unit 108 into graph data consisting of nodes, edges, and attribute data for the nodes. If the measurement data already contains a graph, no conversion is needed.

The graph feature extraction unit 110 extracts feature amounts from the graphs generated by the measurement data/graph conversion unit 109. In one embodiment of this specification, the unit convolves, for each node in a graph, the attribute data of the nodes within a predetermined number of steps of that node, thereby calculating for each node a quantity (feature amount) that reflects the values of its neighboring nodes. The number of steps is the shortest distance along edges from one node to another; the distance to an adjacent node is one step.
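The neighborhood aggregation described above can be illustrated with a minimal sketch. It is a simplification, not the model of the embodiment: it averages the attribute vectors of all nodes within a given number of steps and has no learned weights, unlike the graph convolutional network of Non-Patent Document 1. The names `nodes_within_steps`, `aggregate_features`, `adjacency`, and `attributes` are hypothetical.

```python
from collections import deque


def nodes_within_steps(adjacency, start, max_steps):
    """Return the nodes whose shortest edge distance from start is <= max_steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def aggregate_features(adjacency, attributes, max_steps=1):
    """Assumed simplification: the feature of a node is the mean attribute
    vector over the node itself and all nodes within max_steps of it."""
    features = {}
    for node, vec in attributes.items():
        hood = nodes_within_steps(adjacency, node, max_steps)
        features[node] = [
            sum(attributes[n][d] for n in hood) / len(hood)
            for d in range(len(vec))
        ]
    return features


# Path graph a - b - c with one attribute value per node.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
attributes = {"a": [1.0], "b": [2.0], "c": [6.0]}
one_step = aggregate_features(adjacency, attributes, max_steps=1)
two_step = aggregate_features(adjacency, attributes, max_steps=2)
```

With one step, node a only sees a and b; with two steps, every node in this small graph sees all three nodes, so all features coincide.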

Following the reinforcement learning framework, the state-action value update unit 111 estimates the value of each action that can be taken in a given state (an action being a combination of control data that can be issued to the subsystems) and updates (learns) the value of the action taken in that state based on the estimate and the reward obtained as a result of the action. The action selection unit 112, also following the reinforcement learning framework, probabilistically selects an action expected to yield a high reward, based on the values of the actions available in a given state.
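The text here does not name a specific reinforcement learning algorithm. As one possible illustration, the sketch below assumes a tabular Q-learning update and softmax (Boltzmann) action selection; both choices, and the names `softmax_probabilities`, `select_action`, `update_value`, `lr`, and `discount`, are assumptions made for this example.

```python
import math
import random


def softmax_probabilities(values, temperature=1.0):
    """Turn action values into selection probabilities (higher value, higher probability)."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(values, temperature=1.0, rng=random):
    """Probabilistically pick an action index, favoring high-value actions."""
    probs = softmax_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def update_value(q, state, action, reward, next_state, lr=0.1, discount=0.9):
    """Assumed Q-learning rule: move Q(state, action) toward
    reward + discount * max over a' of Q(next_state, a')."""
    target = reward + discount * max(q[next_state])
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]


# Two states with two actions each; the table maps state -> list of action values.
q_table = {"s": [0.0, 0.0], "t": [1.0, 0.0]}
updated = update_value(q_table, "s", 0, reward=1.0, next_state="t", lr=0.5, discount=0.5)
chosen = select_action(q_table["s"], rng=random.Random(0))
```

In the system described here, the state would be derived from the graph features rather than from a small lookup table, so a function approximator would likely replace the dictionary.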

The action/control data conversion unit 113 converts the action selected by the action selection unit 112 into the corresponding combination of control data that can be issued to the subsystems. The control data transmission unit 114 transmits the control data generated by the action/control data conversion unit 113 to each subsystem.

The learning management unit 115 manages the overall processing; in particular, it evaluates the progress of learning and determines whether to train with errors applied to the measurement data. The error generation unit 116 applies errors to the measurement data used for learning.
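How the error is generated is not specified at this point. A minimal sketch, assuming zero-mean Gaussian noise added independently to each numeric measurement, could look as follows; `apply_measurement_error` and `sigma` are hypothetical names.

```python
import random


def apply_measurement_error(values, sigma, rng=None):
    """Return a copy of values with zero-mean Gaussian noise added.

    The Gaussian noise model is an assumption made for illustration only.
    """
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma) for v in values]


clean = [10.0, 20.0, 30.0]
noisy = apply_measurement_error(clean, sigma=0.5, rng=random.Random(42))
```

Passing a seeded `random.Random` instance keeps a training run reproducible.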

FIG. 2 schematically shows a hardware configuration example of the system control unit 103. The system control unit 103 can have a computer configuration. It includes a processor 151 with computing capability and a DRAM 152, a main storage device that provides a volatile temporary storage area for the programs executed by the processor 151 and for data. The system control unit 103 further includes an auxiliary storage device 154 that provides a persistent information storage area using an HDD (Hard Disk Drive), flash memory, or the like. The DRAM 152, the auxiliary storage device 154, and combinations of them are each a storage device.

The system control unit 103 further includes a communication device 153 that exchanges data with other devices, including other devices in this system; an input device 155 that accepts operations from a user; and a monitor 156 (an example of an output device) that presents the output of each process to the user. These components can communicate with one another via a bus. The system control unit 103 may have any number of each component, and some components, such as the input device 155 and the monitor 156, may be omitted.

The functional units of the system control unit 103 described with reference to FIG. 1 can be implemented, for example, by the processor 151 executing programs that contain instruction code. The programs that realize the functional units are stored, for example, in the auxiliary storage device 154. The programs executed by the processor 151 and the data to be processed are loaded from the auxiliary storage device 154 into the DRAM 152. Functions in the system may also be implemented by circuits dedicated to specific functions instead of a processor operating according to a program.

The system control unit 103 may be a physical computer system (one or more physical computers) as shown in FIG. 2, or a system built on a group of computing resources (multiple computing resources) such as a cloud platform. The computer system or group of computing resources includes one or more interface devices (including, for example, a communication device and input/output devices), one or more storage devices (including, for example, memory (main storage) and an auxiliary storage device), and one or more processors.

When a function is realized by a processor executing a program, the prescribed processing is performed using storage devices and/or interface devices as appropriate, so the function may be regarded as at least part of the processor. Processing described with a function as its subject may be regarded as processing performed by a processor or by a system that includes the processor.

A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable storage medium (for example, a non-transitory computer-readable storage medium). The description of each function is only an example: multiple functions may be combined into one, and one function may be divided into several.

The other components of the system shown in FIG. 1, specifically the measurement units 101 and 102 and the control units 104 and 105, can each include a computer system that performs the necessary processing. At least some of the functions of the measurement units 101 and 102 and the control units 104 and 105 may be implemented on the computer that implements the functions of the system control unit 103.

Next, the flow of processing across the whole system is described. First, each subsystem measures its predetermined target in the environment 106. As described above, the measurement unit 101 of subsystem 0 measures the purchasing behavior of shopping mall users. The measurement unit 102 of subsystem 1 measures the degree of congestion in the shopping mall 106.

FIG. 3 schematically shows an example of purchasing behavior measurement by the measurement unit 101 of subsystem 0. FIG. 3 shows a plurality of stores 301 in a shopping mall 200 and the aisles 302 connecting the stores 301. In FIG. 3, the stores 301 are drawn as rectangles. For ease of illustration, only one store and one aisle are labeled with reference numerals 301 and 302. The stores 301 are lined up along the aisles 302.

FIG. 3 uses arrows to show movements 303 that accompany users' purchasing behavior. A user's purchases can be measured by looking up the user's point card usage in the POS system. For ease of illustration, only one arrow in FIG. 3 is labeled with reference numeral 303. Each arrow 303 connects two stores 301 where a user made consecutive purchases: the user made a purchase at the store 301 at the start of the arrow 303 and then made the next purchase at the store 301 at its end.

Suppose a user uses a point card at one store and then moves to the next store 205 and uses the point card there. By marking the first store as its start point and the next store as its end point, an arrow 303 can represent the user's movement together with the order in which the two stores were used. Note that an arrow 303 is only a convenient representation of a user's purchases at stores and the associated movement; it does not represent the actual physical route between the stores.

FIG. 4 schematically shows an example of congestion measurement by the measurement unit 102 of subsystem 1 in the shopping mall 200. In FIG. 4, a sector 401 indicates the measurement range of a TOF sensor, and the vertex 402 of the sector 401 indicates the viewpoint position of the TOF sensor. For ease of illustration, only the measurement range of one TOF sensor is labeled with reference numeral 401 as an example.

The sector 401 indicates the measurement range of a TOF sensor. Each TOF sensor is installed so that its measurement range includes part of an aisle 302. Each TOF sensor can continuously measure the users within its measurement range 401; specifically, it can measure the number of users at each measurement time, the movement path of each user, and the speed of each user at each measurement time. A TOF sensor may also be installed so that its measurement range includes a store.

As noted above, only subsystem 0 and subsystem 1 are shown for simplicity, but more subsystems may be included. Each of the other subsystems performs measurement processing in the same way as subsystem 0 and subsystem 1. The measurement data obtained in each subsystem is sent to the system control unit 103.

Next, the processing performed by the system control unit 103 based on the data measured in each subsystem is described. FIG. 5 shows a flowchart of an example of the processing executed by the system control unit 103. The system control unit 103 performs feature extraction by graph convolution, determines the action to be executed for each subsystem, and learns action values.

The subsystems execute, on the environment 106, actions corresponding to the actions of the system control unit 103. The system control unit 103 therefore determines the subsystems' actions on the environment 106 based on the measurement data.

Referring to FIG. 5, the learning management unit 115 executes initialization processing (501). The initialization processing sets, for example, the initial values of the parameters of the machine learning model used to determine actions. Next, the measurement data reception unit 108 receives the measurement data sent from the measurement unit of each subsystem (502). The measurement data/reward conversion unit 117 calculates the reward value of the entire execution system from the measurement data received from the measurement units of the subsystems (503). The reward is calculated based on the KPI (reward) of each subsystem.

This is described more concretely. Suppose the reward R0 of subsystem 0 is expressed as the total sales of all stores in the shopping mall over a certain period. The measurement data/reward conversion unit 117 calculates the reward R0 based on the measurement data from the POS system of subsystem 0.

Similarly, suppose the reward R1 of subsystem 1 is expressed as the average distance between users within the measurement ranges of the TOF sensors in the shopping mall. For example, for the measurement range of each TOF sensor, the average distance between users over a certain period is calculated. The users present in a measurement range, and the distances between them, change over time.

The calculated average distance is, for example, the mean of the average distances at each time within the period. The measurement data/reward conversion unit 117 calculates the mean of these average distances over all TOF sensors and uses it as the reward. The measured value of each TOF sensor may be weighted. In this example, the rewards of subsystems 0 and 1 are calculated over the same period.
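The averaging chain for R1 (pairs of users, then measurement times, then sensors) can be sketched as follows. The data layout, in which each sensor is a list of snapshots and each snapshot is a list of (x, y) user positions, is an assumption, as is the decision to skip snapshots with fewer than two users, for which no pairwise distance exists. All function names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def mean_pairwise_distance(positions):
    """Mean distance over all pairs of users at one measurement time, or None."""
    distances = [dist(p, q) for p, q in combinations(positions, 2)]
    return fmean(distances) if distances else None


def sensor_average(snapshots):
    """Mean over measurement times of the per-time mean distance for one sensor."""
    values = [d for d in map(mean_pairwise_distance, snapshots) if d is not None]
    return fmean(values) if values else None


def congestion_reward(sensors, weights=None):
    """(Optionally weighted) mean of the per-sensor averages, used as reward R1."""
    if weights is None:
        weights = [1.0] * len(sensors)
    weighted_sum = 0.0
    weight_total = 0.0
    for snapshots, weight in zip(sensors, weights):
        average = sensor_average(snapshots)
        if average is None:
            continue  # this sensor never saw two users at once
        weighted_sum += weight * average
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no sensor observed two or more users at the same time")
    return weighted_sum / weight_total


sensor_a = [[(0, 0), (3, 4)], [(0, 0), (0, 2)], [(1, 1)]]  # distances 5 and 2
sensor_b = [[(0, 0), (0, 1)]]                               # distance 1
r1 = congestion_reward([sensor_a, sensor_b])
r1_weighted = congestion_reward([sensor_a, sensor_b], weights=[3.0, 1.0])
```

A larger value means users are farther apart, which matches the convention that lower congestion is the better state.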

R1 is calculated in this way from the measurement data. Likewise, for the other subsystems, a reward corresponding to each subsystem's KPI is calculated from its measurement data. Multiple rewards may be defined for a single subsystem.

Assuming that the number of subsystems is (N+1), the total reward R of all subsystems is calculated by the following formula.
R = α0·K0·R0 + α1·K1·R1 + ... + αN·KN·RN

α(i) is the weighting factor for subsystem i and represents how much importance is given to that subsystem. Each α(i) is a real number from 0 to 1 inclusive, and α0 + α1 + ... + αN = 1. K(i) is a coefficient for normalizing the reward of subsystem i. α(i) and K(i) are preset constants.
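The total reward can be computed directly from this definition. The sketch below assumes the rewards, weights α(i), and normalization coefficients K(i) are given as equal-length lists. The function name and the numeric values are illustrative only.

```python
def total_reward(rewards, weights, norms, tol=1e-9):
    """Return R = sum_i alpha_i * K_i * R_i, checking the constraints on alpha."""
    if not (len(rewards) == len(weights) == len(norms)):
        raise ValueError("rewards, weights and norms must have the same length")
    if any(a < 0.0 or a > 1.0 for a in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return sum(a * k * r for a, k, r in zip(weights, norms, rewards))


# Hypothetical example: R0 is total sales, R1 is a mean distance in metres.
# The K values rescale both rewards to a comparable range.
R = total_reward(
    rewards=[200000.0, 2.0],
    weights=[0.5, 0.5],
    norms=[1e-5, 0.5],
)
```

Because the KPIs have different units (here, currency and metres), the coefficients K(i) are what make the weighted sum meaningful.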

Next, the measurement data/graph conversion unit 109 generates graph data for each subsystem from the measurement data received from that subsystem over a certain period (504). In this example, the period of the measurement data used for graph generation is common to all subsystems and is the same as the period used to calculate the rewards. This makes it possible to select more appropriate actions. Depending on the configuration of the execution system, the period used for graph generation may differ between subsystems.

The measurement data measured by the POS system of subsystem 0 as described above may include, for example, the purchase date and time, user age, store used, purchase amount, store type, store area, and total store sales. The measurement data/graph conversion unit 109 uses the values of the measurement data to generate a graph in which stores are nodes and purchase-related movements of customers between stores are edges. Each node is given an attribute vector, generated from the measurement data, consisting of one or more values that represent attributes of the store. An edge is defined, for example, for a movement made by more than a specified number of users.
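The graph construction for subsystem 0 can be sketched as follows. The record format `(user_id, timestamp, store, amount)`, the choice of node attributes (total sales and mean purchase amount), and the treatment of edges as undirected are all assumptions made for illustration. The source only states that stores are nodes and that an edge is defined when more than a specified number of users move between two stores.

```python
from collections import defaultdict


def build_purchase_graph(records, min_users):
    """Build (node_attributes, edges) from POS records.

    records: iterable of (user_id, timestamp, store, amount) tuples (assumed format).
    An undirected edge joins two stores when MORE than min_users distinct users
    made consecutive purchases at them.
    """
    visits_by_user = defaultdict(list)
    sales = defaultdict(list)
    for user, timestamp, store, amount in records:
        visits_by_user[user].append((timestamp, store))
        sales[store].append(amount)

    # Node attribute vector per store: [total sales, mean purchase amount].
    attributes = {s: [sum(a), sum(a) / len(a)] for s, a in sales.items()}

    movers = defaultdict(set)
    for user, visits in visits_by_user.items():
        visits.sort()  # order each user's purchases by time
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            if src != dst:
                movers[frozenset((src, dst))].add(user)

    edges = {tuple(sorted(pair)) for pair, users in movers.items()
             if len(users) > min_users}
    return attributes, edges


# Hypothetical records.
records = [
    ("u1", 1, "A", 10.0), ("u1", 2, "B", 20.0),
    ("u2", 1, "A", 30.0), ("u2", 3, "B", 40.0),
    ("u3", 1, "A", 50.0), ("u3", 2, "C", 60.0),
]
attributes, edges = build_purchase_graph(records, min_users=1)
```

With `min_users=1`, only the A-B movement (made by two users) becomes an edge. The A-C movement (one user) does not.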

FIG. 6 shows an example of a graph 600, generated from the measurement data of subsystem 0, that represents the purchasing behavior of users over a predetermined period. In FIG. 6, circles 601 and 603 are nodes representing stores. Line 604 is an edge representing the movement of users between stores. In FIG. 6, only some of the nodes and edges are given reference numerals, as examples. The range 602 indicated by a dashed line contains node 601 and its adjacent nodes. As described above, attribute data (an attribute vector) based on the users' measurement data is assigned to each node.

The measurement data includes data specific to each store, which is reflected in the attribute data of the corresponding node. The attribute data can be represented by a vector consisting of one or more values obtained from the measurement data. The attribute data of a store node can indicate, for example, the average purchase amount per user, total sales, average user age, store area, and store type.

The measurement data measured by each TOF sensor of subsystem 1 as described above includes the change over time in the position of each user within the measurement range during a predetermined period. In addition to the measurement data of each TOF sensor, subsystem 1 can transmit to the system control unit 103 the map information of the shopping mall and the attribute information of each TOF sensor, for example, the position and orientation of the TOF sensor. The map information indicates the geometry and attributes of the stores and passages. The map information of the shopping mall and the attribute information of the TOF sensors may instead be set in the system control unit 103 in advance by the system administrator.

The graph generated by the measurement data/graph conversion unit 109 can consist of nodes representing a plurality of preset representative points in the passages of the shopping mall and edges representing the passages between the representative points. The measurement data/graph conversion unit 109 can define the nodes and edges of the graph from the map data of the shopping mall.

The measurement data/graph conversion unit 109 identifies the measurement range of each TOF sensor from the position and orientation of the sensor and the map information, and then calculates the degree of congestion at each representative point from the measurement data of the TOF sensors. The degree of congestion can be represented, for example, by the average distance between users as described above.

The measurement data/graph conversion unit 109 includes the calculated degree of congestion in the attribute data of the node corresponding to each representative point. The attribute data can further include other values indicating the degree of congestion, such as the average number of users, as well as attributes of the point. The attributes of a point can indicate, for example, that it is a point in front of a store having a specific attribute, an intersection, a rest area, or an event site.

FIG. 7 shows an example of a graph 700, generated from the measurement data of subsystem 1, that represents the degree of congestion over a predetermined period. In FIG. 7, circles 701 and 703 are nodes representing representative points such as points in front of stores and intersections. Line 704 is an edge representing a passage connecting representative points. In FIG. 7, only some of the nodes and edges are given reference numerals, as examples. The range indicated by dashed line 702 contains node 701 and its adjacent nodes. As described above, attribute data (an attribute vector) based on the users' measurement data is assigned to each node.

Returning to FIG. 5, the graph feature extraction unit 110 next extracts the features of the graph data of each subsystem (505). The graph feature extraction unit 110 according to one embodiment of this specification uses a graph convolutional network. Graph convolution performs a convolution operation on each node of a graph. Specifically, the attribute data (attribute vectors) of the adjacent nodes connected to each node are weighted and added. The attribute data of nodes farther away than the adjacent nodes (nodes one step away), that is, nodes two or more steps away, may also be included in the convolution operation.

Taking node 601 in FIG. 6 as an example from the graph data of subsystem 0, the attribute data of the adjacent nodes connected to node 601 (the nodes within range 602) are convolved into the attribute data of node 601. As a result, the attribute data of the adjacent nodes is reflected in the attribute data of node 601, and the tendency of the attribute data around node 601, that is, its feature, is obtained.

The same applies to the graph data of subsystem 1. Taking node 701 in FIG. 7 as an example, the attribute data of the adjacent nodes connected to node 701 (the nodes within range 702) are convolved into the attribute data of node 701. The same convolution processing is performed on the graph data obtained from the other subsystems.

FIG. 8 schematically shows an example of the logical configuration of the graph feature extraction unit 110 and of part of the state-action-value update unit 111. The graph feature extraction unit 110 receives, from the measurement data/graph conversion unit 109, graph data based on the measurement data of each subsystem. As examples, FIG. 8 indicates the graph data of subsystems 0 and 1 by reference numerals 801 and 802, respectively. The graph data indicates the graph structure and the attribute data of each node.

The graph feature extraction unit 110 inputs the graph data of each subsystem to the corresponding graph convolutional neural network 807. The graph convolutional neural network 807 is an example of a model that extracts the feature quantities of an input graph, and the data for constructing the model is stored in a storage device.

As an example, FIG. 8 indicates one graph convolutional neural network by reference numeral 807. In the configuration example of FIG. 8, each graph convolutional neural network consists of a plurality of convolution layers 805. As an example, FIG. 8 indicates one convolution layer by reference numeral 805.

Convolution processing by one convolution layer 805 (a single convolution) convolves the attribute data of the directly connected adjacent nodes into the attribute data of each node. A plurality of convolution layers 805 are connected in series, so that the convolution processing is applied to each item of attribute data multiple times. As a result, the attribute data of distant nodes that are not directly connected is also convolved into the attribute data of each node, and a feature quantity reflecting the attribute data of nodes far from each node is extracted.
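The effect of stacking convolution layers can be illustrated with a small sketch. The layer below averages each node's attribute vector with those of its direct neighbors, applies a weight matrix, and then applies a ReLU. The mean aggregation with a self-loop and the ReLU are assumptions in the style of a common graph convolutional network formulation. The source only states that neighbor attributes are weighted and added.

```python
def graph_conv(adj, feats, weight):
    """One graph convolution layer (illustrative form).

    adj:    n x n adjacency matrix (0/1 entries, no self-loops).
    feats:  n x d list of node attribute vectors.
    weight: d x k weight matrix (learned in practice; fixed here).
    """
    n, d, k = len(adj), len(feats[0]), len(weight[0])
    out = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adj[i][j]]  # node plus direct neighbors
        mean = [sum(feats[j][c] for j in group) / len(group) for c in range(d)]
        out.append([max(0.0, sum(mean[c] * weight[c][m] for c in range(d)))
                    for m in range(k)])
    return out


# Path graph 0 - 1 - 2; only node 2 carries a non-zero attribute.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[0.0], [0.0], [1.0]]
W = [[1.0]]  # identity weight so the propagation is easy to follow

h1 = graph_conv(adj, feats, W)  # one layer: node 0 sees only nodes 0 and 1
h2 = graph_conv(adj, h1, W)     # two layers: node 0 now reflects node 2
```

After one layer, node 0 is still zero because node 2 is two steps away. After the second layer, node 0 becomes non-zero, which is the multi-layer behavior described above.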

Returning to FIG. 5, the state-action-value update unit 111 next executes state-action-value update processing (506). This processing corresponds to the update of state-action values in the reinforcement learning framework. The state refers to the feature quantities obtained by convolving the attribute data of the nodes according to the graph structure.

An action refers to a selection from a group of candidate control data for each subsystem. One control data candidate can include, for example, the distribution of discount coupons and store locations using a smartphone application, or the distribution of the congestion status of each location in the shopping mall using digital signage and a smartphone application. Different control data candidates can indicate, for example, different distributed information and/or different distribution destinations.

Updating the state-action value means the following: based on the reward obtained as a result of the action selected in the previous state, the estimated value of selecting that action in the previous state, or the parameters of the function used for the estimation, are updated so that the estimate matches the obtained reward.

An example of updating the state-action value is described with reference to FIG. 8. The state-action-value update unit 111 receives the feature quantities (vectors) of the graph data of each subsystem from the graph feature extraction unit 110. As examples, FIG. 8 shows the feature quantity 803 of the purchasing behavior and the feature quantity 804 of the degree of congestion. The set of feature quantities represents the current state si.

The state-action-value update unit 111 includes a fully connected layer 806, which receives the feature quantities from the graph feature extraction unit 110. The fully connected layer 806 outputs the Q-values of actions a0 to aM in state si. M is a natural number. As examples, FIG. 8 shows the Q-value (si, a0) 808 of action a0 and the Q-value (si, aM) 809 of action aM. A Q-value indicates the value of an action. The state-action-value update unit 111 updates the parameters of the graph convolutional neural networks 807 and of the fully connected layer 806 so that the difference (error) between the calculated Q-value and the reward actually obtained becomes small.
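The Q-value computation and update can be sketched with a single linear layer. For brevity, this sketch treats the state vector as fixed input and updates only the fully connected layer by one gradient step on the squared error between the Q-value of the selected action and the observed reward. In the described system the parameters of the graph convolutional networks are updated as well. The function names, learning rate, and values are illustrative assumptions.

```python
def q_values(state, W, b):
    """Fully connected layer: one Q-value per action (W has one row per action)."""
    return [sum(w * s for w, s in zip(row, state)) + bias
            for row, bias in zip(W, b)]


def update(state, action, reward, W, b, lr=0.1):
    """One gradient step on 0.5 * (Q(s, a) - reward)^2 for the selected action."""
    error = q_values(state, W, b)[action] - reward
    W[action] = [w - lr * error * s for w, s in zip(W[action], state)]
    b[action] -= lr * error
    return error


# State si: concatenation of the per-subsystem feature vectors (hypothetical values).
purchase_features = [1.0, 0.5]
congestion_features = [-0.5]
state = purchase_features + congestion_features

W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # two candidate actions
b = [0.0, 0.0]

error_before = update(state, action=0, reward=1.0, W=W, b=b)
error_after = q_values(state, W, b)[0] - 1.0
```

After the update, the Q-value of the selected action moves toward the reward, while the Q-value of the other action is unchanged.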

The fully connected layer 806 is an example of a model that determines, based on the feature quantities from the graph feature extraction unit 110, index values used to determine the control data for controlling the execution system. The data for constructing the model is stored in a storage device. A model other than a fully connected layer may be used.

Returning to FIG. 5, the action selection unit 112 next executes action selection processing (507). Among the actions that can be taken in the current state, the action selection unit 112 selects a high-value action with high probability. Next, the action/control data conversion unit 113 executes action/control data conversion processing (508). As described above, an action corresponds to, for example, the distribution of discount coupons or store locations using a smartphone application. To realize this, the action is converted into data suited to each device.
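One common way to select a high-value action with high probability is an epsilon-greedy rule, sketched below. The source does not name a specific selection rule, so the use of epsilon-greedy, the function name, and the parameter values are assumptions.

```python
import random


def select_action(q, epsilon, rng):
    """Epsilon-greedy: explore uniformly with probability epsilon, else pick argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)


rng = random.Random(0)           # seeded for reproducibility
q = [0.1, 0.9, 0.3]              # hypothetical Q-values for three actions
choices = [select_action(q, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = choices.count(1) / len(choices)
```

With `epsilon=0.1`, the highest-valued action is chosen in roughly 93% of draws, and the remaining draws explore the other candidates.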

Next, the control data transmission unit 114 executes control data transmission processing (509). The control data is thereby transmitted to the control unit of each subsystem, and each control unit executes processing based on the control data. For example, coupons are distributed to the smartphone application described above, or the congestion status is presented on signage.

Treating the above series of processes as one episode, the learning management unit 115 advances learning by executing a plurality of episodes. After each episode is completed, the learning management unit 115 performs learning management determination processing (510). For example, learning is determined to be complete when the number of episodes reaches a predetermined number or when the rate of change of the total reward per episode falls below a threshold. If learning is not complete (510: incomplete), the flow returns to step 502.
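The completion check in step 510 can be sketched as a small predicate. The source does not define the rate of change precisely, so the relative difference between the last two episode totals used here is an assumption, as are the function and parameter names.

```python
def learning_complete(episode_totals, max_episodes, rate_threshold):
    """True when the episode budget is used up or the total reward has levelled off."""
    if len(episode_totals) >= max_episodes:
        return True
    if len(episode_totals) < 2:
        return False
    previous, latest = episode_totals[-2], episode_totals[-1]
    if previous == 0:
        return latest == 0  # avoid division by zero; treat 0 -> 0 as no change
    return abs(latest - previous) / abs(previous) < rate_threshold


# Hypothetical histories of total reward per episode.
still_improving = learning_complete([1.0, 2.0], max_episodes=10, rate_threshold=0.01)
levelled_off = learning_complete([100.0, 100.5], max_episodes=10, rate_threshold=0.01)
```

The same predicate could be reused for the robust-learning completion check, since the text describes identical criteria for it.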

If learning is complete (510: complete), the learning management unit 115 executes robust learning completion determination processing (511). The learning management unit 115 determines whether, after the normal learning, learning with an error applied to the measurement data has already been performed. If learning with an applied error has not been performed (511: incomplete), the error generation unit 116 sets an error (512). The flow then returns to step 502, and the series of processes described above is performed with the error applied to the measurement data.

The error is added to the numerical data within the measurement data. For example, an error can be added to the number of users measured by a TOF sensor. Learning is then performed with the error carried into the attribute data of the graph. Starting from the result obtained in the initial learning, learning is performed under conditions in which an error has been added, so that control that is more robust against errors can be learned.
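Adding an error to only the numerical fields of a measurement record can be sketched as follows. Zero-mean Gaussian noise and the dictionary record format are assumptions. The source does not specify the error distribution or the data layout.

```python
import random


def add_error(record, sigma, rng):
    """Return a copy of record with Gaussian noise added to numeric fields only."""
    noisy = {}
    for key, value in record.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        noisy[key] = value + rng.gauss(0.0, sigma) if is_number else value
    return noisy


rng = random.Random(42)  # seeded for reproducibility
# Hypothetical measurement record from one TOF sensor.
record = {"sensor_id": "tof-3", "user_count": 12, "mean_distance": 1.8}
noisy = add_error(record, sigma=0.5, rng=rng)
```

Non-numeric fields such as the sensor identifier pass through unchanged, matching the statement that the error is added to numerical data.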

After learning has been performed with the error applied as described above, the learning management unit 115 executes the robust learning completion determination processing (511). If the learning is determined to be complete (511: complete), this processing ends. For example, the robust learning processing is determined to be complete when a predetermined number of episodes is reached or when the rate of change of the total reward per episode falls below a threshold. If robust learning is not complete (511: incomplete), the error is changed and the processing of the learning and action selection unit is executed again.

The processing shown in FIG. 5 performs machine learning of the learning model while the execution system is operated in the real environment 106. After the machine learning shown in FIG. 5 is complete, the system control unit 103 can use the trained model to control the execution system based on the measurement data. Alternatively, the system control unit 103 may continue the flow shown in FIG. 5 without ending it. The learning processing by the system control unit 103 may be performed on the actual environment 106 or on a simulated environment 106.

As described above, the system acquires a control law for each subsystem that increases the sum of the subsystems' KPIs, expressed as rewards, while keeping them in balance.

In the embodiment described above, attribute data is added only to the nodes, not to the edges. In another example, attribute data may also be added to the edges and included in the convolution processing. The graph neural network used for feature extraction is not limited to the configuration described above. For example, combining pooling layers or dropout layers with the convolution layers can emphasize features. A model other than convolution or a neural network may also be used to extract the feature quantities of the graph data.

In the above example, within the reinforcement learning framework, the data for the execution system is determined based on action values, and the parameters of the learning model are updated based on the action values and rewards. The system may instead use another technique to control the execution system based on the feature quantities extracted from the graphs of the subsystems.

The system according to one embodiment of this specification can be applied to the control of an execution system that includes subsystems with different kinds of measurement data. The measurement data of some of the subsystems may be of the same kind, and the system of one embodiment of this specification may also be applied to an execution system consisting of subsystems whose measurement data is all of the same kind.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the elements described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, and the like described above may be realized in hardware, for example, by designing them as integrated circuits. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files that implement each function can be stored in memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card or SD card.

The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the configurations may be considered to be interconnected.

101, 102: measurement unit; 103: system control unit; 104, 105: control unit; 106: environment; 107: graph processing unit; 108: measurement data receiving unit; 109: measurement data/graph conversion unit; 110: graph feature extraction unit; 111: state-action-value update unit; 112: action selection unit; 113: action/control data conversion unit; 114: control data transmission unit; 115: learning management unit; 116: error generation unit; 117: measurement data/reward conversion unit; 151: processor; 152: DRAM; 153: communication device; 153: monitor; 154: auxiliary storage device; 155: input device; 200: shopping mall; 205, 301: store; 302: passage; 303: user movement; 401: measurement range; 402: sensor viewpoint; 600, 700: graph data; 601, 603, 701, 703: node; 602, 702: adjacent nodes; 604, 704: edge; 803, 804: feature quantity; 805: convolution layer; 806: fully connected layer; 807: graph convolutional neural network; 808, 809: Q-value

Claims

1. A system for controlling an execution system comprising a plurality of subsystems, the system comprising:
one or more processors; and
one or more storage devices that store programs executed by the one or more processors,
wherein the one or more storage devices
store data for constructing a first model for extracting feature quantities of each input graph, and
the one or more processors
acquire measurement data from each of the plurality of subsystems,
generate a graph representing the measurement data of each of the plurality of subsystems based on the measurement data from each of the plurality of subsystems,
extract feature quantities from each of the graphs using the first model, and
control the execution system based on the feature quantities.

2. The system of claim 1, wherein
the one or more storage devices
store data for constructing a second model for determining, based on the feature quantities, an index value for determining control data for controlling the execution system, and
the one or more processors
use the second model to obtain the index value based on the feature quantities of the graphs, and
determine control data for the execution system based on the index value.

3. The system of claim 1, wherein
the first model includes a sub-model for each of the plurality of subsystems, and
each of the sub-models includes a plurality of convolution layers.

4. The system of claim 1, wherein
the measurement data of the plurality of subsystems includes different types of measurement data.

5. The system of claim 1, wherein
the measurement data of the plurality of subsystems is measured in the same period.

6. The system of claim 2, wherein
the one or more processors update parameters of the first model and the second model based on the obtained index value.

7. The system of claim 6, wherein
the index value from the second model represents the values of actions for the plurality of subsystems, and
the one or more processors
select an action for the plurality of subsystems based on the values of the actions,
determine a reward for the selected action based on measurement data from the plurality of subsystems, and
update the parameters of the first model and the second model based on the value of the action and the reward.

8. The system of claim 6, wherein
the one or more processors update the parameters of the first model and the second model using the graphs with error data added.

9. The system of claim 1, wherein
the execution system controls the flow of people in a shopping mall,
the measurement data of a first subsystem of the plurality of subsystems includes data on sales at stores in the shopping mall, and
the measurement data of a second subsystem of the plurality of subsystems includes data on the positions of people in the shopping mall.

10. A method by which a system controls an execution system comprising a plurality of subsystems, wherein
the system stores data for constructing a first model for extracting feature quantities of each input graph,
the method comprising:
the system acquiring measurement data from each of the plurality of subsystems;
the system generating a graph representing the measurement data of each of the plurality of subsystems based on the measurement data from each of the plurality of subsystems;
the system extracting feature quantities from each of the graphs using the first model; and
the system controlling the execution system based on the feature quantities.