JP2020109551A

JP2020109551A - Information processing device and information processing system

Info

Publication number: JP2020109551A
Application number: JP2018248665A
Authority: JP
Inventors: 嘉以三原; Kai Mihara; 勇一朗池田; Yuichiro Ikeda
Original assignee: Fujitsu Client Computing Ltd
Current assignee: Fujitsu Client Computing Ltd
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-07-16
Anticipated expiration: 2038-12-28
Also published as: JP6614324B1

Abstract

To provide an information processing device and an information processing system which can increase processing efficiency of a plurality of processors.

SOLUTION: An information processing device is disclosed in the present application. In one aspect, the information processing device comprises: a receiving unit which receives resource information about processing state from each of N processors where N is an integer greater than or equal to two; a calculation unit which fits the received resource information of the N processors to a computational expression to calculate a processing time for a processing request by an application for each of the N processors; a selection unit which selects one processor from the N processors on the basis of the processing time of the N processors; and a transmission unit which transmits the processing request to the one processor.

SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing device and an information processing system.

In an information processing device in which a plurality of processors are communicably connected, parallel distributed control that distributes processing among the processors may be performed.

Japanese Laid-open Patent Publication No. 2017-37533 (JP 2017-37533 A)

However, if the parallel distributed control in the information processing device is performed in a fixed manner, the processing efficiency of the plurality of processors may decrease.

The disclosed technology has been made in view of the above, and an object thereof is to provide an information processing device and an information processing system that can improve the processing efficiency of a plurality of processors.

In one aspect, the information processing device disclosed in the present application includes: a receiving unit that, where N is an integer of 2 or more, receives resource information regarding a processing status from each of N processors; a calculation unit that applies the received resource information of the N processors to a calculation formula to calculate, for each of the N processors, a processing time for a processing request from an application; a selection unit that selects one processor from the N processors based on the processing times of the N processors; and a transmission unit that transmits the processing request to the one processor.

In one aspect, an information processing system disclosed in the present application includes the above information processing device and a plurality of processors, each of which is communicably connected to the information processing device.

According to one aspect of the information processing device disclosed in the present application, the processing efficiency of a plurality of processors can be improved.

FIG. 1 is a diagram showing a hardware configuration of an information processing system according to the embodiment.
FIG. 2 is a diagram showing a functional configuration of the information processing device in the embodiment.
FIG. 3 is a diagram showing a functional configuration of the inference processing device in the embodiment.
FIG. 4 is a diagram showing a data structure of the discrimination parameter information in the embodiment.
FIG. 5 is a diagram showing an update process of the discrimination parameter information in the embodiment (for coprocessor A and process J).
FIG. 6 is a diagram showing an update process of the discrimination parameter information in the embodiment (for coprocessor A and process K).
FIG. 7 is a diagram showing an update process of the discrimination parameter information in the embodiment (for coprocessor B and process J).
FIG. 8 is a diagram showing an operation of the inference processing device using a model file in the embodiment.
FIG. 9 is a diagram showing a conversion process from the common format to the format of the input layer of the model file in the embodiment.
FIG. 10 is a diagram showing a conversion process from the format of the output layer of the model file to the common format in the embodiment.
FIG. 11 is a diagram showing an operation of the inference processing device using another model file in the embodiment.
FIG. 12 is a diagram showing a conversion process from the common format to the format of the input layer of the other model file in the embodiment.
FIG. 13 is a diagram showing a conversion process from the format of the output layer of the other model file to the common format in the embodiment.
FIG. 14 is a diagram showing a functional configuration of the inference processing device in a modification of the embodiment.
FIG. 15 is a diagram showing a hardware configuration of an information processing system in another modification of the embodiment.
FIG. 16 is a diagram showing a software configuration of the information processing system in the other modification of the embodiment.
FIG. 17 is a diagram showing a hardware configuration of a PCIe bridge controller in the other modification of the embodiment.
FIG. 18 is a diagram showing a PCIe layer configuration in the other modification of the embodiment.
FIG. 19 is a diagram showing how the other processors appear from coprocessor G in the other modification of the embodiment.
FIG. 20 is a diagram showing how the other processors appear from coprocessor D in the other modification of the embodiment.
FIG. 21 is a diagram showing a data transfer process between processors via the PCIe bridge controller in the other modification of the embodiment.

Hereinafter, an embodiment of the information processing system disclosed in the present application will be described in detail with reference to the drawings. The disclosed technology is not limited to this embodiment. In the embodiment, configurations having the same function are denoted by the same reference numerals, and overlapping description is omitted.

(Embodiment)
The information processing system according to the embodiment is a parallel distributed system that has a main processor and a plurality of coprocessors, and in which the main processor side performs parallel distributed control that distributes processing among the coprocessors. In the information processing system, if the parallel distributed control on the main processor side is performed in a fixed manner, the processing efficiency of the plurality of coprocessors may decrease.

For example, one conceivable control takes the difference in arithmetic processing capability among the coprocessors into account, allocating heavy processing to a coprocessor with high capability and light processing to a coprocessor with low capability. With this control, when the parallel distributed system also contains coprocessors that are fast only for specific processing, the result may not be appropriate (for example, optimal) parallel distributed control, depending on the current resource availability of the coprocessors.

Further, when each coprocessor (each inference processing device) has a trained model generated in advance by machine learning, each inference processing device may perform various inference processes using that trained model. In this case, if the inference processing device uses the same trained model for the various inference processes, the inference processing may not be performed efficiently.

Therefore, in the present embodiment, the information processing system applies the resource information collected from the plurality of coprocessors to as many calculation formulas as there are coprocessors to calculate processing times, selects the coprocessor to which the inference processing is to be allocated based on the resulting processing times, and transmits the inference request to that coprocessor. The aim is to make the parallel distributed control more efficient in real time.

In addition, an inference processing identifier that identifies the inference processing is included in the inference request. Each coprocessor (each inference processing device) identifies, among a plurality of model files, the model file corresponding to the inference processing identifier in the received inference request, reads it, and executes the inference processing. The aim is to make the inference processing in each coprocessor (each inference processing device) more efficient.

Specifically, the information processing system prepares a plurality of calculation formulas, each associated with a combination of processing content and coprocessor. Each calculation formula includes a plurality of parameters (discrimination parameters) corresponding to the plurality of coprocessors and is configured so that the processing time is obtained by applying (substituting) the resource information of the plurality of coprocessors. From the prepared calculation formulas, the information processing system takes the formulas that correspond to the inference request from the application, one per coprocessor, and applies the resource information collected from the plurality of coprocessors to each of them to calculate the processing times. Based on the calculated processing times of the plurality of coprocessors, the information processing system selects the coprocessor to which the processing is to be allocated and transmits the inference request to it. For example, the information processing system can select the coprocessor with the shortest processing time and transmit the inference request to it. Because the processing content can be taken into account as a performance difference in addition to the arithmetic processing capability, the parallel distributed control becomes more accurate; and because the current resource availability can be taken into account, the parallel distributed control becomes more efficient in real time.
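The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the text here does not give the form of the calculation formula, so a linear form (a base time plus one discrimination parameter per coprocessor multiplied by that coprocessor's resource value) is assumed, and the names `Formula`, `estimate_time`, and `select_coprocessor`, as well as all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """Hypothetical linear formula: time = base + sum(param_i * resource_i)."""
    base: float
    params: tuple  # one discrimination parameter per coprocessor (assumed)


def estimate_time(formula, resources):
    # Apply (substitute) the resource information of all coprocessors.
    return formula.base + sum(p * r for p, r in zip(formula.params, resources))


def select_coprocessor(formulas, process, resources):
    """Return (coprocessor, time) for the shortest estimated processing time."""
    times = {
        cop: estimate_time(f, resources)
        for (proc, cop), f in formulas.items()
        if proc == process
    }
    best = min(times, key=times.get)
    return best, times[best]


# Formulas keyed by (processing content, coprocessor); values are made up.
formulas = {
    ("J", "A"): Formula(base=10.0, params=(20.0, 0.0)),
    ("J", "B"): Formula(base=25.0, params=(0.0, 20.0)),
    ("K", "A"): Formula(base=40.0, params=(30.0, 0.0)),
}

# Resource information: assumed utilization (0.0 to 1.0) of coprocessors A and B.
busy_a = select_coprocessor(formulas, "J", (1.0, 0.0))
idle = select_coprocessor(formulas, "J", (0.0, 0.0))
```

With these made-up numbers, coprocessor A is nominally faster for process J, but when A is fully busy the estimate favors B. That is the point of combining the per-process formulas with current resource information.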

The information processing system also transmits an inference request including the inference processing identifier and input data to the selected coprocessor (inference processing device). Each of the plurality of coprocessors (inference processing devices) has a plurality of model files corresponding to mutually different inference processes. Each model file corresponds to a trained inference model on which machine learning has been performed so that it is suited to a different inference process. The coprocessor (inference processing device) that receives the inference request identifies the model file corresponding to the inference processing identifier included in the inference request, reads the identified model file, and executes the inference processing. Compared with using the same trained model for various inference processes, this allows a model file made efficient by machine learning to be used for each inference process, so the inference processing in each coprocessor (each inference processing device) can be made more efficient.

Specifically, the information processing system 1 can be configured as shown in FIGS. 1 and 2. FIG. 1 is a diagram showing a hardware configuration of the information processing system 1. FIG. 2 is a diagram showing a functional configuration of the information processing device 100.

Where N is an arbitrary integer of 2 or more, the information processing system 1 includes an information processing device 100, a relay device 200, and N inference processing devices 300-1 to 300-N.

As illustrated in FIG. 1, the information processing device 100 has, as its hardware configuration, a motherboard 101, a main processor 102, a display 103, a USB (Universal Serial Bus) interface 104, an Ethernet (registered trademark) interface 105, a DIMM (Dual Inline Memory Module) 106, an SSD (Solid State Drive) 107, an HDD (Hard Disk Drive) 108, and a TPM (Trusted Platform Module) 109.

The motherboard 101 shown in FIG. 1 is a board on which the components responsible for the main functions of the information processing device 100 are mounted. The main processor 102 is the processor responsible for the main functions of the information processing device 100, and an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) can be adopted. The display 103 functions as a display unit that displays various kinds of information. A USB device can be connected to the USB interface 104, which can mediate communication between the USB device and the main processor 102. An Ethernet cable can be connected to the Ethernet interface 105, which can mediate communication between an external device and the main processor 102 via the Ethernet cable. The DIMM 106 is a volatile storage medium, such as a RAM (Random Access Memory), capable of temporarily storing various kinds of information. The SSD 107 and the HDD 108 are non-volatile storage media that can retain various kinds of information even after power is turned off. The TPM 109 is a module that implements the security functions of the system.

As shown in FIG. 2, the information processing device 100 also has, as its functional configuration, upper applications 110-1 to 110-n (n is an arbitrary integer of 2 or more), an API (Application Programming Interface) 120, and an AI cluster management unit 130. Each functional component shown in FIG. 2 can be functionally configured in the main processor 102.

The relay device 200 shown in FIG. 1 includes a bridge board 201 and a bridge controller 202. The bridge board 201 is the board on which the bridge controller 202 is mounted. The bridge controller 202 bridge-connects the plurality of inference processing devices 300 to the information processing device 100 and mediates (relays) communication between the information processing device 100 and the plurality of inference processing devices 300.

The plurality of inference processing devices 300-1 to 300-N are connected to the relay device 200 in parallel with one another. Each of the inference processing devices 300-1 to 300-N has a corresponding one of conversion boards (conv. boards) 301-1 to 301-N and a corresponding one of coprocessors 302-1 to 302-N. The conversion board 301, also called an accelerator board, is a board carrying hardware that is added to increase the processing capability of the information processing system 1.

The coprocessor 302 is a processor suited to parallel arithmetic processing such as AI (Artificial Intelligence) inference processing and image processing, and an accelerator such as a GPU (Graphics Processing Unit) or a dedicated chip can be adopted. The coprocessor 302 may also be a combination of a CPU and a GPU.

AI inference processing is inference processing using artificial intelligence (AI) and includes inference processing by an inference model that uses a multilayer neural network (hierarchical neural network). Each coprocessor 302 can perform machine learning on an inference model using a hierarchical neural network to generate a trained inference model, and can also put the trained inference model to use.

If the same trained model is used as the model file 304, that is, as the trained inference model, for various inference processes, the inference processing may not be performed efficiently.

For example, inference processing by machine learning may call for a system that combines a plurality of inference processes, such as extracting multiple items of data of interest from a large body of data and then analyzing each of those items in more depth. As an example, this could be realized in either of the following two ways, (a) and (b).
(a) A high-performance inference processing device executes the plurality of processes.
(b) A plurality of inference processing devices execute the processes, with the inference processing performed by each device fixed.

For example, a system that detects persons in a camera image and counts them by gender performs the following processes (1) and (2).
(1) Execute the person extraction inference processing on the camera image.
(2) For each person extracted in (1), execute the following processes (i) and (ii).
(i) Execute the gender determination inference processing.
(ii) Count up the number of men or women according to the determination result of (i).
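The two-stage flow above can be sketched as follows. This is a minimal illustration only: `extract_persons` and `judge_gender` are hypothetical stand-ins for the person extraction and gender determination inference processes, and the "image" is simply a list of made-up records rather than pixel data.

```python
from collections import Counter


def extract_persons(image):
    # Stand-in for step (1), the person extraction inference process.
    # Here an "image" is just a list of hypothetical person records.
    return list(image)


def judge_gender(person):
    # Stand-in for the gender determination inference process.
    return person["label"]


def count_by_gender(image):
    counts = Counter()
    for person in extract_persons(image):  # step (1)
        counts[judge_gender(person)] += 1  # step (2): judge, then count up
    return counts


image = [{"label": "male"}, {"label": "female"}, {"label": "male"}]
counts = count_by_gender(image)
```

The sketch makes the dependency visible: the second inference process runs once per person found by the first, so its load varies with the content of each image.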

Applying this processing to patterns (a) and (b) above gives the following (a') and (b').
(a') The high-performance inference processing device loads both the person extraction inference processing and the gender determination inference processing into an executable state and executes processes (1) and (2).
(b') When a plurality of inference processing devices are used, one device executes the person extraction inference processing, and multiple devices execute the gender determination inference processing for the extracted persons.

Whether the processing is executed by one high-performance inference processing device or by a plurality of inference processing devices, the inference processing to be executed becomes fixed, and the performance of the inference processing devices may not be used efficiently.

In view of this, the content of inference processing is classified and categorized to some extent, a plurality of model files 304 corresponding to mutually different inference processes are prepared in each of the inference processing devices 300-1 to 300-N, and the inference processing to be executed can be specified by an inference request identifier.

That is, where M is an arbitrary integer of 2 or more, each of the inference processing devices 300-1 to 300-N has, as shown in FIG. 3, a host OS (Operating System) 307, a driver 306, middleware 305, M model files 304-1 to 304-M, and an inference application 303. FIG. 3 is a diagram showing a functional configuration of the inference processing device 300.

The M model files 304-1 to 304-M correspond to mutually different inference processes. The middleware 305 can execute each of the M model files 304-1 to 304-M. The inference application 303 receives from outside an inference request including an inference processing identifier, input data, and parameters, and identifies the model file corresponding to the inference processing identifier among the M model files 304-1 to 304-M. The middleware 305 reads the model file 304 identified by the inference application 303 and executes the inference processing.
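The division of roles between the inference application 303 and the middleware 305 might look like the sketch below. All names are hypothetical: the request is modeled as a dictionary with `inference_id`, `input_data`, and `params` keys, the model file names are invented, and `fake_loader` stands in for reading a trained model.

```python
class InferenceApplication:
    """Sketch of inference application 303: maps identifiers to model files."""

    def __init__(self, model_files):
        # model_files: inference processing identifier -> model file name
        self.model_files = dict(model_files)

    def resolve(self, request):
        identifier = request["inference_id"]
        if identifier not in self.model_files:
            raise KeyError(f"no model file for identifier {identifier!r}")
        return self.model_files[identifier]


class Middleware:
    """Sketch of middleware 305: reads a model file and runs inference."""

    def __init__(self, loader):
        self.loader = loader  # hypothetical: file name -> callable model

    def run(self, path, input_data, params):
        model = self.loader(path)  # read the identified model file
        return model(input_data, params)


def fake_loader(path):
    # Stand-in for reading a trained model; returns a trivial callable.
    return lambda data, params: f"{path}:{len(data)}"


app = InferenceApplication({"person_detect": "304-1.model", "gender": "304-2.model"})
mw = Middleware(fake_loader)
request = {"inference_id": "gender", "input_data": [0, 1, 2], "params": {}}
result = mw.run(app.resolve(request), request["input_data"], request["params"])
```

Keeping the lookup separate from the execution means a new inference process can be supported by registering another identifier and model file, without changing how requests are executed.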

For example, in the initial state, the model files 304-1 to 304-M, which include inference models using hierarchical neural networks, and the middleware 305, which reads a model file and executes the inference processing, are stored in the SSD 107 and/or the HDD 108 of the information processing device 100 shown in FIG. 1.

The hierarchical neural network has a layered structure and may have a plurality of intermediate layers between an input layer and an output layer. The intermediate layers include, for example, a convolution layer, an activation function layer, a pooling layer, a fully connected layer, and a softmax layer. The convolution layer performs a convolution operation on the neuron data received from the input layer and extracts features of the input neuron data. The activation function layer emphasizes the features extracted by the convolution layer. The pooling layer thins out the input neuron data. The fully connected layer combines the extracted features to generate variables representing the features. The softmax layer converts the variables generated by the fully connected layer into probabilities. The neuron data resulting from the softmax layer is output to the output layer, where predetermined processing (for example, image identification) is performed. The number and position of each kind of layer can be changed as needed according to the required architecture. That is, the designer can determine the layered structure of the neural network and the configuration of each layer in advance according to, for example, the object to be identified.
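The layer sequence described above can be illustrated with a toy forward pass. This is a sketch under assumptions rather than the network used in the embodiment: it works on a one-dimensional signal, uses ReLU as the activation function and max pooling for the thinning step (neither is specified in the text), and all weights are invented.

```python
import math


def conv1d(x, kernel):
    # Convolution layer (valid mode, no kernel flip, as is usual in CNNs).
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def relu(x):
    # Activation function layer (ReLU is an assumed choice).
    return [max(0.0, v) for v in x]


def max_pool(x, size):
    # Pooling layer: thin out the data by keeping the maximum of each window.
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def fully_connected(x, weights, biases):
    # Fully connected layer: one weighted sum per output unit.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(x):
    # Softmax layer: convert scores to probabilities (shifted for stability).
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
features = max_pool(relu(conv1d(signal, [1.0, -1.0])), 2)
probs = softmax(fully_connected(features, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

The output `probs` is a list of non-negative values that sum to 1, which is what allows the output layer to treat them as class probabilities.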

That is, each of the model files 304-1 to 304-M, as an inference model, includes definition information and weight information for the inference model (hierarchical neural network). The definition information is data that stores information about the neural network. For example, the definition information stores information indicating the configuration of the neural network, such as its layered structure, the configuration of the units in each layer, and the connections between units. For image recognition, the definition information stores, for example, information indicating the configuration of a convolutional neural network determined by a designer or the like. The weight information is data that stores the weight values used in the operations of each layer of the neural network. The weight values stored in the weight information are set to predetermined initial values in the initial state and are updated through learning.
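One way to picture the two parts of a model file is the data structure below. It is a hypothetical sketch: the actual file format is not described here, and the names `LayerDefinition`, `ModelFile`, and `init_weights` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LayerDefinition:
    kind: str   # e.g. "input", "conv", "pool", "fc", "softmax"
    units: int  # number of units in the layer


@dataclass
class ModelFile:
    definition: list                             # definition information
    weights: dict = field(default_factory=dict)  # weight information

    def init_weights(self, initial=0.0):
        # Set every weight to a predetermined initial value before learning.
        prev = None
        for i, layer in enumerate(self.definition):
            if prev is not None:
                # One row of weights per unit, one weight per unit below it.
                self.weights[i] = [[initial] * prev.units for _ in range(layer.units)]
            prev = layer


model = ModelFile([LayerDefinition("input", 3), LayerDefinition("fc", 2)])
model.init_weights(0.1)
```

In this sketch the definition fixes the shape of the weight tables, while learning would later change only the values stored in `weights`.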

At startup of the information processing system 1 or the like, the model files 304-1 to 304-M and the middleware 305 are read from the SSD 107 and/or the HDD 108 by the main processor 102 and loaded into a predetermined inference processing device 300 (a predetermined coprocessor 302) via the bridge controller 202. The predetermined inference processing device 300 (predetermined coprocessor 302) uses the middleware 305 to perform machine learning on each of the model files 304-1 to 304-M. For each of the model files 304-1 to 304-M, which correspond to mutually different inference processes, the middleware 305 performs machine learning suited to the corresponding inference process.

This machine learning can be machine learning using a multilayer neural network, which is also called deep learning. In deep learning, neural networks are becoming increasingly multilayered, and its effectiveness has been confirmed in many fields. For example, deep learning achieves recognition accuracy comparable to that of humans in image and speech recognition.

In deep learning, supervised learning on an identification target causes the neural network to learn the features of the identification target automatically. The neural network that has learned the features is then used to identify the identification target. The predetermined inference processing device 300 (predetermined coprocessor 302) may train the plurality of model files 304-1 to 304-M on the features of different identification targets so that they become models suited to mutually different inference processes.

For example, taking person detection in an image as the inference process, deep learning performs supervised learning using a large number of images showing whole persons as training images, so that the neural network automatically learns the features of a whole person in an image. Alternatively, taking face detection in an image as the inference process, deep learning performs supervised learning using a large number of images showing human faces as training images, so that the neural network automatically learns the features of a human face in an image.

教師あり学習で一般的に使用される誤差逆伝播法では、学習用のデータをニューラルネットワークに順伝播させて認識を行い、認識結果と正解とを比較して誤差を求める。そして、誤差逆伝播法では、認識結果と正解との誤差を認識時と逆方向にニューラルネットワークに伝播させ、ニューラルネットワークの各階層の重みを変更して最適解に近づけていく。 In the error back-propagation method generally used in supervised learning, learning data is forward-propagated to a neural network for recognition, and a recognition result is compared with a correct answer to obtain an error. In the error back-propagation method, the error between the recognition result and the correct answer is propagated to the neural network in the opposite direction to that at the time of recognition, and the weight of each layer of the neural network is changed to approach the optimal solution.

脳には、多数のニューロン（神経細胞）が存在する。各ニューロンは、他のニューロンから信号を受け取り、他のニューロンへ信号を受け渡す。脳は、この信号の流れによって、様々な情報処理を行う。ニューラルネットワークは、このような脳の機能の特性を計算機上で実現したモデルである。ニューラルネットワークは、脳のニューロンを模したユニットを階層的に結合している。ユニットは、ノードとも呼ばれる。各ユニットは、他のユニットからデータを受け取り、データに重みを適用して他のユニットへ受け渡す。ニューラルネットワークは、ユニットの重みを学習によって変化させて受け渡すデータを変化させることで様々な識別対象を識別（認識）できる。 There are many neurons (nerve cells) in the brain. Each neuron receives a signal from another neuron and passes a signal to another neuron. The brain performs various information processing according to the flow of this signal. The neural network is a model that realizes such characteristics of brain functions on a computer. A neural network connects units that imitate brain neurons hierarchically. Units are also called nodes. Each unit receives data from another unit, applies a weight to the data, and passes the data to the other unit. The neural network can identify (recognize) various identification targets by changing the weight of the unit by learning and changing the data to be passed.

ディープラーニングでは、このように特徴を学習したニューラルネットワークを用いることで、画像に写った識別対象を識別することなどの推論処理が可能な学習済み推論モデルを生成できる。 In deep learning, a learned inference model capable of inference processing such as identifying an identification target reflected in an image can be generated by using a neural network that has learned features in this way.

When the learned inference model has been generated, the predetermined coprocessor 302 stores the model file 304 serving as the learned inference model, together with the middleware 305 used for the machine learning, in the SSD 107 and/or the HDD 108 via the bridge controller 202 and the main processor 102.

For example, at or before the time when the learned inference model is to be used, the model file 304 and the middleware 305 can be read from the SSD 107 and/or the HDD 108 by the main processor 102 and loaded into each of the coprocessors 302-1 to 302-N via the bridge controller 202. Using the middleware 305, each coprocessor 302 can execute a predetermined AI inference process with the model file 304 as the learned inference model (see FIG. 3).

In AI inference processing, the same process can be executed repeatedly. For example, in the inference process for person detection, the detection of persons in images may be performed repeatedly. When a process that repeats the same operation, such as an AI inference process, is executed by the coprocessors 302, an approximate processing time can be estimated from the resource status of the plurality of coprocessors 302-1 to 302-N, and using that estimate for parallel distributed control enables more efficient distributed control on the information processing apparatus 100 side. That is, even when the processing performance differs among the plurality of coprocessors 302-1 to 302-N, the processing time of a given processing content can be calculated for each of the coprocessors 302-1 to 302-N from information on their resource status. Then, by selecting a coprocessor 302 that is appropriate for the processing content (task), for example the one that returns the fastest response, based on the calculated processing times of the coprocessors 302-1 to 302-N, tasks can be distributed more efficiently, which improves the processing performance of the system as a whole.

For example, the API 120 shown in FIG. 2 functions as an interface through which the higher-level applications 110-1 to 110-n perform interprocess communication with the AI cluster management unit 130, and defines, for example, an object exchange format as a common format. In other words, the API 120 absorbs the differences in format among the higher-level applications 110-1 to 110-n and enables information to be exchanged with the AI cluster management unit 130 in the common format. In response to an inference instruction from a higher-level application 110, the API 120 can generate an inference request in the common format that includes an inference processing identifier, input data, and a parameter (see FIG. 9(a)).

アプリケーションからＡＰＩ１２０への推論指示は、例えば、関数ｅｓｔｉｍａｔｅ（ＡＩＮａｍｅ，ＩｎＤａｔａ，Ｐａｒａｍ，Ｆｕｎｃ）として実装され得る。 The inference instruction from the application to the API 120 can be implemented as, for example, a function estimate(AIName, InData, Param, Func).

"AIName", specified as an argument of the function estimate, corresponds to the inference processing identifier in the inference request; it can specify, as string-type data, information identifying the AI inference process (model file) to be executed. For example, "AIName" may be information identifying one of the model files 304-1 to 304-k (see FIG. 3), such as the name of the model file (see FIG. 9(a) and FIG. 12(a)).

関数ｅｓｔｉｍａｔｅの引数として指定される「ＩｎＤａｔａ」は、推論要求における入力データに対応し、ＡＩ推論処理で処理したい入力データを多次元配列（ｎｕｍｐｙ）型のデータで指定し得る。「ＩｎＤａｔａ」は、例えば画像データであり、画素位置（横位置、縦位置）を引数として与えるとその階調値（色深度）を要素として返す多次元配列とされ得る（図９（ａ）、図１２（ａ）参照）。 "InData" specified as an argument of the function estimate corresponds to the input data in the inference request, and the input data to be processed by the AI inference process can be specified by multidimensional array (numpy) type data. “InData” is, for example, image data, and can be a multidimensional array that returns the gradation value (color depth) as an element when the pixel position (horizontal position, vertical position) is given as an argument (FIG. 9A, FIG. 12A).

関数ｅｓｔｉｍａｔｅの引数として指定される「Ｐａｒａｍ」は、推論要求におけるパラメータに対応し、ＡＩ推論処理で絞り込みを行う条件などを文字列（string）型のデータで指定し得る（図９（ａ）、図１２（ａ）参照）。 “Param” specified as an argument of the function estimate corresponds to the parameter in the inference request, and the condition for narrowing down in the AI inference process can be specified by the data of the string type (FIG. 9(a), FIG. 12A).

関数ｅｓｔｉｍａｔｅは、戻り値として、受け付け番号を整数（ｉｎｔ）型のデータで返し得る。 As a return value, the function estimate can return an acceptance number as integer (int) type data.

上位アプリケーション１１０−１〜１１０−ｎは、ユーザからの指示又はシステム側から要求などに応じて、所定の推論指示を生成してＡＰＩ１２０を実行する。ＡＰＩ１２０は、例えば関数ｅｓｔｉｍａｔｅ（ＡＩＮａｍｅ，ＩｎＤａｔａ，Ｐａｒａｍ，Ｆｕｎｃ）として指示された要求指示に応じて、推論処理識別子、入力データ、及びパラメータを含む推論要求を共通フォーマット（図９（ａ）参照）で生成してＡＩクラスタ管理部１３０へ送信する。すなわち、上位アプリケーション１１０からＡＰＩ１２０が実行されるごとに、アプリケーションの推論指示に応じた推論要求が、上位アプリケーション１１０による推論要求として、共通フォーマットでＡＩクラスタ管理部１３０へ送信される。その後、ＡＰＩ１２０は、例えば関数ｅｓｔｉｍａｔｅ（ＡＩＮａｍｅ，ＩｎＤａｔａ，Ｐａｒａｍ，Ｆｕｎｃ）の戻り値として、受け付け番号を上位アプリケーション１１０へ返す。 The upper applications 110-1 to 110-n generate a predetermined inference instruction and execute the API 120 in response to an instruction from the user or a request from the system side. The API 120 uses a common format (see FIG. 9A) for an inference request including an inference processing identifier, input data, and a parameter in response to a request instruction specified as a function estimate (AIName, InData, Param, Func), for example. It is generated and transmitted to the AI cluster management unit 130. That is, every time the upper application 110 executes the API 120, an inference request corresponding to the inference instruction of the application is transmitted to the AI cluster management unit 130 in a common format as an inference request by the upper application 110. After that, the API 120 returns the reception number to the higher-level application 110 as a return value of the function estimate (AIName, InData, Param, Func), for example.

The AI cluster management unit 130 shown in FIG. 2 manages AI inference processing by the plurality of inference processing devices 300-1 to 300-N. The AI cluster management unit 130 monitors which AI inference process (which model file) is being executed on which inference processing device 300 (which coprocessor 302), and controls which AI inference process (which model file) should be executed on which inference processing device 300 (which coprocessor 302). The AI cluster management unit 130 can perform parallel distributed control that distributes AI inference processing among the plurality of coprocessors 302-1 to 302-N.

That is, upon receiving an inference request from a higher-level application 110, the AI cluster management unit 130 calculates processing times by applying the resource information collected from the plurality of coprocessors 302-1 to 302-N to as many calculation formulas as there are coprocessors 302 (seven in the case of FIG. 2). Based on the calculated processing times of the coprocessors 302-1 to 302-N, the AI cluster management unit 130 selects the coprocessor 302 to which the AI inference process indicated by the inference processing identifier in the inference request is to be allocated. As the coprocessor that should execute the AI inference process, the AI cluster management unit 130 may select, from among the coprocessors 302-1 to 302-N, the coprocessor 302 with the shortest calculated processing time.

ＡＩクラスタ管理部１３０は、並列分散制御部１３１、送信部１３２、受信部１３３、プロセッサ監視部１３４、判別パラメータ情報１３５、測定部１３６、及び更新部１３７を有する。プロセッサ監視部１３４は、リソース情報収集部１３４１を有する。 The AI cluster management unit 130 includes a parallel distribution control unit 131, a transmission unit 132, a reception unit 133, a processor monitoring unit 134, discrimination parameter information 135, a measurement unit 136, and an updating unit 137. The processor monitoring unit 134 has a resource information collecting unit 1341.

並列分散制御部１３１は、アプリケーションによる推論要求をＡＰＩ１２０から受け、推論要求に応じて、複数のコプロセッサ３０２−１〜３０２−ＮにＡＩ推論処理を振り分ける並列分散制御を行い得る。並列分散制御部１３１は、送信キュー１３１１及び送信先判別部１３１２を有する。 The parallel distributed control unit 131 can receive an inference request by an application from the API 120 and perform parallel distributed control in which the AI inference processing is distributed to the plurality of coprocessors 302-1 to 302-N according to the inference request. The parallel distribution control unit 131 has a transmission queue 1311 and a transmission destination determination unit 1312.

送信キュー１３１１は、ＡＰＩ１２０から供給された推論要求をキューイングする。送信キュー１３１１は、ＦＩＦＯ（ＦｉｒｓｔＩｎＦｉｒｓｔＯｕｔ）構造を有する待ち行列バッファーであり、各推論要求が、キューイングされた順番にデキューされる。 The transmission queue 1311 queues the inference request supplied from the API 120. The transmission queue 1311 is a queue buffer having a FIFO (First In First Out) structure, and each inference request is dequeued in the queued order.
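The FIFO behavior of the transmission queue 1311 can be sketched in Python with `collections.deque`. The function names and the dictionary shape of a request are assumptions made for illustration only.

```python
from collections import deque
from typing import Any, Deque, Dict

# Hypothetical in-memory stand-in for the transmission queue 1311.
transmission_queue: Deque[Dict[str, Any]] = deque()


def enqueue_request(request: Dict[str, Any]) -> None:
    """Queue an inference request at the tail."""
    transmission_queue.append(request)


def dequeue_request() -> Dict[str, Any]:
    """Remove and return the oldest queued request (first in, first out)."""
    return transmission_queue.popleft()
```

Requests therefore leave the queue in exactly the order in which they were queued.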

送信先判別部１３１２は、計算部１３１２ａ及び選択部１３１２ｂを有する。計算部１３１２ａは、デキューされた推論要求（共通フォーマットでの推論要求）を送信キュー１３１１から受けると、判別パラメータ情報１３５を参照し、デキューされた推論要求に含まれた推論処理識別子に対応する計算式を特定する。 The transmission destination determination unit 1312 has a calculation unit 1312a and a selection unit 1312b. When the calculation unit 1312a receives the dequeued inference request (inference request in the common format) from the transmission queue 1311, the calculation unit 1312a refers to the discrimination parameter information 135 and performs a calculation corresponding to the inference processing identifier included in the dequeued inference request. Identify the expression.

The discrimination parameter information 135 includes N×M calculation formulas associated with the combinations of N different coprocessors and M different processing contents. Each calculation formula in the discrimination parameter information 135 includes N discrimination parameters corresponding to the N coprocessors. Each of the N discrimination parameters may include a conversion coefficient that converts a resource information value into a processing time and a contribution rate indicating how strongly the resource information of the corresponding coprocessor affects the processing time (predicted value). From among the N×M calculation formulas, the calculation unit 1312a can identify the N calculation formulas corresponding to the inference processing identifier.

For example, the discrimination parameter information 135 has the data structure shown in FIG. 4. FIG. 4 is a diagram showing the data structure of the discrimination parameter information 135. The discrimination parameter information 135 has a processor identification information column 1351, a process identification information column 1352, and a calculation formula column 1353. The processor identification information column 1351 records information identifying a processor; for example, processor names such as coprocessor A, coprocessor B, ..., coprocessor G can be recorded. Coprocessor A, coprocessor B, ..., coprocessor G are, for example, the names of coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N. The process identification information column 1352 records information identifying the processing content (that is, information corresponding to the inference processing identifier); for example, process names such as process J, process K, ... can be recorded. Process J, process K, ... are, for example, the names of the model files 304-1, 304-2, .... Process J may correspond to, for example, an inference process that detects a person in an image. Process K may correspond to, for example, an inference process that detects the face area of a person in an image. The calculation formula column 1353 records the calculation formula used to calculate the processing time.
By referring to the discrimination parameter information 135 shown in FIG. 4, the calculation formula corresponding to each combination of coprocessor and processing content can be identified.
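One way to picture the table of FIG. 4 is as a mapping keyed by (processor, process), sketched below in Python. The coefficient values are invented for illustration, N is reduced to 3, and each calculation formula is represented only by its list of discrimination parameters; none of this is taken from the application.

```python
from typing import Dict, List, Tuple

# (processor name, process name) -> discrimination parameters [k_1, ..., k_N].
# All numbers are hypothetical; N = 3 coprocessors and M = 2 processes here.
DISCRIMINATION_PARAMS: Dict[Tuple[str, str], List[float]] = {
    ("A", "J"): [0.004, 0.001, 0.0005],
    ("A", "K"): [0.003, 0.001, 0.0005],
    ("B", "J"): [0.001, 0.005, 0.0005],
    ("B", "K"): [0.001, 0.004, 0.0005],
    ("C", "J"): [0.002, 0.002, 0.004],
    ("C", "K"): [0.001, 0.001, 0.003],
}


def formulas_for(process: str) -> Dict[str, List[float]]:
    """Pick, out of the N x M entries, the N entries for one processing content."""
    return {proc: ks for (proc, p), ks in DISCRIMINATION_PARAMS.items() if p == process}
```

Calling `formulas_for("J")` returns one parameter list per coprocessor, which mirrors how the calculation unit 1312a narrows N×M formulas down to N.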

For example, the calculation formula corresponding to the combination of coprocessor A and process J (person detection process) can be identified as Equation 1 below.
$$t_{AJ} = k_{AJ1} x_1 + k_{AJ2} x_2 + \cdots + k_{AJN} x_N \qquad \text{(Equation 1)}$$

In Equation 1, $t_{AJ}$ denotes the processing time (predicted value) calculated for the case where process J is executed by coprocessor A. $k_{AJ1}, k_{AJ2}, \ldots, k_{AJN}$ are the discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively; each may include a conversion coefficient that converts a resource information value into a processing time and a contribution rate indicating how strongly the resource information of the corresponding coprocessor affects the processing time (predicted value). $x_1, x_2, \ldots, x_N$ are the variables to which the resource information values of coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively, are applied (substituted).

Alternatively, for example, the calculation formula corresponding to the combination of coprocessor A and process K (face detection process) can be identified as Equation 2 below.
$$t_{AK} = k_{AK1} x_1 + k_{AK2} x_2 + \cdots + k_{AKN} x_N \qquad \text{(Equation 2)}$$

In Equation 2, $t_{AK}$ denotes the processing time (predicted value) calculated for the case where process K is executed by coprocessor A. $k_{AK1}, k_{AK2}, \ldots, k_{AKN}$ are the discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively; each may include a conversion coefficient that converts a resource information value into a processing time and a contribution rate indicating how strongly the resource information of the corresponding coprocessor affects the processing time (predicted value). $x_1, x_2, \ldots, x_N$ are the variables to which the resource information values of coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively, are applied (substituted).

Alternatively, for example, the calculation formula corresponding to the combination of coprocessor B (coprocessor 302-2) and process J (person detection process) can be identified as Equation 3 below.
$$t_{BJ} = k_{BJ1} x_1 + k_{BJ2} x_2 + \cdots + k_{BJN} x_N \qquad \text{(Equation 3)}$$

In Equation 3, $t_{BJ}$ denotes the processing time (predicted value) calculated for the case where process J is executed by coprocessor B. $k_{BJ1}, k_{BJ2}, \ldots, k_{BJN}$ are the discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively; each may include a conversion coefficient that converts a resource information value into a processing time and a contribution rate indicating how strongly the resource information of the corresponding coprocessor affects the processing time (predicted value). $x_1, x_2, \ldots, x_N$ are the variables to which the resource information values of coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively, are applied (substituted).

図２に示す計算部１３１２ａは、推論処理識別子に対応する計算式を特定すると、各コプロセッサ３０２−１〜３０２−Ｎのリソース情報が供給されるまで待機する。 When the calculation unit 1312a illustrated in FIG. 2 specifies the calculation formula corresponding to the inference processing identifier, the calculation unit 1312a waits until the resource information of each coprocessor 302-1 to 302-N is supplied.

Each of the coprocessors 302-1 to 302-N transmits its own resource information to the AI cluster management unit 130 at predetermined intervals (for example, every few minutes to every few hours) and/or in response to a request received from the processor monitoring unit 134 via the parallel distributed control unit 131 and the transmission unit 132. The resource information includes, for example, the processor usage rate [%].

受信部１３３は、処理状況に関するリソース情報を各コプロセッサ３０２−１〜３０２−Ｎから受信すると、各コプロセッサ３０２−１〜３０２−Ｎのリソース情報をリソース情報収集部１３４１へ供給する。リソース情報収集部１３４１は、各コプロセッサ３０２−１〜３０２−Ｎのリソース情報を受けると、各コプロセッサ３０２−１〜３０２−Ｎのリソース情報を計算部１３１２ａへ供給する。 Upon receiving the resource information regarding the processing status from each coprocessor 302-1 to 302-N, the receiving unit 133 supplies the resource information of each coprocessor 302-1 to 302-N to the resource information collecting unit 1341. Upon receiving the resource information of each coprocessor 302-1 to 302-N, the resource information collection unit 1341 supplies the resource information of each coprocessor 302-1 to 302-N to the calculation unit 1312a.

これに応じて、計算部１３１２ａは、各コプロセッサ３０２−１〜３０２−Ｎのリソース情報の値を計算式に適用して、各コプロセッサ３０２−１〜３０２−Ｎの処理時間を計算する。 In response to this, the calculation unit 1312a applies the value of the resource information of each coprocessor 302-1 to 302-N to the calculation formula to calculate the processing time of each coprocessor 302-1 to 302-N.

コプロセッサの個数をＮ個とし、処理内容の個数をＭ個とするとき、計算部１３１２ａは、Ｎ個のプロセッサのリソース情報の値をＮ個の計算式のそれぞれに適用（代入）して、計算結果としてＮ個のプロセッサの処理時間（予測値）を得る。 When the number of coprocessors is N and the number of processing contents is M, the calculation unit 1312a applies (substitutes) the resource information values of the N processors to each of the N calculation formulas. The processing time (predicted value) of N processors is obtained as the calculation result.

計算部１３１２ａは、計算で得られた各コプロセッサ３０２−１〜３０２−Ｎの処理時間を選択部１３１２ｂへ供給する。 The calculation unit 1312a supplies the processing time of each of the coprocessors 302-1 to 302-N obtained by the calculation to the selection unit 1312b.

選択部１３１２ｂは、各コプロセッサ３０２−１〜３０２−Ｎの処理時間に基づいて、推論処理識別子で示される推論処理を実行すべきコプロセッサとして、複数のコプロセッサ３０２−１〜３０２−Ｎから１つのコプロセッサ３０２を選択する。選択部１３１２ｂは、複数のコプロセッサ３０２−１〜３０２−Ｎのうち、最も短い処理時間（予測値）に対応する１つのコプロセッサ３０２を選択してもよい。選択部１３１２ｂは、選択された１つのコプロセッサ３０２を送信先として指定する送信先情報と、共通フォーマットでの推論要求とを送信部１３２へ供給する。 The selection unit 1312b selects a plurality of coprocessors 302-1 to 302-N as a coprocessor to execute the inference processing indicated by the inference processing identifier based on the processing time of each of the coprocessors 302-1 to 302-N. Select one coprocessor 302. The selection unit 1312b may select one coprocessor 302 corresponding to the shortest processing time (predicted value) from the plurality of coprocessors 302-1 to 302-N. The selection unit 1312b supplies the transmission unit 132 with the transmission destination information designating one selected coprocessor 302 as the transmission destination and the inference request in the common format.
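The calculation and selection steps can be sketched in Python as follows. The linear form follows Equations 1 to 3; the function names and the example numbers are hypothetical, and choosing the minimum is only one of the selection policies the text permits.

```python
from typing import Dict, List, Tuple


def predict_time(ks: List[float], xs: List[float]) -> float:
    """Evaluate t = k_1*x_1 + ... + k_N*x_N for one coprocessor."""
    if len(ks) != len(xs):
        raise ValueError("need one resource value per discrimination parameter")
    return sum(k * x for k, x in zip(ks, xs))


def select_coprocessor(formulas: Dict[str, List[float]],
                       xs: List[float]) -> Tuple[str, Dict[str, float]]:
    """Return the coprocessor with the shortest predicted time, plus all predictions."""
    times = {name: predict_time(ks, xs) for name, ks in formulas.items()}
    best = min(times, key=times.get)
    return best, times


# Hypothetical parameters for one process on three coprocessors (N = 3).
example_formulas = {
    "A": [0.004, 0.001, 0.0005],
    "B": [0.001, 0.005, 0.0005],
    "C": [0.002, 0.002, 0.004],
}
# Hypothetical usage rates [%] reported by coprocessors 1..3.
example_usage = [90.0, 45.0, 20.0]
best, times = select_coprocessor(example_formulas, example_usage)
```

With these made-up numbers the predicted times are about 0.415 s for A, 0.325 s for B, and 0.350 s for C, so B is selected.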

送信部１３２は、共通フォーマットでの推論要求（ＡＩ推論処理の推論要求）を、送信先情報で指定されたコプロセッサ３０２（推論処理装置３００）へ送信する。これにより、複数のコプロセッサ３０２−１〜３０２−ＮにＡＩ推論処理が振り分ける並列分散制御を効率的に行われ得る。 The transmission unit 132 transmits an inference request (inference request for AI inference processing) in the common format to the coprocessor 302 (inference processing device 300) designated by the destination information. Thereby, the parallel distributed control in which the AI inference processing is distributed to the plurality of coprocessors 302-1 to 302-N can be efficiently performed.

ここで、複数のコプロセッサ３０２−１〜３０２−Ｎの処理性能は、そのファームウェアのバージョンアップ及び／又はハードウェアの置き換え等により変化し得る。その点を考慮し、判別パラメータ情報１３５は、所定の更新タイミングで更新され得る。 Here, the processing performance of the plurality of coprocessors 302-1 to 302-N may change due to version upgrade of the firmware and/or replacement of hardware. Considering that point, the determination parameter information 135 can be updated at a predetermined update timing.

For example, in the case where the discrimination parameter information 135 is updated on the basis of AI inference processes executed by the selected coprocessor 302, the measurement unit 136 measures the time from when the inference request is transmitted to that coprocessor 302 until the processing completion notification from that coprocessor 302 is received, treats it as the processing time for the resource information values of the coprocessors 302-1 to 302-N collected by the resource information collecting unit 1341, and can accumulate the measured processing times. At a predetermined update timing (for example, every few weeks), the updating unit 137 acquires the accumulated processing time measurements for each coprocessor from the measurement unit 136, performs multiple regression analysis on them, and newly derives the calculation formula for each coprocessor. The discrimination parameters in each coprocessor's calculation formula are updated at this time. The updating unit 137 accesses the discrimination parameter information 135 and overwrites each coprocessor's calculation formula with the newly derived one. The updating unit 137 thereby updates the discrimination parameter information 135.

For example, consider a case where coprocessor A is selected based on the processing time (predicted value) obtained by Equation 1 and executes process J (person detection process). Through this execution, the measurement unit 136 measures the processing time $t_{AJm} = 0.4$ [s] for the resource information $x_1 = 90$ [%], $x_2 = 45$ [%], ..., $x_N = 20$ [%], as shown in the first row of FIG. 5(a). If the calculation formula of Equation 1, which corresponds to the combination of coprocessor A and process J, was obtained by multiple regression analysis of the past measurement results in the second and subsequent rows of FIG. 5(a), then repeating the multiple regression analysis with the newly obtained first-row measurement added may change the formula. The measurement unit 136 therefore supplies the updating unit 137 with the measurement results consisting of the past results in the second and subsequent rows of FIG. 5(a) plus the first-row result. The updating unit 137 performs multiple regression analysis with the measured processing time $t_{AJm}$ as the objective variable and the resource information $x_1$ to $x_N$ as the explanatory variables, and newly derives Equation 4 below.
$$t_{AJ} = k_{AJ1}' x_1 + k_{AJ2}' x_2 + \cdots + k_{AJN}' x_N \qquad \text{(Equation 4)}$$

In Equation 4, $k_{AJ1}', k_{AJ2}', \ldots, k_{AJN}'$ are the updated discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively. Comparing Equation 1 with Equation 4 shows that the discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N have been updated while the rest of the formula is unchanged. The updating unit 137 can hold the newly derived calculation formula of Equation 4.

At the predetermined update timing, the updating unit 137 accesses the discrimination parameter information 135 and, as shown in FIG. 5(b), overwrites the calculation formula corresponding to the combination of coprocessor A and process J with the newly derived calculation formula of Equation 4. The updating unit 137 thereby updates the discrimination parameter information 135. Note that FIG. 5 is a diagram showing the update process for the discrimination parameter information 135 (for coprocessor A and process J).
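The re-fitting step can be sketched in pure Python as an ordinary least-squares fit. The sketch assumes a model without an intercept, matching the form of Equations 1 and 4; the application does not specify the regression procedure in detail, so the normal-equation solver and the function name below are illustrative choices only.

```python
from typing import List


def fit_discrimination_params(rows: List[List[float]], times: List[float]) -> List[float]:
    """Least-squares fit of t = k_1*x_1 + ... + k_N*x_N (no intercept).

    rows[i] holds the resource values x_1..x_N for measurement i,
    times[i] holds the measured processing time for that measurement.
    """
    n = len(rows[0])
    # Normal equations: (X^T X) k = X^T t
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("measurements do not determine the parameters uniquely")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    k = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * k[j] for j in range(i + 1, n))
        k[i] = (b[i] - tail) / a[i][i]
    return k
```

The returned list plays the role of the updated parameters $k_{AJ1}', \ldots, k_{AJN}'$, which would then overwrite the stored entry for the (coprocessor A, process J) combination. A production system would more likely use a numerical library and guard against ill-conditioned data.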

Alternatively, for example, consider a case where coprocessor A is selected based on the processing time (predicted value) obtained by Equation 2 and executes process K (face detection process). Through this execution, the measurement unit 136 measures the processing time $t_{AKm} = 0.3$ [s] for the resource information $x_1 = 90$ [%], $x_2 = 45$ [%], ..., $x_N = 20$ [%], as shown in the first row of FIG. 6(a). If the calculation formula of Equation 2, which corresponds to the combination of coprocessor A and process K, was obtained by multiple regression analysis of the past data in the second and subsequent rows of FIG. 6(a), then repeating the multiple regression analysis with the newly obtained first-row measurement added may change the formula. The measurement unit 136 therefore supplies the updating unit 137 with the measurement results consisting of the past data in the second and subsequent rows of FIG. 6(a) plus the first-row data. The updating unit 137 performs multiple regression analysis with the measured processing time $t_{AKm}$ as the objective variable and the resource information $x_1$ to $x_N$ as the explanatory variables, and newly derives Equation 5 below.
$$t_{AK} = k_{AK1}' x_1 + k_{AK2}' x_2 + \cdots + k_{AKN}' x_N \qquad \text{(Equation 5)}$$

In Equation 5, $k'_{AK1}, k'_{AK2}, \ldots, k'_{AKN}$ are the updated discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively. Comparing Equation 2 with Equation 5 shows that the discrimination parameters corresponding to coprocessors 302-1, 302-2, ..., 302-N have been updated while the rest of the formula is unchanged. The update unit 137 may hold the newly obtained Equation 5.

At a predetermined update timing, the update unit 137 accesses the discrimination parameter information 135 and, as shown in FIG. 6B, overwrites the calculation formula corresponding to the combination of coprocessor A and process K with the newly obtained Equation 5. The update unit 137 thereby updates the discrimination parameter information 135. FIG. 6 is a diagram showing the update process of the discrimination parameter information 135 (for coprocessor A and process K).

Alternatively, consider a case where coprocessor B is selected on the basis of the processing time (predicted value) obtained by Equation 3 and executes process J (person detection processing). Through this execution, the measurement unit 136 measures a processing time of $t_{BJm} = 0.3$ [s] for the resource information $x_1 = 90$ [%], $x_2 = 45$ [%], ..., $x_N = 20$ [%] shown in the first row of FIG. 7A. If the calculation formula of Equation 3 corresponding to the combination of coprocessor B and process J was obtained by multiple regression analysis on the past data in the second and subsequent rows of FIG. 7A, then repeating the multiple regression analysis with the newly obtained first-row measurement added may change the formula. The measurement unit 136 therefore supplies the update unit 137 with the measurement results consisting of the past data in the second and subsequent rows of FIG. 7A plus the first-row data. The update unit 137 performs multiple regression analysis with the measured processing time $t_{BJm}$ as the objective variable and the resource information $x_1$ to $x_N$ as the explanatory variables, and newly obtains Equation 6 below.

$$t_{BJ} = k'_{BJ1} x_1 + k'_{BJ2} x_2 + \cdots + k'_{BJN} x_N \qquad \text{(Equation 6)}$$

In Equation 6, $k'_{BJ1}, k'_{BJ2}, \ldots, k'_{BJN}$ are the updated discrimination parameters corresponding to coprocessor 302-1, coprocessor 302-2, ..., coprocessor 302-N, respectively. Comparing Equation 3 with Equation 6 shows that the discrimination parameters corresponding to coprocessors 302-1, 302-2, ..., 302-N have been updated while the rest of the formula is unchanged. The update unit 137 may hold the newly obtained Equation 6.

At a predetermined update timing, the update unit 137 accesses the discrimination parameter information 135 and, as shown in FIG. 7B, overwrites the calculation formula corresponding to the combination of coprocessor B and process J with the newly obtained Equation 6. The update unit 137 thereby updates the discrimination parameter information 135. FIG. 7 is a diagram showing the update process of the discrimination parameter information 135 (for coprocessor B and process J).

As a result, the parallel distributed control unit 131 can make its parallel distributed control more accurate in response to changes in the processing performance of the coprocessors 302-1 to 302-N.
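
The update step described above amounts to refitting a linear model $t = k_1 x_1 + \cdots + k_N x_N$ each time a new measurement row is added. The following is a minimal sketch, assuming ordinary least squares without an intercept term, solved through the normal equations. The function names `fit_coefficients` and `predict` and all sample values are hypothetical and are not taken from the figures.

```python
from typing import List, Sequence


def fit_coefficients(rows: Sequence[Sequence[float]],
                     times: Sequence[float]) -> List[float]:
    """Least-squares fit of t = k_1*x_1 + ... + k_N*x_N (no intercept).

    Solves the normal equations (X^T X) k = X^T t by Gauss-Jordan
    elimination with partial pivoting. Assumes X^T X is non-singular.
    """
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T t].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, times))]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: measurements are not independent")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs: Sequence[float], resources: Sequence[float]) -> float:
    """Predicted processing time for the given resource information."""
    return sum(k * x for k, x in zip(coeffs, resources))


# Hypothetical past data: resource usage [%] per coprocessor and time [s].
history_x = [[50.0, 20.0], [10.0, 80.0], [30.0, 30.0]]
history_t = [0.16, 0.26, 0.15]
# The new measurement becomes the first row, then the formula is refitted.
history_x.insert(0, [90.0, 45.0])
history_t.insert(0, 0.3)
updated_coeffs = fit_coefficients(history_x, history_t)
```

Refitting over the whole table on every update keeps the sketch simple; an incremental scheme would be a design choice outside what the text describes.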

Next, the operation of the inference processing device 300 using the model files 304 will be described with reference to FIG. 8. FIG. 8 is a diagram showing the operation of the inference processing device 300 using the model files 304.

In the inference processing device 300 shown in FIG. 8, the inference application 303 uses virtual environment technology to load, from among the plurality of middleware and the plurality of model files stored in the SSD 107 and/or the HDD 108 of the information processing device 100, the middleware 305 and the model files 304-1 to 304-M that the inference application 303 uses. The inference application 303 initializes the middleware 305 and the model files 304-1 to 304-M when the inference processing device 300 starts up.

When the inference processing device 300 is selected as the transmission destination by the AI cluster management unit 130, the inference application 303 receives the inference request from the higher-level application 110 via the API 120 and the AI cluster management unit 130. The inference request includes an inference processing identifier, the input data subject to the inference process (image, audio, text, or the like), and parameters that set the execution conditions of the inference process. The inference request has been converted into the common format by the API 120, so the inference application 303 receives it in the common format. The inference application 303 identifies the model file 304 corresponding to the inference processing identifier included in the inference request and converts the common-format input data included in the inference request into the format of the identified model file 304.

Here, the formats of the M model files 304-1 to 304-M may differ from one another. The inference application 303 absorbs the format differences among the model files 304-1 to 304-M, making it possible to exchange information in the common format with the higher-level applications 110-1 to 110-n (that is, with the API 120).

For example, when the inference processing identifier corresponds to the model file 304-1, as shown in FIG. 8, the inference application 303 determines from the inference processing identifier included in the inference request that the model file to be executed is the model file 304-1, and notifies the middleware 305. The inference application 303 also converts the common-format input data into input data in the format of the model file 304-1 and supplies it to the middleware 305. The middleware 305 feeds the format-converted input data to the input layer of the model file 304-1 and has the model file 304-1 execute the inference process. When the output data resulting from the inference process (output data in the format of the model file 304-1) is supplied from the model file 304-1 to the middleware 305, the middleware 305 supplies that output data to the inference application 303. The inference application 303 converts the output data in the format of the model file 304-1 into common-format output data and transmits the converted output data to the higher-level application 110 via the AI cluster management unit 130 and the API 120.
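
The request flow above can be sketched as a registry that maps each inference processing identifier to a pair of format converters and a call that runs the model. This is an illustration under stated assumptions rather than the embodiment's implementation: `ModelAdapter`, `handle_request`, the `"length"` identifier, the `results` key, and the stand-in model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ModelAdapter:
    """Per-model converters plus the call that runs the model (hypothetical)."""
    to_model_input: Callable[[dict], Any]          # common format -> model format
    run: Callable[[Any], Any]                      # stands in for middleware 305
    to_common_output: Callable[[Any, dict], dict]  # model format -> common format


def handle_request(request: dict, adapters: Dict[str, ModelAdapter]) -> dict:
    """Identify the model from ai_name, convert, run, and convert back."""
    adapter = adapters.get(request["ai_name"])
    if adapter is None:
        # A non-zero result_code signals an error in the common output format.
        return {"result_code": 1, "number_result": 0, "results": []}
    model_input = adapter.to_model_input(request)
    model_output = adapter.run(model_input)
    return adapter.to_common_output(model_output, request)


# Stand-in "model" that simply reports the payload length.
adapters = {
    "length": ModelAdapter(
        to_model_input=lambda req: list(req["image_data"]),
        run=lambda data: len(data),
        to_common_output=lambda out, req: {
            "result_code": 0,
            "number_result": 1,
            "results": [{"object_class": out, "confidence": 1.0}],
        },
    )
}
response = handle_request(
    {"ai_name": "length", "image_data": b"\x00\x01\x02"}, adapters
)
```

Keeping both converters next to the model they serve means the caller only ever sees the common format, which is the property the text attributes to the inference application 303.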

The inference request that the inference application 303 receives from the higher-level application 110 side has been converted into the common format by the API 120 and has, for example, the format shown in FIG. 9A.

The common format shown in FIG. 9A is shared across image object determination regardless of the type of model file. In the inference request, ai_name [string] may be specified as the inference processing identifier. As parameters that set the execution conditions of the inference process, the image width: width [integer], the image height: height [integer], the color depth: color_depth [integer], the confidence threshold: confidence_threshold [floating-point number], and the maximum number of results returned: [integer] may be specified. As the data (image) subject to the inference process, image_data [byte array] (a byte array of "width × height × 3 × color_depth", where 3 is the three RGB layers) may be specified.
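
The fields of the common request format can be written down as a small record type. The sketch below makes two assumptions that the text does not state: `color_depth` is taken to be bits per channel, so the byte length is width × height × 3 × color_depth / 8, and the unnamed maximum-results field is given the hypothetical name `max_results`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRequest:
    ai_name: str                  # inference processing identifier
    width: int
    height: int
    color_depth: int              # assumed here to be bits per channel
    confidence_threshold: float
    max_results: int              # hypothetical name; the source names no field
    image_data: bytes

    def expected_length(self) -> int:
        """Byte length implied by width x height x 3 x color_depth (in bits)."""
        return self.width * self.height * 3 * self.color_depth // 8

    def validate(self) -> None:
        """Raise ValueError when image_data does not match the declared size."""
        if len(self.image_data) != self.expected_length():
            raise ValueError(
                f"image_data has {len(self.image_data)} bytes, "
                f"expected {self.expected_length()}"
            )


request = InferenceRequest(
    ai_name="digits", width=100, height=100, color_depth=8,
    confidence_threshold=0.5, max_results=5,
    image_data=bytes(100 * 100 * 3),
)
request.validate()
```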

When the model file 304-1 is a digit recognition model, the digit recognition model (model file 304-1) determines whether the image contains a digit from 0 to 9. The input image must be a 28×28 monochrome image, and the image as a whole is judged.

In this case, the inference application 303 resizes the common-format input data shown in FIG. 9A to 28×28 and converts it to monochrome, producing input data in the format of the model file 304-1 shown in FIG. 9B.

The format of the model file 304-1 shown in FIG. 9B is the input data format of the character recognition model (model file 304-1). It specifies no parameters for setting the execution conditions of the inference process (no maximum count, threshold, or the like). As the data (image) subject to the inference process, a 28×28×1 array of 32-bit float (floating-point) values may be specified.

For example, when the confidence threshold is set to 0.5 as a parameter and a 100×100×3×8-bit color image is received as common-format input data, the inference application 303 may convert it into a 28×28×32-bit monochrome image and use that as the input data of the model file 304-1.
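
The resize-and-monochrome conversion can be sketched as follows. The text does not specify the resampling method, the grey-level formula, the pixel layout, or the value range, so this sketch assumes row-major RGB triples, nearest-neighbour sampling, the channel mean as grey level, and values scaled to 0.0 to 1.0. The function name `to_digit_model_input` is hypothetical. Python floats are 64-bit, so packing into 32-bit values is left out.

```python
from typing import List


def to_digit_model_input(image_data: bytes, width: int, height: int,
                         size: int = 28) -> List[float]:
    """Convert an 8-bit RGB image into size x size grey values in 0.0-1.0."""
    if len(image_data) != width * height * 3:
        raise ValueError("image_data does not match width x height x 3")
    out: List[float] = []
    for row in range(size):
        src_y = row * height // size          # nearest-neighbour source row
        for col in range(size):
            src_x = col * width // size       # nearest-neighbour source column
            offset = (src_y * width + src_x) * 3
            r, g, b = image_data[offset:offset + 3]
            out.append((r + g + b) / (3 * 255))  # channel mean, scaled
    return out


# A 100 x 100 all-white colour image becomes 28 x 28 values of 1.0.
pixels = to_digit_model_input(bytes([255]) * (100 * 100 * 3), 100, 100)
```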

FIG. 9 is a diagram showing the conversion process from the common format to the input-layer format of the model file 304-1.

The output data produced by the inference process in the model file 304-1 is in the format of the model file 304-1, as shown in FIG. 10A.

The format of the model file 304-1 shown in FIG. 10A is the output data format of the character recognition model (model file 304-1). As the execution result of the inference process, an array of ten 32-bit float (floating-point) values may be specified, where each element indicates the probability that the image shows the corresponding digit from 0 to 9: element 0 is the probability of "0", and so on up to element 9, the probability of "9".

In this case, for the output data in the format of the model file 304-1 shown in FIG. 10A, the inference application 303 selects the elements of the ten-element array that are at or above the threshold, returns each selected element's index as the object type and its probability as the confidence, specifies the whole image as the position, and produces the common-format output data shown in FIGS. 10B and 10C.

The common format shown in FIG. 10B is the format of the output data sent to the higher-level application 110 side and is shared across image object determination regardless of the type of model file. In common-format output data, [string] is specified as the inference processing identifier, and as the execution result of the inference process, the success or failure of the inference process: result_code [integer] (0: normal, non-zero: abnormal) and the number of results: number_result [integer] are specified. For each of the number_result results, object_class [integer] (a code indicating the object type, which differs per model) may be specified as the object type. As the position of the object in the image, left: left [integer], top: top [integer], width: width [integer], and height: height [integer] may be specified, together with the confidence: confidence [floating-point number]. FIG. 10 is a diagram showing the conversion process from the output-layer format of the model file 304-1 to the common format.

For example, when the array
0.0, 0.0, 0.0, 0.3, 0.7, 0.0, 0.8, 0.6, 0.0, 0.0
is output from the model file 304-1 as output data, the inference application 303 may convert that output data into
Success or failure of the inference process: 0 *success
Number of inference results: 3 *three of the ten values are 0.5 or more
{
[
Object type: 6 *element 6 (counting from 0)
Position of the object in the image: 0, 0, 100, 100 *whole image
Confidence: 0.8
],
[
Object type: 4 *element 4 (counting from 0)
Position of the object in the image: 0, 0, 100, 100 *whole image
Confidence: 0.7 *the value of element 4 is the confidence
],
[
Object type: 7 *element 7 (counting from 0)
Position of the object in the image: 0, 0, 100, 100 *whole image
Confidence: 0.6 *the value of element 7 is the confidence
]
}
and use it as the common-format output data.
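
The digit example above can be reproduced with a short conversion routine. This is a sketch under assumptions: the function name `digits_to_common` and the `results` key are hypothetical, and sorting by descending confidence is inferred from the order of the entries in the example rather than stated in the text.

```python
from typing import Dict, List, Sequence


def digits_to_common(probabilities: Sequence[float], threshold: float,
                     image_width: int, image_height: int) -> Dict:
    """Map the ten-element probability array to the common output format."""
    hits = [(p, digit) for digit, p in enumerate(probabilities) if p >= threshold]
    hits.sort(key=lambda h: h[0], reverse=True)  # highest confidence first
    results: List[Dict] = [
        {
            "object_class": digit,
            # The model judges the whole image, so the box covers all of it.
            "left": 0, "top": 0, "width": image_width, "height": image_height,
            "confidence": p,
        }
        for p, digit in hits
    ]
    return {"result_code": 0, "number_result": len(results), "results": results}


output = digits_to_common(
    [0.0, 0.0, 0.0, 0.3, 0.7, 0.0, 0.8, 0.6, 0.0, 0.0],
    threshold=0.5, image_width=100, image_height=100,
)
print([(r["object_class"], r["confidence"]) for r in output["results"]])
# prints [(6, 0.8), (4, 0.7), (7, 0.6)]
```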

Alternatively, when the inference processing identifier corresponds to the model file 304-2, as shown in FIG. 8, the inference application 303 determines from the inference processing identifier included in the inference request that the model file to be executed is the model file 304-2, and notifies the middleware 305. The inference application 303 also converts the common-format input data into input data in the format of the model file 304-2 and supplies it to the middleware 305. The middleware 305 feeds the format-converted input data to the input layer of the model file 304-2 and has the model file 304-2 execute the inference process. When the output data resulting from the inference process (output data in the format of the model file 304-2) is supplied from the model file 304-2 to the middleware 305, the middleware 305 supplies that output data to the inference application 303. The inference application 303 converts the output data in the format of the model file 304-2 into common-format output data and transmits the converted output data to the higher-level application 110 via the AI cluster management unit 130 and the API 120.

The inference request that the inference application 303 receives from the higher-level application 110 side has been converted into the common format by the API 120 and has, for example, the format shown in FIG. 12A. The common format shown in FIG. 12A is the same as the common format shown in FIG. 9A.

When the model file 304-2 is an object recognition model, the object recognition model (model file 304-2) determines whether the image contains an identifiable object (a person, a car, a cat, or the like). The input image may be of any size; the whole image is judged, and when an object is detected, its type (a number), position, and confidence are output.

In this case, the inference application 303 converts the common-format input data (input image data) shown in FIG. 12A into input data with one added dimension (depth), producing input data in the format of the model file 304-2 shown in FIG. 12B.

The format of the model file 304-2 shown in FIG. 12B is the input data format of the object recognition model (model file 304-2). As parameters that set the execution conditions of the inference process, the maximum count, the types of object to be detected, and the confidence threshold may be specified. As the data (image) subject to the inference process, a height × width × depth × 3 array of uint32 values may be specified (the depth is for three-dimensional computation and is normally unnecessary).

For example, when the confidence threshold is set to 0.5 and the maximum count to 5 as parameters, and a 100×100×3×8-bit color image is received as common-format input data, the inference application 303 may convert it into a 100×100×1×3×32-bit image and use that as the input data of the model file 304-2, and may use the confidence threshold of 0.5 and the maximum count of 5 as parameters for narrowing down the results of the inference process.
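
Adding the depth dimension is essentially a reshape. A minimal sketch, assuming row-major RGB pixel order and using nested Python lists of plain integers in place of a 32-bit array; the names `to_object_model_input`, `model_params`, and `max_count` are hypothetical.

```python
from typing import List


def to_object_model_input(image_data: bytes, width: int,
                          height: int) -> List[List[List[List[int]]]]:
    """Reshape 8-bit RGB bytes into a height x width x depth(=1) x 3 nested list.

    Each channel value is kept as a plain int, standing in for a 32-bit
    unsigned element.
    """
    if len(image_data) != width * height * 3:
        raise ValueError("image_data does not match width x height x 3")
    return [
        [
            # The inner single-element list is the added depth dimension.
            [list(image_data[(y * width + x) * 3:(y * width + x) * 3 + 3])]
            for x in range(width)
        ]
        for y in range(height)
    ]


tensor = to_object_model_input(bytes(range(12)), width=2, height=2)
# The threshold and maximum count are passed through to the model unchanged.
model_params = {"confidence_threshold": 0.5, "max_count": 5}
```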

FIG. 12 is a diagram showing the conversion process from the common format to the input-layer format of the model file 304-2.

The output data produced by the inference process in the model file 304-2 is in the format of the model file 304-2, as shown in FIG. 13A.

The format of the model file 304-2 shown in FIG. 13A is the output data format of the object recognition model (model file 304-2). As the execution result of the inference process, the number of detected objects: [integer], the object types: an array of maximum-count length, the object positions: an array of maximum-count length (positions are returned as ratios, with the image height and width each taken as 1), and the object confidences: an array of maximum-count length may be specified.

In this case, for the output data in the format of the model file 304-2 shown in FIG. 13A, the inference application 303 outputs, from the arrays of types, positions, and confidences and the number of detected objects, only the elements whose confidence is at or above the threshold, producing the common-format output data shown in FIGS. 13B and 13C.

The common format shown in FIGS. 13B and 13C is the same as the common format shown in FIGS. 10B and 10C.

For example, when
Number of detected objects: 2
Array of object types: [1, 4, 0, 0, 0] (maximum-count length)
Array of object positions: [[0.2, 0.15, 0.3, 0.2], [0.7, 0.8, 0.2, 0.1], 0, 0, 0] (maximum-count length)
Array of object confidences: [0.8, 0.3, 0, 0, 0] (maximum-count length)
is output from the model file 304-2 as output data, the inference application 303 may convert that output data into
Success or failure of the inference process: 0 *success
Number of inference results: 1 *only one of the five has a confidence of 0.5 or more
{
[
Object type: 1 *code indicating the object type (defined per model)
Position of the object in the image: 20, 15, 30, 20 *position converted from ratios to coordinate values
Confidence: 0.8
]
}
and use it as the common-format output data.
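
The object-detection example above can be reproduced in the same way. This sketch assumes that each position is ordered [left, top, width, height], which matches both the common format and the converted values in the example, and it rounds to the nearest pixel. The function name `detections_to_common` and the `results` key are hypothetical.

```python
from typing import Dict, List, Sequence


def detections_to_common(count: int, classes: Sequence, boxes: Sequence,
                         confidences: Sequence, threshold: float,
                         image_width: int, image_height: int) -> Dict:
    """Keep detections at or above the threshold and scale ratios to pixels."""
    results: List[Dict] = []
    for i in range(count):  # entries beyond `count` are padding
        if confidences[i] < threshold:
            continue
        left, top, width, height = boxes[i]
        results.append({
            "object_class": classes[i],
            "left": round(left * image_width),
            "top": round(top * image_height),
            "width": round(width * image_width),
            "height": round(height * image_height),
            "confidence": confidences[i],
        })
    return {"result_code": 0, "number_result": len(results), "results": results}


output = detections_to_common(
    count=2,
    classes=[1, 4, 0, 0, 0],
    boxes=[[0.2, 0.15, 0.3, 0.2], [0.7, 0.8, 0.2, 0.1], 0, 0, 0],
    confidences=[0.8, 0.3, 0, 0, 0],
    threshold=0.5, image_width=100, image_height=100,
)
```

With the values from the example, only the first detection passes the 0.5 threshold, and its ratios scale to 20, 15, 30, 20 on a 100×100 image.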

As described above, in the embodiment, the information processing system 1 applies the resource information collected from the N coprocessors 302-1 to 302-N to the calculation formulas to calculate a processing time for each of the N coprocessors, selects the coprocessor to which the inference process is to be allocated on the basis of those processing times, and transmits the inference request to it. For example, the information processing system 1 can select the coprocessor 302 with the shortest processing time and transmit the inference request to it. Because the current availability of resources is taken into account, parallel distributed control can be made more efficient in real time.

In the embodiment, in the information processing device 100, the measurement unit 136 measures the processing time taken by the coprocessor 302 that performs the inference process in response to the inference request. The update unit 137 performs multiple regression analysis on the basis of the received resource information of the N coprocessors 302-1 to 302-N and the measured processing time, and updates the calculation formula. This makes it possible to follow changes in the processing performance of the N coprocessors 302-1 to 302-N and to make parallel distributed control more accurate.

In the embodiment, in the information processing device 100, the calculation formula includes N discrimination parameters corresponding to the N coprocessors 302-1 to 302-N, and the update unit 137 updates the N discrimination parameters in the calculation formula. The calculation formula can thus be updated in response to changes in the processing performance of the N coprocessors 302-1 to 302-N.

In the embodiment, in the information processing device 100, the inference request includes an inference processing identifier that identifies the content of the inference process. The calculation unit 1312a applies the resource information of the N coprocessors 302-1 to 302-N to the N calculation formulas that correspond to the inference processing content identified by the inference processing identifier, out of the N×M calculation formulas associated with the combinations of the N different coprocessors 302-1 to 302-N and the M different inference processing contents. As a result, when the parallel distributed system contains a mix of coprocessors that are fast at particular kinds of processing, the processing content can be considered as a performance difference in addition to raw computing capability, so parallel distributed control can be made more accurate.
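
Selecting a coprocessor from the N×M formulas can be sketched as a table lookup followed by a minimum search. The coefficient values, the coprocessor labels "A" and "B", the process labels "J" and "K", and the names `FORMULAS`, `predicted_time`, and `select_coprocessor` are hypothetical placeholders, not values from the embodiment.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical coefficient table: (coprocessor, process) -> k_1 .. k_N.
FORMULAS: Dict[Tuple[str, str], List[float]] = {
    ("A", "J"): [0.002, 0.001],
    ("B", "J"): [0.001, 0.004],
    ("A", "K"): [0.004, 0.001],
    ("B", "K"): [0.001, 0.002],
}


def predicted_time(coeffs: Sequence[float], resources: Sequence[float]) -> float:
    """t = k_1*x_1 + ... + k_N*x_N"""
    return sum(k * x for k, x in zip(coeffs, resources))


def select_coprocessor(process_id: str, resources: Sequence[float],
                       formulas: Dict[Tuple[str, str], List[float]] = FORMULAS) -> str:
    """Evaluate the N formulas for this process and pick the shortest time."""
    candidates = {
        coprocessor: predicted_time(coeffs, resources)
        for (coprocessor, process), coeffs in formulas.items()
        if process == process_id
    }
    if not candidates:
        raise KeyError(f"no formula registered for process {process_id!r}")
    return min(candidates, key=candidates.get)


resources = [90.0, 45.0]  # x_1, x_2: resource usage [%] of each coprocessor
chosen_for_j = select_coprocessor("J", resources)
chosen_for_k = select_coprocessor("K", resources)
```

With these placeholder coefficients the same resource information leads to different choices for the two processes, which is the effect the N×M table is meant to capture.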

The information processing device 100 of the embodiment also generates an inference request including an inference processing identifier that identifies the content of the inference process and transmits it to the inference processing device 300. The inference processing device 300 can thereby identify the model file from the inference processing identifier, avoid loading unnecessary model files, load only the model file needed for the inference process into memory, and execute the inference process. The inference processing device 300 therefore only needs hardware, including memory, with enough capability to load one model file and execute the inference process. As a result, the information processing system 1 can execute the inference process efficiently and meet the requests of the higher-level applications even with an inference processing device 300 of low hardware performance.

In the embodiment, in the information processing device 100, the N coprocessors 302-1 to 302-N subject to parallel distributed control perform the inference process in response to the inference request. Because the coprocessors 302 thus execute processing in which the same process is repeated, the approximate processing time can be estimated easily from the resource status of the coprocessors 302-1 to 302-N.

In the embodiment, in the inference processing device 300, the M model files 304-1 to 304-M correspond to mutually different inference processes. The middleware 305 can execute each of the M model files 304-1 to 304-M. The inference application 303 receives an inference request including an inference processing identifier and input data from the information processing device 100 and identifies, among the M model files 304-1 to 304-M, the model file corresponding to the inference processing identifier. The middleware 305 reads the identified model file and executes the inference process. Compared with using the same trained model for various inference processes, a model file 304 optimized by machine learning for each inference process can be used, so the inference process in the inference processing device 300 can be made more efficient.

In the embodiment, in the inference processing device 300, the inference application 303 receives an inference request including an inference processing identifier and common-format input data from the information processing device 100. The inference application 303 converts the common-format input data into input data in a format that can be executed by the model file 304 corresponding to the inference processing identifier among the M model files 304-1 to 304-M. The middleware 305 executes the identified model file 304 using the input data in that format. The model file 304 can thereby execute the inference process using the input data received from the information processing device 100 side.

In the embodiment, in the inference processing device 300, the inference application 303 converts the output data in the format for the model file 304, obtained by executing the identified model file 304, into output data in the common format and transmits it to the information processing device 100. As a result, the execution result of the inference process of the model file 304 can be provided to the information processing device 100 side.

また、実施形態では、推論処理装置３００において、他のモデルファイル３０４に対応した推論処理識別子と共通フォーマットの入力データとを含む推論要求を情報処理装置１００から受信する。推論アプリケーション３０３は、共通フォーマットの入力データを、Ｍ個のモデルファイル３０４−１〜３０４−Ｍのうち推論処理識別子に対応する他のモデルファイル３０４で実行可能であるフォーマットの入力データへ変換する。ミドルウェア３０５は、そのフォーマットの入力データを用いて、特定された他のモデルファイル３０４を実行する。これにより、情報処理装置１００側から受けた入力データを用いて他のモデルファイル３０４で推論処理を実行させることができる。 Further, in the embodiment, the inference processing apparatus 300 receives an inference request including an inference processing identifier corresponding to another model file 304 and input data in a common format from the information processing apparatus 100. The inference application 303 converts the input data in the common format into input data in a format that can be executed by another model file 304 corresponding to the inference processing identifier among the M model files 304-1 to 304-M. The middleware 305 executes the specified other model file 304 by using the input data of the format. As a result, the inference process can be executed by the other model file 304 using the input data received from the information processing device 100 side.

また、実施形態では、推論処理装置３００において、推論アプリケーション３０３が、特定された他のモデルファイル３０４の実行で得られた他のモデルファイル３０４用のフォーマットの出力データを共通フォーマットの出力データへ変換して情報処理装置１００へ送信する。これにより、他のモデルファイル３０４の推論処理の実行結果を情報処理装置１００側へ提供することができる。 Further, in the embodiment, in the inference processing device 300, the inference application 303 converts the output data in the format for the other model file 304 obtained by executing the other identified model file 304 into the output data in the common format. Then, the information is transmitted to the information processing apparatus 100. As a result, the execution result of the inference process of the other model file 304 can be provided to the information processing apparatus 100 side.
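The request handling described above (identify the model file from the inference processing identifier, convert the common-format input, execute, and convert the output back) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `ModelEntry`, `handle_inference_request`, and `MODELS`, the dictionary-based common format, and the two toy models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ModelEntry:
    """Hypothetical stand-in for one model file 304 and its format converters."""
    to_model_format: Callable[[Dict[str, Any]], Any]   # common format -> model input
    run: Callable[[Any], Any]                          # stands in for middleware 305
    to_common_format: Callable[[Any], Dict[str, Any]]  # model output -> common format


def handle_inference_request(models: Dict[str, ModelEntry],
                             request: Dict[str, Any]) -> Dict[str, Any]:
    """Play the role of the inference application 303 for one request."""
    # Identify the model file that corresponds to the inference processing identifier.
    entry = models[request["inference_id"]]
    # Convert the common-format input into the format this model can execute.
    model_input = entry.to_model_format(request["input"])
    # Execute the identified model.
    model_output = entry.run(model_input)
    # Convert the model-specific output back into the common format.
    return entry.to_common_format(model_output)


# Two toy "models" with different native input formats (assumed for illustration).
MODELS: Dict[str, ModelEntry] = {
    "sum": ModelEntry(
        to_model_format=lambda d: list(d["values"]),
        run=lambda xs: sum(xs),
        to_common_format=lambda y: {"result": y},
    ),
    "max": ModelEntry(
        to_model_format=lambda d: tuple(d["values"]),
        run=lambda xs: max(xs),
        to_common_format=lambda y: {"result": y},
    ),
}
```

Because both directions of format conversion sit next to the model entry, the sender only ever deals with the common format, which is the point made in the paragraphs above.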

As shown in FIG. 14, each of the inference processing devices 300-1 to 300-N may include a plurality of middlewares 305i-1 and 305i-2. FIG. 14 is a diagram illustrating a functional configuration of the inference processing device 300 according to a modification of the embodiment.

In this case, each of the plurality of middlewares 305i-1 and 305i-2 is associated with one or more of the M model files 304-1 to 304-M. For example, in the case of FIG. 14, the middleware 305i-1 corresponds to one model file 304-1, and the middleware 305i-2 corresponds to the (M-1) model files 304-2 to 304-M.

推論アプリケーション３０３は、それぞれがミドルウェア３０５ｉとミドルウェアに対応するモデルファイル３０４とを含む複数の組のうち推論処理識別子に対応する組を特定する。特定された組に含まれたミドルウェア３０５ｉは、特定された組に含まれたモデルファイル３０４を実行する。 The inference application 303 identifies a set corresponding to the inference processing identifier among a plurality of sets each including the middleware 305i and the model file 304 corresponding to the middleware. The middleware 305i included in the specified set executes the model file 304 included in the specified set.

For example, in the case of FIG. 14, the M sets are (middleware 305i-1, model file 304-1), (middleware 305i-2, model file 304-2), (middleware 305i-2, model file 304-3), ..., (middleware 305i-2, model file 304-M).

推論処理識別子がモデルファイル３０４−１に対応している場合、推論アプリケーション３０３は、（ミドルウェア３０５ｉ−１，モデルファイル３０４−１）を推論処理識別子に対応する組として特定する。この場合、ミドルウェア３０５ｉ−１がモデルファイル３０４−１を実行する。 When the inference processing identifier corresponds to the model file 304-1, the inference application 303 identifies (middleware 305i-1, model file 304-1) as a set corresponding to the inference processing identifier. In this case, the middleware 305i-1 executes the model file 304-1.

推論処理識別子がモデルファイル３０４−２に対応している場合、推論アプリケーション３０３は、（ミドルウェア３０５ｉ−２，モデルファイル３０４−２）を推論処理識別子に対応する組として特定する。この場合、ミドルウェア３０５ｉ−２がモデルファイル３０４−２を実行する。 When the inference processing identifier corresponds to the model file 304-2, the inference application 303 identifies (middleware 305i-2, model file 304-2) as a set corresponding to the inference processing identifier. In this case, the middleware 305i-2 executes the model file 304-2.
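The lookup of a (middleware, model file) set can be sketched as a simple table. The sketch below is only an illustration: the functions `build_pairs` and `select_pair` are hypothetical, and it assumes for simplicity that the inference processing identifier is the label of the model file itself, which the text does not state.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (middleware label, model file label)


def build_pairs(m: int) -> Dict[str, Pair]:
    """Build the M (middleware, model file) sets of the FIG. 14 example.

    Model file 304-1 is paired with middleware 305i-1, and model files
    304-2 to 304-M are paired with middleware 305i-2.
    """
    if m < 2:
        raise ValueError("the FIG. 14 example needs at least two model files")
    pairs: Dict[str, Pair] = {"304-1": ("305i-1", "304-1")}
    for k in range(2, m + 1):
        pairs[f"304-{k}"] = ("305i-2", f"304-{k}")
    return pairs


def select_pair(pairs: Dict[str, Pair], inference_id: str) -> Pair:
    """Return the set that corresponds to the inference processing identifier."""
    return pairs[inference_id]
```

With `pairs = build_pairs(4)`, `select_pair(pairs, "304-1")` yields the set handled by middleware 305i-1, and any other identifier yields a set handled by middleware 305i-2.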

このように、各推論処理装置３００−１〜３００−Ｎが複数のミドルウェア３０５ｉ−１，３０５ｉ−２を有する場合も、推論処理毎に機械学習で効率化されたモデルファイル３０４を用いることができるので、推論処理装置３００における推論処理を効率化できる。 As described above, even when each of the inference processing devices 300-1 to 300-N has a plurality of middlewares 305i-1 and 305i-2, it is possible to use the model file 304 that is made efficient by machine learning for each inference process. Therefore, the inference processing in the inference processing device 300 can be made efficient.

Alternatively, the bridge controller 202 may be a PCIe bridge controller 3 conforming to the PCIe (Peripheral Component Interconnect Express) standard, as shown in FIGS. 15 to 21. FIG. 15 is a diagram illustrating a hardware configuration of an information processing system according to another modification of the embodiment. FIG. 16 is a diagram illustrating a software configuration of the information processing system according to the other modification. FIG. 17 is a diagram illustrating a hardware configuration of the PCIe bridge controller according to the other modification. FIG. 18 is a diagram illustrating the PCIe layer configuration according to the other modification. FIG. 19 is a diagram illustrating how the other processors appear from the coprocessor G according to the other modification. FIG. 20 is a diagram illustrating how the other processors appear from the coprocessor D according to the other modification. FIG. 21 is a diagram illustrating data transfer processing between processors via the PCIe bridge controller according to the other modification.

The information processing system 1 illustrated in FIG. 15 includes a PCIe bridge controller 3 and a plurality of (eight in the example illustrated in FIG. 15) platforms 2-1 to 2-8. Each of the platforms 2-1 to 2-8 is connected to the PCIe bridge controller 3.

なお、以下、プラットフォームを示す符号としては、複数のプラットフォームのうち１つを特定する必要があるときには符号２−１〜２−８を用いるが、任意のプラットフォームを指すときには符号２を用いる。プラットフォーム２はＰＣプラットフォーム２といってもよい。 In addition, hereinafter, as the code indicating the platform, reference numerals 2-1 to 2-8 are used when it is necessary to specify one of the plurality of platforms, but reference numeral 2 is used when referring to an arbitrary platform. The platform 2 may be called the PC platform 2.

プラットフォーム２−１はメインプロセッサ２１−１を備える。メインプロセッサ２１−１は、実施形態のメインプロセッサ１０２に対応している。プラットフォーム２−２〜２−８はコプロセッサ２１−２〜２１−８をそれぞれ備える。コプロセッサ２１−２〜２１−８は、それぞれ、実施形態のコプロセッサ３０２−１〜３０２−Ｎに対応している。 The platform 2-1 includes a main processor 21-1. The main processor 21-1 corresponds to the main processor 102 of the embodiment. The platforms 2-2 to 2-8 include coprocessors 21-2 to 21-8, respectively. The coprocessors 21-2 to 21-8 correspond to the coprocessors 302-1 to 302-N of the embodiment, respectively.

メインプロセッサ２１−１及びコプロセッサ２１−２〜２１−８は、それぞれ違うメーカ（ベンダ）から提供されてもよい。例えば、メインプロセッサ２１−１，コプロセッサ２１−２，コプロセッサ２１−３，コプロセッサ２１−４，コプロセッサ２１−５，コプロセッサ２１−６，コプロセッサ２１−７，コプロセッサ２１−８は、それぞれ、Ａ社，Ｂ社，Ｃ社，Ｄ社，Ｅ社，Ｆ社，Ｇ社，Ｈ社が提供するものであるとする。 The main processor 21-1 and the coprocessors 21-2 to 21-8 may be provided by different manufacturers (vendors). For example, the main processor 21-1, the coprocessor 21-2, the coprocessor 21-3, the coprocessor 21-4, the coprocessor 21-5, the coprocessor 21-6, the coprocessor 21-7, and the coprocessor 21-8 are , A company, B company, C company, D company, E company, F company, G company, and H company, respectively.

Hereinafter, the coprocessors 21-2, 21-3, 21-4, 21-5, 21-6, 21-7, and 21-8 may be referred to as coprocessors A, B, C, D, E, F, and G, respectively. Different platforms may be connected to the respective EPs mounted on the PCIe bridge controller. Furthermore, two or more EPs may be connected to one platform, and the platform side may communicate with the PCIe bridge controller using a plurality of RCs.

Hereinafter, reference numerals 21-1 to 21-8 or reference signs A to G are used when one of the plurality of processors needs to be specified, and reference numeral 21 is used when referring to an arbitrary processor.

The platforms 2-1 to 2-8 are computer environments that perform arithmetic processing such as AI inference processing and image processing, and each includes a processor 21 as well as the storage 23 and the memory (physical memory) 22 shown in FIG. 21.

プラットフォーム２においては、プロセッサ２１がメモリ２２やストレージ２３に格納されたプログラムを実行することで各種機能を実現する。 In the platform 2, various functions are realized by the processor 21 executing programs stored in the memory 22 and the storage 23.

ストレージ２３は、ハードディスクドライブ（ＨａｒｄＤｉｓｋＤｒｉｖｅ：ＨＤＤ）、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、ストレージクラスメモリ（ＳｔｏｒａｇｅＣｌａｓｓＭｅｍｏｒｙ：ＳＣＭ）等の記憶装置であって、種々のデータを格納するものである。 The storage 23 is a storage device such as a hard disk drive (HDD), an SSD (solid state drive), and a storage class memory (storage class memory: SCM), and stores various data.

メモリ２２はＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）およびＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）を含む記憶メモリである。メモリ２２のＲＯＭには、各種ソフトウェアプログラムやこのプログラム用のデータ類が書き込まれている。メモリ２２上のソフトウェアプログラムは、プロセッサ２１に適宜読み込まれて実行される。また、メモリ２２のＲＡＭは、一次記憶メモリあるいはワーキングメモリとして利用される。 The memory 22 is a storage memory including a ROM (Read Only Memory) and a RAM (Random Access Memory). Various software programs and data for these programs are written in the ROM of the memory 22. The software program on the memory 22 is appropriately read and executed by the processor 21. The RAM of the memory 22 is used as a primary storage memory or a working memory.

The processor 21 (main processor 21-1 or coprocessors 21-2 to 21-8) controls the entire platform 2. The processor 21 may be a multiprocessor. The processor 21 may be, for example, any one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array). The processor 21 may also be a combination of two or more types of elements among a CPU, GPU, MPU, DSP, ASIC, PLD, and FPGA. For example, the coprocessors 21-2 to 21-8 may each be a combination of a CPU and a GPU.

図１６に例示する情報処理システム１において、プラットフォーム２−１はＷｉｎｄｏｗｓをＯＳとし、このＯＳ上において店舗管理プログラムが実行される。店舗管理プログラムは、実施形態の上位アプリケーション１１０が対応する。プラットフォーム２−２，２−３はそれぞれＬｉｎｕｘ（登録商標）をＯＳとし、このＯＳ上において分散処理プログラム（分散処理Ａ，Ｂ）が実行される。分散処理プログラム（分散処理Ａ，Ｂ）は、実施形態の推論処理アプリケーション３０３に対応する。 In the information processing system 1 illustrated in FIG. 16, the platform 2-1 uses Windows as an OS, and the store management program is executed on this OS. The store management program corresponds to the upper application 110 of the embodiment. Each of the platforms 2-2 and 2-3 uses Linux (registered trademark) as an OS, and the distributed processing programs (distributed processing A and B) are executed on this OS. The distributed processing programs (distributed processing A and B) correspond to the inference processing application 303 of the embodiment.

各プラットフォーム２には、ブリッジドライバ２０が備えられており、プラットフォーム２は、このブリッジドライバ２０を介してＰＣＩｅブリッジコントローラ３および他のプラットフォーム２との間で通信を行なう。なお、ブリッジドライバ２０による通信方法については後述する。 Each platform 2 is provided with a bridge driver 20, and the platform 2 communicates with the PCIe bridge controller 3 and another platform 2 via this bridge driver 20. The communication method by the bridge driver 20 will be described later.

各プラットフォーム２においては、プロセッサ２１およびメモリ（物理メモリ）２２を備え、プロセッサ２１がメモリ２２に格納されたＯＳや各種プログラム，ドライバ等を実行することでそれぞれの機能を実現する。 Each platform 2 includes a processor 21 and a memory (physical memory) 22, and each function is realized by the processor 21 executing an OS, various programs, drivers, etc. stored in the memory 22.

各プラットフォーム２に備えられるプロセッサ２１（メインプロセッサ２１−１又はコプロセッサ２１−２〜２１−８）は、互いに違うベンダによって提供されるものであってもよい。図１１に示す例においては、少なくとも一部のプラットフォーム２（例えば、プラットフォーム２−１）に複数のＲＣを有するプラットフォーム（例えば、Ｉｎｔｅｌ社のｘ８６プロセッサ）が用いられてもよい。 The processors 21 (main processor 21-1 or coprocessors 21-2 to 21-8) included in each platform 2 may be provided by different vendors. In the example illustrated in FIG. 11, a platform having a plurality of RCs (for example, an x86 processor manufactured by Intel Corporation) may be used for at least a part of the platforms 2 (for example, the platform 2-1).

また、各プラットフォーム２は、それぞれ他のドライバ構成に影響を与えないように独立動作可能に構成されている。 Further, each platform 2 is configured to be independently operable so as not to affect other driver configurations.

In the platform 2, a part of the storage area of the memory 22 is used as a communication buffer 221 in which data transferred between the platforms 2 (between the processors 21) is temporarily stored, as described later with reference to FIG. 21.

ＰＣＩｅブリッジコントローラ３は、複数のプラットフォーム２−１〜２−８間におけるデータ等の通信を実現する。 The PCIe bridge controller 3 realizes communication of data and the like between the plurality of platforms 2-1 to 2-8.

図１７に示すＰＣＩｅブリッジコントローラ３は、例えば、８チャネルのＥＰを１チップ内に有する中継装置である。このＰＣＩｅブリッジコントローラ３は、図１７に示すように、ＣＰＵ３１，メモリ３２，インターコネクトバス３３および複数（図１７に示す例では８つ）のスロット３４−１〜３４−８を備える。 The PCIe bridge controller 3 shown in FIG. 17 is, for example, a relay device having an EP of 8 channels in one chip. As shown in FIG. 17, the PCIe bridge controller 3 includes a CPU 31, a memory 32, an interconnect bus 33, and a plurality of (eight in the example shown in FIG. 17) slots 34-1 to 34-8.

スロット３４−１〜３４−８にはそれぞれＰＣＩｅの規格を満たすよう構成されたデバイスが接続される。例えば、情報処理システム１においては、スロット３４−１〜３４−８のそれぞれにプラットフォーム２が接続される。 Devices configured to satisfy the PCIe standard are connected to the slots 34-1 to 34-8. For example, in the information processing system 1, the platform 2 is connected to each of the slots 34-1 to 34-8.

なお、以下、スロットを示す符号としては、複数のスロットのうち１つを特定する必要があるときには符号３４−１〜３４−８を用いるが、任意のスロットを指すときには符号３４を用いる。 In addition, hereinafter, as the code indicating the slot, the codes 34-1 to 34-8 are used when it is necessary to specify one of the plurality of slots, but the code 34 is used to indicate an arbitrary slot.

As with the platforms 2-2 to 2-8 in FIG. 15, one platform 2 may be connected to one slot 34. As with the platform 2-1 in FIG. 15, one platform 2 may also be connected to a plurality of (two in the example of FIG. 15) slots 34. Various modifications are possible.

By assigning a plurality of slots 34 to one platform 2, as with the platform 2-1 in FIG. 15, the platform 2-1 can communicate using a wide communication band.

各スロット３４は内部バス（ＩｎｔｅｒｎａｌＢｕｓ）を介してインターコネクト３３にそれぞれ接続されている。また、インターコネクト３３にはＣＰＵ３１およびメモリ３２が接続されている。これにより、各スロット３４とＣＰＵ３１およびメモリ３２はインターコネクト３３を介して相互に通信可能に接続されている。 Each slot 34 is connected to the interconnect 33 via an internal bus. A CPU 31 and a memory 32 are connected to the interconnect 33. As a result, each slot 34, the CPU 31 and the memory 32 are communicably connected to each other via the interconnect 33.

メモリ３２は、例えば、ＲＯＭおよびＲＡＭを含む記憶メモリ（物理メモリ）である。メモリ３２のＲＯＭには、データ通信制御に係るソフトウェアプログラムやこのプログラム用のデータ類が書き込まれている。メモリ３２上のソフトウェアプログラムは、ＣＰＵ３１に適宜読み込まれて実行される。また、メモリ３２のＲＡＭは、一次記憶メモリあるいはワーキングメモリとして利用される。 The memory 32 is a storage memory (physical memory) including a ROM and a RAM, for example. In the ROM of the memory 32, a software program relating to data communication control and data for this program are written. The software program on the memory 32 is appropriately read and executed by the CPU 31. The RAM of the memory 32 is used as a primary storage memory or a working memory.

Further, each platform 2 is provided with a memory area 35 (see FIG. 21). A plurality of storage areas, one per slot, are set in the memory area 35, and each storage area is associated with one of the slots 34. That is, the memory area 35 of the memory 22 is provided with a storage area corresponding to each of the slots #0 to #7.

ＰＣＩｅブリッジコントローラ３は、後述の如く、スロット毎に対応付けられたメモリ領域３５の記憶領域を用いてプラットフォーム２間のデータ転送を行なう。 The PCIe bridge controller 3 uses the storage area of the memory area 35 associated with each slot to transfer data between the platforms 2 as described later.
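The per-slot division of the memory area 35 can be illustrated with a short offset calculation. The sketch assumes equal-sized, contiguous regions ordered by slot number; the text only says that the area is divided according to the number of slots, so this layout and the function name `slot_region` are assumptions.

```python
def slot_region(slot: int, area_size: int, num_slots: int = 8) -> range:
    """Return the byte offsets inside memory area 35 that belong to `slot`.

    Assumes the area is split into `num_slots` equal, contiguous regions
    in slot order (slot #0 first).
    """
    if not 0 <= slot < num_slots:
        raise ValueError(f"slot must be in 0..{num_slots - 1}")
    if area_size <= 0 or area_size % num_slots != 0:
        raise ValueError("area_size must be a positive multiple of num_slots")
    region_size = area_size // num_slots
    start = slot * region_size
    return range(start, start + region_size)
```

For an 8192-byte area and eight slots, each region is 1024 bytes, so the region for slot #4 covers offsets 4096 to 5119.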

ＣＰＵ３１は、ＰＣＩｅブリッジコントローラ３全体を制御する。ＣＰＵ３１は、マルチプロセッサであってもよい。なお、ＣＰＵ３１に代えてＭＰＵ，ＤＳＰ，ＡＳＩＣ，ＰＬＤ，ＦＰＧＡのいずれか一つが用いられてもよい。また、ＣＰＵ３１は、ＣＰＵ，ＭＰＵ，ＤＳＰ，ＡＳＩＣ，ＰＬＤ，ＦＰＧＡのうちの２種類以上の要素の組み合わせであってもよい。 The CPU 31 controls the PCIe bridge controller 3 as a whole. The CPU 31 may be a multiprocessor. Any one of MPU, DSP, ASIC, PLD and FPGA may be used instead of the CPU 31. Further, the CPU 31 may be a combination of two or more types of elements among a CPU, MPU, DSP, ASIC, PLD, and FPGA.

そして、ＣＰＵ３１がメモリ３２に格納されたソフトウェアプログラムを実行することで、ＰＣＩｅブリッジコントローラ３におけるプラットフォーム２間（プロセッサ２１間）のデータ転送を実現する。 Then, the CPU 31 executes the software program stored in the memory 32 to realize the data transfer between the platforms 2 (between the processors 21) in the PCIe bridge controller 3.

ＰＣＩｅブリッジコントローラ３は、プラットフォーム２間のデータ転送を高速化するためにＰＣＩｅを用い、図１５に示すように、各プラットフォーム２に備えられるプロセッサをそれぞれＲＣとして動作させ、デバイスとして動作するＥＰ間でデータ転送を実現する。 The PCIe bridge controller 3 uses PCIe in order to speed up data transfer between the platforms 2, and as shown in FIG. 15, each processor provided in each platform 2 operates as an RC, and between the EPs operating as devices. Achieve data transfer.

具体的には、情報処理システム１においては、各プラットフォーム２のプロセッサを、データ転送インタフェースとしてＰＣＩｅのＲＣとして動作させる。また、各プラットフォーム２（プロセッサ２１）に対して、ＰＣＩｅブリッジコントローラ３を、すなわち、各プラットフォーム２が接続されているスロット３４をＥＰとして動作させる。 Specifically, in the information processing system 1, the processor of each platform 2 operates as a PCIe RC as a data transfer interface. Further, the PCIe bridge controller 3, that is, the slot 34 to which each platform 2 is connected is operated as an EP for each platform 2 (processor 21).

ＰＣＩｅブリッジコントローラ３をプロセッサ２１に対してＥＰとして接続する手法としては、既知の種々の手法を用いて実現することができる。 As a method of connecting the PCIe bridge controller 3 to the processor 21 as an EP, various known methods can be used.

例えば、ＰＣＩｅブリッジコントローラ３は、プラットフォーム２との接続時に、ＥＰとして機能することを示す信号を当該プロセッサ２１に通知することで、ＥＰとしてプロセッサ２１と接続する。 For example, the PCIe bridge controller 3 connects to the processor 21 as an EP by notifying the processor 21 of a signal indicating that it functions as an EP when connecting to the platform 2.

ＰＣＩｅブリッジコントローラ３においてはＥＰｔｏＥＰ（ＥｎｄＰｏｉｎｔｔｏＥｎｄＰｏｉｎｔ）でデータをトンネリングさせて、複数のＲＣにデータを転送する。プロセッサ間の通信は、ＰＣＩｅのトランザクションが発生したときに論理的に接続され、１つのプロセッサにデータ転送が集中しないときは、それぞれのプロセッサ間で並行してデータ転送できる。 The PCIe bridge controller 3 tunnels data by EP to EP (End Point to End Point) and transfers the data to a plurality of RCs. Communication between processors is logically connected when a PCIe transaction occurs, and when data transfer is not concentrated on one processor, data can be transferred between the processors in parallel.

図１８においては、プラットフォーム２−２のコプロセッサＡとプラットフォーム２−３のコプロセッサＢとの間で通信を行なう例を示す。 FIG. 18 shows an example in which communication is performed between the coprocessor A of the platform 2-2 and the coprocessor B of the platform 2-3.

In the source platform 2-2, data generated by the coprocessor A, which is an RC, is passed in order through the software, the transaction layer, the data link layer, and the physical layer (PHY), and is transferred from the physical layer to the physical layer of the PCIe bridge controller 3.

In the PCIe bridge controller 3, the data is passed in order through the physical layer, the data link layer, the transaction layer, and the software, and is transferred by tunneling to the EP corresponding to the RC of the destination platform 2.

すなわち、ＰＣＩｅブリッジコントローラ３においては、ＥＰ間でデータをトンネリングさせることで、一のＲＣ（コプロセッサ２１−２）から他のＲＣ（コプロセッサ２１−３）にデータが転送される。 That is, in the PCIe bridge controller 3, data is transferred from one RC (coprocessor 21-2) to another RC (coprocessor 21-3) by tunneling the data between EPs.

送信先のプラットフォーム２−３においては、ＰＣＩｅブリッジコントローラ３から転送されたデータが、物理層（ＰＨＹ），データリンク層，トランザクション層およびソフトウェアを、順次、転送され、送信先のプラットフォーム２−３のコプロセッサＢに転送される。 In the destination platform 2-3, the data transferred from the PCIe bridge controller 3 is sequentially transferred through the physical layer (PHY), the data link layer, the transaction layer, and the software, and the data of the destination platform 2-3 is transferred. Transferred to coprocessor B.

情報処理システム１において、プロセッサ２１間（プラットフォーム２間）の通信は、ＰＣＩｅのトランザクションが発生したときに論理的に接続される。 In the information processing system 1, the communication between the processors 21 (between the platforms 2) is logically connected when a PCIe transaction occurs.

ＰＣＩｅブリッジコントローラ３が有する８スロットのうちの一つに接続された特定のプロセッサ２１に対して複数の他のプロセッサ２１からのデータ転送が集中しないときは、異なる任意の複数組のそれぞれのプロセッサ２１間で並行してデータ転送してもよい。 When data transfer from a plurality of other processors 21 is not concentrated on a specific processor 21 connected to one of the eight slots of the PCIe bridge controller 3, a different arbitrary plurality of sets of the respective processors 21 are provided. Data may be transferred in parallel between them.

For example, when the coprocessor A of the platform 2-2 and the coprocessor B of the platform 2-3 each attempt to communicate with the main processor of the platform 2-1, the PCIe bridge controller 3 processes the communications of the coprocessor A and the coprocessor B serially.

ただし、メインプロセッサ−コプロセッサＡ，コプロセッサＢ−コプロセッサＣ，コプロセッサＤ−コプロセッサＥのように、それぞれが異なるプロセッサ同士で通信し、特定のプロセッサに通信が集中しない場合には、ＰＣＩｅブリッジコントローラ３は、各プロセッサ２１間通信を並行して処理する。 However, when communication is performed between different processors such as main processor-coprocessor A, coprocessor B-coprocessor C, coprocessor D-coprocessor E, and communication is not concentrated on a specific processor, PCIe is used. The bridge controller 3 processes communication between the processors 21 in parallel.
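The distinction between serial and parallel handling can be illustrated by grouping requested transfers into rounds in which no processor appears twice. The text does not describe how the PCIe bridge controller 3 arbitrates, so the first-fit grouping below, the function name `schedule_rounds`, and the rule that any shared processor forces serialization are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Transfer = Tuple[str, str]  # (source processor, destination processor)


def schedule_rounds(transfers: List[Transfer]) -> List[List[Transfer]]:
    """Group transfers into rounds; transfers in one round can run in parallel.

    A transfer joins the first round in which neither of its processors
    is already involved; otherwise a new round is opened for it.
    """
    rounds: List[List[Transfer]] = []
    busy: List[Set[str]] = []  # processors already used in each round
    for src, dst in transfers:
        for i, used in enumerate(busy):
            if src not in used and dst not in used:
                rounds[i].append((src, dst))
                used.update((src, dst))
                break
        else:
            # Every existing round conflicts with this transfer.
            rounds.append([(src, dst)])
            busy.append({src, dst})
    return rounds
```

Under these assumptions, two coprocessors that both target the main processor end up in separate rounds, while the three disjoint pairs named above share a single round.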

FIG. 19 is a diagram illustrating how the other processors 21 appear from the processor 21-8 (processor G) in the information processing system 1 as an example of the embodiment, and FIG. 20 is a diagram illustrating how the other processors 21 appear from the processor 21-5 (processor D).

各プロセッサ２１間で通信が行なわれている状態においても、各プロセッサ２１上のＯＳ（例えばＷｉｎｄｏｗｓのデバイスマネージャ）からは、ＰＣＩｅブリッジコントローラ３しか見えず、接続先の他のプロセッサ２１を直接管理する必要がない。すなわち、ＰＣＩｅブリッジコントローラ３のデバイスドライバでＰＣＩｅブリッジコントローラ３の先に接続されたプロセッサ２１を管理すれば良い。 Even when communication is performed between the processors 21, only the PCIe bridge controller 3 can be seen from the OS (for example, the device manager of Windows) on each processor 21, and the other processor 21 of the connection destination is directly managed. No need. That is, the device driver of the PCIe bridge controller 3 may manage the processor 21 connected to the end of the PCIe bridge controller 3.

Therefore, there is no need to prepare device drivers for operating the source and destination processors 21. Communication between the processors 21 can be performed simply by having the driver of the PCIe bridge controller 3 perform communication processing with the PCIe bridge controller 3.

A data transfer method between the processors 21 via the PCIe bridge controller 3 in the information processing system 1, configured as described above as an example of the embodiment, will be described with reference to FIG. 21.

この図２１に示す例においては、スロット＃０に接続されたプラットフォーム２−１からスロット＃４に接続されたプラットフォーム２−５にデータを転送する場合について説明する。 In the example shown in FIG. 21, a case where data is transferred from the platform 2-1 connected to the slot #0 to the platform 2-5 connected to the slot #4 will be described.

The data-source platform 2-1 stores the data to be transmitted by software or the like (hereinafter sometimes referred to as transmission data) from the storage 23 or the like of the platform 2-1 into the memory area 35 of the platform 2-1 (see reference P1). The memory area 35 may be a part of the communication buffer 221. The memory area 35 is an area of the same size provided in the memory 22 or the like of each platform 2. The memory area 35 is divided according to the number of slots, and each divided storage area is associated with one of the slots. For example, the storage area labeled Slot#0 in the memory area 35 is associated with the platform 2-1 connected to Slot#0, and the storage area labeled Slot#4 is associated with the platform 2-5 connected to Slot#4. The platform 2-1 stores the transmission data in the area of the memory area 35 assigned to the destination slot (here, Slot#4).

ブリッジドライバ２０は、プラットフォーム２のメモリ領域３５の記憶領域に基づいて、送信先のスロットを示すスロット情報と、送信先のメモリ領域３５における分割領域内におけるアドレスを示すアドレス情報とを取得または生成する（符号Ｐ２参照）。 The bridge driver 20 acquires or generates, based on the storage area of the memory area 35 of the platform 2, slot information indicating a destination slot and address information indicating an address in a divided area in the destination memory area 35. (See symbol P2).

At the transmission-source EP, the bridge driver 20 passes transfer data including the slot information, the address information, and the transmission data to the PCIe bridge controller 3, which serves as the relay device (see reference P3). The PCIe bridge controller 3 then connects the source slot and the destination slot by EP to EP based on the slot information, and transfers the transfer data to the destination platform 2-5 (see reference P4). Based on the slot information and the address information, the bridge driver 20 at the destination stores the transmission data (or the transfer data) at the address indicated by the address information, within the storage area corresponding to slot #4 in the memory area 35 of the destination platform 2 (see reference P5).

送信先プラットフォーム２において、例えば、プログラムが、メモリ領域３５に格納された送信データを読み出して、メモリ（ローカルメモリ）２２やストレージ２３に移動させる（符号Ｐ６，Ｐ７参照）。 In the destination platform 2, for example, the program reads the transmission data stored in the memory area 35 and moves it to the memory (local memory) 22 or the storage 23 (see symbols P6 and P7).

以上のようにして、転送元のプラットフォーム２−１から転送先のプラットフォーム２−５にデータ（転送データ）が転送される。 As described above, the data (transfer data) is transferred from the transfer source platform 2-1 to the transfer destination platform 2-5.
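The transfer sequence P1 to P7 can be traced with a small in-memory simulation. This is a sketch under stated assumptions rather than a model of real PCIe behavior: the classes `Platform` and `BridgeController`, the function `transfer`, the 16-byte region size, and the dictionary used as transfer data are all hypothetical.

```python
from typing import Dict

NUM_SLOTS = 8
REGION_SIZE = 16  # bytes per slot region; arbitrary value for this example


class Platform:
    """Hypothetical model of a platform 2 with its memory area 35."""

    def __init__(self, slot: int) -> None:
        self.slot = slot
        self.memory_area = bytearray(NUM_SLOTS * REGION_SIZE)
        self.local = b""  # stands in for local memory 22 / storage 23

    def write_region(self, slot: int, address: int, data: bytes) -> None:
        if address < 0 or address + len(data) > REGION_SIZE:
            raise ValueError("data does not fit in the slot region")
        start = slot * REGION_SIZE + address
        self.memory_area[start:start + len(data)] = data

    def read_region(self, slot: int, address: int, length: int) -> bytes:
        start = slot * REGION_SIZE + address
        return bytes(self.memory_area[start:start + length])


class BridgeController:
    """Hypothetical model of the PCIe bridge controller 3."""

    def __init__(self) -> None:
        self.platforms: Dict[int, Platform] = {}

    def attach(self, platform: Platform) -> None:
        self.platforms[platform.slot] = platform

    def forward(self, packet: dict) -> None:
        dest = self.platforms[packet["slot"]]  # P4: route by slot information
        # P5: store at the given address in the destination slot's region
        dest.write_region(packet["slot"], packet["address"], packet["data"])


def transfer(bridge: BridgeController, source: Platform,
             dest_slot: int, data: bytes, address: int = 0) -> bytes:
    source.write_region(dest_slot, address, data)  # P1: stage in the region for the destination slot
    packet = {                                     # P2: slot and address information
        "slot": dest_slot,
        "address": address,
        "data": source.read_region(dest_slot, address, len(data)),
    }
    bridge.forward(packet)                         # P3: hand the transfer data to the bridge
    dest = bridge.platforms[dest_slot]
    dest.local = dest.read_region(dest_slot, address, len(data))  # P6, P7: move to local memory
    return dest.local
```

Attaching a `Platform(0)` and a `Platform(4)` to one `BridgeController` and calling `transfer(bridge, p0, 4, b"hello")` mirrors the slot #0 to slot #4 example in the text.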

このように、情報処理システム１においては、ＰＣＩｅブリッジコントローラ３において、当該ＰＣＩｅブリッジコントローラ３内のＥＰ間でデータ転送を媒介する。これにより、ＰＣＩｅブリッジコントローラ３に接続された複数のＲＣ（プロセッサ２１）間でのデータ転送を実現することができる。 As described above, in the information processing system 1, the PCIe bridge controller 3 mediates data transfer between the EPs in the PCIe bridge controller 3. As a result, data transfer can be realized between the plurality of RCs (processors 21) connected to the PCIe bridge controller 3.

That is, each processor 21 operates independently as a PCIe RC, the PCIe bridge controller 3 connects to each processor 21 as an EP device, and data is transferred between the EPs. This avoids problems caused by device drivers and allows the whole to operate as a single system with high-speed data transfer.

Further, data can be transferred between different processors 21 as long as they have a data communication function conforming to the PCIe standard. This widens the choice of processors 21 that can be used, without concern for the availability of device drivers, supported OSs, and the like.

各プロセッサ２１はＥＰとなるＰＣＩｅブリッジコントローラ３を介して接続されるため、ＥＰの先のＲＣのデバイスドライバを追加する必要がない。従って、デバイスドライバの開発が不要であるとともに、デバイスドライバを追加することに起因する不具合が発生することもない。 Since each processor 21 is connected via the PCIe bridge controller 3 serving as an EP, it is not necessary to add a device driver for the RC at the EP end. Therefore, it is not necessary to develop a device driver, and a defect caused by adding a device driver does not occur.

情報処理システム１においては、ＡＲＭプロセッサやＦＰＧＡ等の一般的なプロセッサはＲＣとして動作することが求められるため、情報処理システム１のプロセッサ２１として容易に追加することができる。 In the information processing system 1, a general processor such as an ARM processor or an FPGA is required to operate as RC, and thus can be easily added as the processor 21 of the information processing system 1.

ＰＣＩｅブリッジコントローラ３においては、ＰＣＩｅでの接続（通信）がされるので、イーサネットでは実現できない高速転送を実現することができる。また、４Ｋ，８Ｋ等の高精細映像のプロセッサ間送受信、大規模なビッグデータの並列計算等も行なうことができる。 Since connection (communication) in the PCIe bridge controller 3 is made over PCIe, high-speed transfer that cannot be achieved with Ethernet becomes possible. High-definition video such as 4K and 8K can also be sent and received between processors, and large-scale big data can be processed in parallel.

また、画像処理やデータ検索等の各機能に特化した専用プロセッサを接続することもできるので、安価に機能追加、性能向上を行なうことができる。 Further, since a dedicated processor specialized for each function such as image processing and data search can be connected, it is possible to add functions and improve performance at low cost.

さらに、情報処理システム１においては、システムの仮想化等を行なう必要もなく、システムの仮想化を行なうことで生じるシステム性能の低下が生じることもない。従って、情報処理システム１を、ＡＩ推論や画像処理といった高負荷な演算を用途とするシステムに適用することもできる。 Furthermore, in the information processing system 1, it is not necessary to perform system virtualization or the like, and the system performance does not deteriorate due to system virtualization. Therefore, the information processing system 1 can also be applied to a system intended for high-load computation such as AI inference and image processing.

そして、開示の技術は上述した実施形態に限定されるものではなく、本実施形態の趣旨を逸脱しない範囲で種々変形して実施することができる。本実施形態の各構成および各処理は、必要に応じて取捨選択することができ、あるいは適宜組み合わせてもよい。 The disclosed technique is not limited to the above-described embodiment, and various modifications can be implemented without departing from the spirit of the present embodiment. Each configuration and each process of the present embodiment can be selected or omitted as necessary, or may be appropriately combined.

例えば、図１７に示す構成においては、ＰＣＩｅブリッジコントローラ３は８つのスロット３４−１〜３４−８を有しているが、これに限定されるものではなく種々変形して実施することができる。すなわち、ＰＣＩｅブリッジコントローラ３は７つ以下もしくは９つ以上のスロット３４をそなえてもよい。 For example, in the configuration shown in FIG. 17, the PCIe bridge controller 3 has eight slots 34-1 to 34-8, but the configuration is not limited to this and can be modified in various ways. That is, the PCIe bridge controller 3 may have seven or fewer, or nine or more, slots 34.

上述の実施形態では、各部のＩ／ＯインタフェースとしてＰＣＩｅを例に挙げて説明したが、Ｉ／ＯインタフェースはＰＣＩｅに限定されない。例えば、各部のＩ／Ｏインタフェースは、データ転送バスによって、デバイス（周辺制御コントローラ）とプロセッサとの間でデータ転送を行える技術であればよい。データ転送バスは、１個の筐体等に設けられたローカルな環境（例えば、１つのシステムまたは１つの装置）で高速にデータを転送できる汎用のバスであってよい。Ｉ／Ｏインタフェースは、パラレルインタフェース及びシリアルインタフェースのいずれであってもよい。 In the above-described embodiment, PCIe is taken as an example of the I/O interface of each unit, but the I/O interface is not limited to PCIe. For example, the I/O interface of each unit may be any technology capable of transferring data between the device (peripheral controller) and the processor by the data transfer bus. The data transfer bus may be a general-purpose bus that can transfer data at high speed in a local environment (for example, one system or one device) provided in one housing or the like. The I/O interface may be either a parallel interface or a serial interface.

Ｉ／Ｏインタフェースは、ポイント・ツー・ポイント接続ができ、データをパケットベースでシリアル転送可能な構成でよい。尚、Ｉ／Ｏインタフェースは、シリアル転送の場合、複数のレーンを有してよい。Ｉ／Ｏインタフェースのレイヤ構造は、パケットの生成及び復号を行うトランザクション層と、エラー検出等を行うデータリンク層と、シリアルとパラレルとを変換する物理層とを有してよい。また、Ｉ／Ｏインタフェースは、階層の最上位であり１または複数のポートを有するルート・コンプレックス、Ｉ／Ｏデバイスであるエンド・ポイント、ポートを増やすためのスイッチ、及び、プロトコルを変換するブリッジ等を含んでよい。Ｉ／Ｏインタフェースは、送信するデータとクロック信号とをマルチプレクサによって多重化して送信してもよい。この場合、受信側は、デマルチプレクサでデータとクロック信号を分離してよい。 The I/O interface may have a configuration that allows point-to-point connection and packet-based serial transfer of data. In the case of serial transfer, the I/O interface may have a plurality of lanes. The layer structure of the I/O interface may include a transaction layer that generates and decodes packets, a data link layer that performs error detection and the like, and a physical layer that converts between serial and parallel. The I/O interface may also include a root complex, which is at the top of the hierarchy and has one or more ports, end points, which are I/O devices, switches for increasing the number of ports, bridges for converting protocols, and the like. The I/O interface may multiplex the data to be transmitted with a clock signal by means of a multiplexer and transmit the multiplexed signal. In this case, the receiving side may separate the data and the clock signal with a demultiplexer.

１情報処理システム
１００情報処理装置
１３２送信部
１３３受信部
１３６測定部
１３７更新部
３００推論処理装置
３０２コプロセッサ
３０３推論アプリケーション
３０４モデルファイル
３０５，３０５ｉミドルウェア
１３１２ａ計算部
１３１２ｂ選択部
1 Information processing system
100 Information processing device
132 Transmission unit
133 Reception unit
136 Measurement unit
137 Update unit
300 Inference processing device
302 Coprocessor
303 Inference application
304 Model file
305, 305i Middleware
1312a Calculation unit
1312b Selection unit

本願の開示する情報処理装置は、一つの態様において、Ｎを２以上の整数とするとき、処理状況に関するリソース情報をＮ個のプロセッサのそれぞれから受信する受信部と、前記受信されたＮ個のプロセッサのリソース情報を、自プロセッサのリソース情報の値を処理時間に変換する変換パラメータへ自プロセッサのリソース情報の値を示す変数が乗算された１個の積と他プロセッサのリソース情報の値からの自プロセッサでの処理時間への影響度合いを示す寄与率パラメータへ他プロセッサのリソース情報の値を示す変数が（Ｎ−１）個の他プロセッサのそれぞれについて乗算された（Ｎ−１）個の積とが互いに足し合わせされた計算式に適用して、アプリケーションによる処理要求に対する処理時間を前記Ｎ個のプロセッサのそれぞれについて計算する計算部と、前記Ｎ個のプロセッサの処理時間に基づいて、前記Ｎ個のプロセッサから１つのプロセッサを選択する選択部と、前記処理要求を前記１つのプロセッサへ送信する送信部とを有する。 In one aspect, the information processing device disclosed in the present application includes a receiving unit, a calculation unit, a selection unit, and a transmission unit, where N is an integer of 2 or more. The receiving unit receives resource information regarding a processing status from each of N processors. The calculation unit applies the received resource information of the N processors to a calculation formula and calculates, for each of the N processors, a processing time for a processing request issued by an application. The calculation formula is the sum of one product and (N-1) products. The one product is a conversion parameter, which converts the value of the own processor's resource information into a processing time, multiplied by a variable indicating the value of the own processor's resource information. Each of the (N-1) products, one for each of the (N-1) other processors, is a contribution rate parameter, which indicates the degree to which the value of another processor's resource information affects the processing time on the own processor, multiplied by a variable indicating the value of that other processor's resource information. The selection unit selects one processor from the N processors based on the processing times of the N processors. The transmission unit transmits the processing request to the one processor.
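
The calculation formula in the preceding paragraph can be written, for processor i, as T_i = a_i * x_i + sum over j != i of (b_ij * x_j), where x is the resource information value, a_i the conversion parameter, and b_ij the contribution rate parameter. The following Python sketch illustrates the calculation and selection steps under stated assumptions: the symbol names, the function names `predict_times` and `select_processor`, the numeric values, and the policy of choosing the shortest estimated time are all illustrative and are not taken from the disclosure.

```python
from typing import List, Sequence


def predict_times(
    resource: Sequence[float],
    conversion: Sequence[float],
    contribution: Sequence[Sequence[float]],
) -> List[float]:
    """Return the estimated processing time for each of the N processors.

    resource[i]        : resource information value reported by processor i
    conversion[i]      : conversion parameter of processor i (own term)
    contribution[i][j] : contribution rate parameter of processor j's load
                         on processor i (diagonal entries are ignored)
    """
    n = len(resource)
    times = []
    for i in range(n):
        # The single product for the own processor.
        own = conversion[i] * resource[i]
        # The (N-1) products, one per other processor.
        others = sum(
            contribution[i][j] * resource[j] for j in range(n) if j != i
        )
        times.append(own + others)
    return times


def select_processor(times: Sequence[float]) -> int:
    """Assumed policy: pick the index with the smallest estimated time."""
    return min(range(len(times)), key=times.__getitem__)


# Hypothetical values for N = 3 processors (not from the disclosure).
resource = [0.5, 0.25, 0.75]
conversion = [8.0, 16.0, 4.0]
contribution = [
    [0.0, 2.0, 4.0],
    [4.0, 0.0, 4.0],
    [2.0, 4.0, 0.0],
]
times = predict_times(resource, conversion, contribution)
chosen = select_processor(times)
print(times, chosen)  # [7.5, 9.0, 5.0] 2
```

With these values the third processor (index 2) has the smallest estimate, so the processing request would be transmitted to it.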

Claims

An information processing device comprising:
a receiving unit that receives resource information regarding a processing status from each of N processors, where N is an integer of 2 or more;
a calculation unit that applies the received resource information of the N processors to a calculation formula to calculate, for each of the N processors, a processing time for a processing request issued by an application;
a selection unit that selects one processor from the N processors based on the processing times of the N processors; and
a transmission unit that transmits the processing request to the one processor.

The information processing device according to claim 1, further comprising:
a measuring unit that measures the processing time taken by the one processor for the processing request; and
an update unit that performs multiple regression analysis based on the received resource information of the N processors and the measured processing time, and updates the calculation formula.

The information processing device according to claim 2, wherein
the calculation formula includes N parameters corresponding to the N processors, and
the update unit updates the N parameters in the calculation formula.

The information processing device according to any one of claims 1 to 3, wherein
the processing request includes a processing identifier for identifying processing content, and
the calculation unit applies the resource information of the N processors to the N calculation formulas that, among N×M calculation formulas associated with combinations of the N different processors and M different processing contents, correspond to the processing content identified by the processing identifier.

The information processing apparatus according to any one of claims 1 to 4, wherein each of the N processors is an inference processing apparatus that performs inference processing according to the processing request.

An information processing system comprising:
the information processing device according to any one of claims 1 to 5; and
N processors, where N is an integer of 2 or more, each of which is communicably connected to the information processing device.
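
As an illustration of the update recited in claims 2 and 3, the following Python sketch refits the N parameters of one processor's calculation formula from past observations. It is a minimal sketch under stated assumptions: the claims only say that multiple regression analysis is performed, so the use of ordinary least squares without an intercept, the normal-equation solver, the function name `fit_parameters`, and the sample values are all hypothetical.

```python
from typing import List, Sequence


def fit_parameters(
    samples: Sequence[Sequence[float]], measured: Sequence[float]
) -> List[float]:
    """Least-squares fit of N parameters (no intercept) via normal equations.

    samples[k]  : the N resource values received before request k was sent
    measured[k] : the processing time measured for request k
    """
    n = len(samples[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = [
        [sum(row[r] * row[c] for row in samples) for c in range(n)]
        + [sum(row[r] * t for row, t in zip(samples, measured))]
        for r in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine the parameters")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


# Hypothetical observations for N = 2, generated from time = 4*x0 + 2*x1.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
measured = [4.0, 2.0, 6.0]
params = fit_parameters(samples, measured)
print(params)  # close to [4.0, 2.0]
```

The fitted values would replace the conversion parameter and contribution rate parameters of the formula for the processor that handled the measured requests.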