JP2019087039A - Task management system, task management method, and task management program

Info

Publication number: JP2019087039A
Application number: JP2017214872A
Authority: JP
Inventors: 和正松原; Kazumasa Matsubara; 光雄早坂; Mitsuo Hayasaka
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2019-06-06
Anticipated expiration: 2037-11-07
Also published as: US20190138358A1; US10915362B2; JP7080033B2

Abstract

To enable reduction in data communication traffic between nodes.

SOLUTION: A computer system 10 includes a plurality of task processing nodes 200 capable of executing tasks and a task management node 100 that determines the task processing node 200 to which a new task is to be assigned. Each task processing node 200 includes a memory 202 capable of caching data used by an assigned task, that is, a task assigned to that node. The task management node 100 stores task assignment history information 112 that records the correspondence between each assigned task and the task processing node 200 to which it was assigned and on which the data it uses is cached. A CPU 101 of the task management node 100 determines the similarity between a new task and the assigned tasks, selects, based on that similarity, the task processing node 200 for the new task from among the nodes recorded in the task assignment history information 112, and assigns the new task to the selected node.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a task management system and the like that determine, from among a plurality of nodes, a node to which a task is to be assigned.

The market for decision support services that utilize AI (artificial intelligence) is currently expanding. Services that use natural language processing AI handle large volumes of text data, so the data is managed and processed in a distributed manner across a plurality of nodes.

In recent years, when tasks are processed in a distributed manner for AI processing, a method of distributing and managing large volumes of data across a plurality of nodes has been used. With this method, the data a task requires must be acquired from the nodes on which it is distributed when the task is processed. The more data a task requires, the more inter-node communication increases, and this communication becomes a bottleneck that lowers processing speed.

To speed up processing, a method has been used in which a task is placed on a task processing node close to the node that stores the largest share of the data the task uses, thereby reducing inter-node communication during data acquisition. For example, Patent Document 1 discloses a technique for determining the node to which a task is assigned using the actual distance between nodes.

Patent Document 1: US Patent Application Publication No. 2014/0372611

As described above, when data is distributed and managed across a plurality of nodes and processed in a distributed manner by those nodes, it is necessary to reduce the amount of data communicated between nodes.

For example, in the technique of Patent Document 1, determining the node that holds the most data used by a task requires processing to investigate the data the task uses. Furthermore, the more data is used, the longer it takes to calculate the node to which the task is assigned. As a result, inter-node communication for acquiring data occurs frequently, and performance degrades.

The present invention has been made in view of the above circumstances, and its object is to provide a technique capable of reducing the amount of data communication between nodes.

To achieve the above object, a task management system according to one aspect includes a plurality of task processing nodes capable of executing tasks and a task management node that determines the task processing node to which a new task is to be assigned. Each of the task processing nodes has a memory capable of caching data used by an assigned task, that is, a task assigned to that node. The task management node stores task assignment information that includes the correspondence between an assigned task and the task processing node to which the assigned task is assigned and on which the data used by the assigned task is cached. A processor of the task management node determines the similarity between the new task and the assigned task, determines, based on the similarity, the task processing node to which the new task is to be assigned from among the task processing nodes included in the task assignment information, and assigns the new task to the determined task processing node.

According to the present invention, the amount of data communication between nodes can be reduced.

FIG. 1 is an overall configuration diagram of a computer system according to the first embodiment.

FIG. 2 is a diagram showing the configuration of task assignment history information according to the first embodiment.

FIG. 3 is a diagram showing the configuration of cache management information according to the first embodiment.

FIG. 4 is a flowchart of task management processing according to the first embodiment.

FIG. 5 is a flowchart of task execution processing according to the first embodiment.

FIG. 6 is a flowchart of in-cache organization processing in units of tasks according to the first embodiment.

FIG. 7 is a flowchart of deletion target task determination processing according to the first embodiment.

FIG. 8 is a flowchart of in-cache organization processing in units of cache data according to the first embodiment.

FIG. 9 is an overall configuration diagram of a computer system according to the second embodiment.

Embodiments will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements described in the embodiments and their combinations are necessarily essential to the solution provided by the invention.

In the following description, information according to the embodiments is described using, for example, a table data structure. However, this information need not be expressed as a table and may be expressed as a "list", a "DB (database)", a "queue", or another data structure. To indicate that the information does not depend on a particular data structure, a "table", "list", "DB", "queue", and the like may be referred to simply as "information". In addition, the expressions "identification information", "identifier", "name", "ID", and "number" may be used when describing the content of each piece of information, and these expressions are interchangeable.

In the following description, a "program" (for example, a program module) is described as the subject of an operation. Because a program performs its defined processing using a memory and a communication port (communication control device) when executed by a processor, the description may equally be read with the processor or a controller as the subject.

Processing disclosed with a program as the subject may also be processing performed by a computer (information processing apparatus) such as a node. Part or all of a program may be implemented in dedicated hardware. The various programs may be installed on each node from a program distribution server or a storage medium (for example, a non-volatile storage medium).

First, an overview of the computer system according to the first embodiment will be described.

The computer system according to the first embodiment includes a task management node 100 (see FIG. 1), which searches for similar tasks and assigns tasks, and a plurality of task processing nodes 200 (see FIG. 1), which execute task processing and manage the data required for task processing.

The task management node receives a task processing request, analyzes the task content, performs a similarity search against the task history information, and determines the task processing node to which the task is assigned based on conditions such as the similarity rate between tasks. Each task processing node processes the tasks assigned to it and manages the cache of data used by those tasks.

Next, the computer system according to the first embodiment will be described in detail.

FIG. 1 is an overall configuration diagram of the computer system according to the first embodiment.

A computer system 10, which is an example of a task management system, includes a client 50, a task management node 100, and a plurality of task processing nodes 200. The client 50, the task management node 100, and the task processing nodes 200 are connected via a network 300. The network 300 may be, for example, a wired LAN (Local Area Network), a wireless LAN, a WAN (Wide Area Network), or a combination of these.

The client 50 is, for example, a PC (Personal Computer) and is used by a user who requests execution of a task. The client 50, for example, transmits a task processing request from the user to the task management node 100, receives the task processing result from the task management node 100, and displays the result on a display or the like.

The task management node 100 is, for example, a computer such as a server, and includes a CPU (Central Processing Unit) 101 as an example of a processor, a memory 102, and a network interface 103.

The network interface 103 is, for example, an interface such as a wired LAN card or a wireless LAN card, and communicates with other devices (for example, the client 50 and the task processing nodes 200) via the network 300.

The CPU 101 executes various kinds of processing by executing programs (modules) stored in the memory 102.

The memory 102 is, for example, a RAM (Random Access Memory), and stores the programs executed by the CPU 101 and the information they require.

The memory 102 stores a task management module 111 as an example of a program and task assignment history information 112 as an example of task assignment information.

The task management module 111 includes a task assignment module 131, a task analysis module 132, and a similarity search module 133. The task assignment module 131 is a program module that causes the CPU 101 to execute processing related to accepting and assigning tasks. The task analysis module 132 is a program module that causes the CPU 101 to execute processing such as analyzing task content and digitizing tasks. The similarity search module 133 is a program module that causes the CPU 101 to compare the digitized information of a new task with the information of tasks that have already been assigned and to evaluate the similarity rate. The processing performed by these modules is described in detail later.

The task assignment history information 112 is history information on task assignment. The task assignment history information 112 is described in detail later.

Each task processing node 200 is, for example, a computer such as a server, and includes a CPU 201 as an example of a processor, a memory 202, a storage device 203, and a network interface 204.

The network interface 204 is, for example, an interface such as a wired LAN card or a wireless LAN card, and communicates with other devices (for example, the client 50, the task management node 100, and other task processing nodes 200) via the network 300.

The CPU 201 executes various kinds of processing by executing programs (modules) stored in the memory 202.

The memory 202 is, for example, a RAM, and stores the programs executed by the CPU 201 and the information they require.

The memory 202 stores a task processing module 211, a cache management module 212, a data management module 213, and cache management information 214.

The task processing module 211 is a program module that causes the CPU 201 to process a task received from the task management node 100 and to send the task result and a completion notification to the task management node 100. The task processing module 211 may cause the CPU 201 to execute natural language processing by AI as the task processing. The cache management module 212 is a program module that causes the CPU 201 to manage the data cached in a cache 215 during task processing and to organize the cache 215 when its capacity is insufficient (in-cache organization processing). The data management module 213 is a program module that causes the CPU 201 to manage the placement of data 231 in the storage devices 203 of the plurality of task processing nodes 200. When the CPU 201 executes the data management module 213, the data 231 in the storage device 203 can be managed by a single node or managed in a distributed manner by the storage devices 203 of a plurality of task processing nodes 200.

The cache management information 214 is information on the cache status of the cache 215 and on the list of data used by tasks. The cache management information 214 is described in detail later.

The memory 202 also has the cache 215, an area for storing data (cache data) 221 acquired from the storage device 203 of the node's own task processing node 200 or of another task processing node 200 for use in task processing. The capacity of the cache 215 is set in advance to a predetermined value. The cache data 221 remains in the cache 215 until the cache management module 212 deletes it from the cache 215.

The storage device 203 is, for example, a hard disk or a flash memory, and stores the programs executed by the CPU 201 and the data used by the CPU 201. In the present embodiment, the storage device 203 stores the data 231 used in task processing.

Next, the task assignment history information 112 will be described.

FIG. 2 is a diagram showing the configuration of the task assignment history information according to the first embodiment.

The task assignment history information 112 stores an entry for each task whose data is cached in a task processing node 200. Each entry includes the fields task 301, execution node ID 302, analysis result 303, processing status 304, number of comparisons 305, average similarity rate 306, adoption count 307, and last execution date and time 308.

The task 301 field stores the identifier (task ID) of a task assigned to a task processing node 200 (an assigned task). In the present embodiment, the task 301 field stores the task IDs of tasks whose data remains cached in the cache 215 of a task processing node 200. The execution node ID 302 field stores the identifier (execution node ID) of the node (task processing node 200) to which the task corresponding to the entry is assigned. The analysis result 303 field stores the result of analyzing the task (the content of the task) corresponding to the entry with the task analysis module 132. For example, for a natural language processing task, the analysis result 303 field stores the result of digitizing (vectorizing) the sentences and words of the task content. The processing status 304 field stores information indicating whether the task corresponding to the entry is being processed or has been processed. The number of comparisons 305 field stores the number of times the similarity search module 133 has compared a newly requested task (new task) with the task corresponding to the entry (assigned task).

The average similarity rate 306 field stores the average to date of the results (similarity rate: an example of similarity) obtained when the similarity search module 133 compared new tasks with the assigned task. The adoption count 307 field stores the number of times the task assignment module 131 has used this entry as the basis for choosing a task processing node 200, that is, the number of times the task processing node 200 corresponding to this entry has been chosen as the assignment destination of a new task. The last execution date and time 308 field stores the date and time at which the task corresponding to the entry was last executed (for example, in the form YYMMDD hh:mm:ss).
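As an illustration only, one entry of the task assignment history information could be modeled as the record below. The class name, attribute names, and Python types are assumptions introduced for this sketch; the source specifies only the fields 301 to 308 and what each one holds.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TaskAssignmentEntry:
    """Hypothetical model of one entry of task assignment history information 112."""
    task_id: str                      # task 301
    execution_node_id: str            # execution node ID 302
    analysis_result: List[float]      # analysis result 303 (vectorized task content)
    processing_status: str            # processing status 304: "processing" or "processed"
    comparison_count: int = 0         # number of comparisons 305
    average_similarity: float = 0.0   # average similarity rate 306
    adoption_count: int = 0           # adoption count 307
    last_executed: Optional[datetime] = None  # last execution date and time 308


# Example: a task that has just been assigned to a node.
entry = TaskAssignmentEntry("task-1", "node-1", [0.1, 0.2], "processing")
```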

Next, the cache management information 214 will be described.

FIG. 3 is a diagram showing the configuration of the cache management information according to the first embodiment.

The cache management information 214 holds an entry corresponding to a task processing node 200. Each entry has the fields node ID 401, cache capacity 402, used cache capacity 403, data deletion threshold 404, cache organizing method 405, cached tasks 406, and cache data list 407.

The node ID 401 field stores a unique identifier (node ID) for identifying the task processing node 200. If the cache management information 214 manages only information on the task processing node 200 in which it is stored, the node ID 401 field may be omitted. The cache capacity 402 field stores the storage capacity of the memory 202 that can be used as the cache 215 of the task processing node 200 corresponding to the entry. The used cache capacity 403 field stores the total size of the cache data 221 cached in the cache 215 of the task processing node 200 corresponding to the entry. The data deletion threshold 404 field stores a capacity threshold used to decide whether to delete data from the cache 215. The cache organizing method 405 field stores information indicating the method used to organize cache data when the capacity of the cache 215 is insufficient. The cached tasks 406 field stores the identifiers of the tasks whose data is cached in the cache 215 of the task processing node 200 corresponding to the entry. The cached tasks 406 field can store the identifiers of a plurality of tasks. The cache data list 407 field holds, for each task identified in the cached tasks 406 field, a list of the identifiers of the cache data 221 that the task has used or is using and that is cached in the cache 215.
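The fields above might be represented as follows. This is a sketch under stated assumptions: the names are hypothetical, capacities are taken to be byte counts, and the data deletion threshold is interpreted as an absolute amount of used cache, which the source does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CacheManagementEntry:
    """Hypothetical model of one entry of cache management information 214."""
    node_id: str                   # node ID 401
    cache_capacity: int            # cache capacity 402 (bytes, assumed unit)
    used_cache_capacity: int       # used cache capacity 403 (bytes, assumed unit)
    data_deletion_threshold: int   # data deletion threshold 404 (bytes, assumed unit)
    organizing_method: str         # cache organizing method 405, e.g. "task" or "data"
    cached_tasks: List[str] = field(default_factory=list)               # cached tasks 406
    cache_data_list: Dict[str, List[str]] = field(default_factory=dict)  # cache data list 407

    def free_capacity(self) -> int:
        """Remaining cache space on this node."""
        return self.cache_capacity - self.used_cache_capacity

    def exceeds_deletion_threshold(self) -> bool:
        """Assumed rule: organize the cache once usage reaches the threshold."""
        return self.used_cache_capacity >= self.data_deletion_threshold


node = CacheManagementEntry("node-1", 1000, 850, 800, "task",
                            cached_tasks=["task-1"],
                            cache_data_list={"task-1": ["data-a", "data-b"]})
```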

Next, the processing operations in the computer system 10 will be described.

First, the task management processing performed by the task management node 100 will be described.

FIG. 4 is a flowchart of the task management processing according to the first embodiment.

The task management processing starts when the task management node 100 receives a task execution request from the client 50. In the present embodiment, the case where the task targeted by the task execution request (the new task) is a natural language processing task is described as an example. The type of task is not limited to natural language processing tasks.

The task management module 111 (more precisely, the CPU 101 executing the task management module 111) causes the task analysis module 132 to analyze the new task included in the task processing request received from the client 50, that is, the content of the task (step S101). Specifically, the task analysis module 132 converts the words (sentences or words) of the task into vector information (digitizes them). A known technique such as word2vec can be used to convert words into vector information.
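The vectorization in step S101 can be pictured with the toy sketch below. It is not the method of the source: a small hand-written embedding table stands in for a trained word2vec model, and averaging the word vectors into one task vector is an assumption made for illustration. The names `TOY_EMBEDDINGS` and `vectorize_task` are hypothetical.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings. A real system would load vectors
# from a trained model such as word2vec instead of this hand-written table.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "contract": [0.9, 0.1, 0.0],
    "risk": [0.7, 0.3, 0.1],
    "weather": [0.0, 0.2, 0.9],
}


def vectorize_task(text: str,
                   embeddings: Dict[str, List[float]] = TOY_EMBEDDINGS) -> List[float]:
    """Turn task text into one vector by averaging the vectors of known words."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not known:
        # No known word: return a zero vector of the right dimension.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


task_vector = vectorize_task("Contract risk")
```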

Next, the task management module 111 refers to the processing status 304 of each entry in the task assignment history information 112 to determine whether there is a task processing node 200 that is not currently processing a task (an idle node) (step S102).

If there is no idle node (step S102: NO), the task management module 111 causes the similarity search module 133 to calculate the similarity rate with the new task for every task that has an entry in the task assignment history information 112 (step S104), and the processing proceeds to step S106. Here, the similarity search module 133 calculates the similarity rate between the new task and each assigned task based on the vector information calculated for the new task and the analysis result 303 of each task entry in the task assignment history information 112.

If there is an idle node (step S102: YES), the task management module 111 causes the similarity search module 133 to calculate the similarity rate with the new task for all idle nodes (step S105), and the processing proceeds to step S106.

In step S106, based on the similarity rates calculated by the similarity search module 133, the task management module 111 selects the task processing node 200 that holds the task with the highest similarity rate as the node to which the new task is assigned (the assignment destination node).

The task with the highest similarity rate to the new task is likely to use the same data as the new task, and that data is also likely to be cached. Therefore, when the task processing node 200 that holds the task with the highest similarity rate is chosen as the assignment destination node, the likelihood that cached data can be used when the new task is executed (that is, the cache hit rate) increases. The likelihood that the new task and the most similar task use the same data depends on the type of task; for natural language processing tasks, for example, this likelihood tends to be higher.
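Steps S102 to S106 can be sketched as follows. This is a minimal illustration rather than the implementation of the source: the similarity measure is not named there, so cosine similarity is an assumption, as are the function names and the dictionary keys `node`, `vector`, and `status`. A node is treated as idle when none of its history entries is in the "processing" state.

```python
import math
from typing import Any, Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_assignee(new_vector: List[float],
                    history: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return (node ID, similarity) of the best assignment destination."""
    if not history:
        raise ValueError("history must contain at least one assigned task")
    # S102: nodes with a task still in progress are busy; the rest are idle.
    busy = {e["node"] for e in history if e["status"] == "processing"}
    idle_entries = [e for e in history if e["node"] not in busy]
    # S105 if idle nodes exist, otherwise S104 (compare against all tasks).
    candidates = idle_entries if idle_entries else history
    scored = [(cosine_similarity(new_vector, e["vector"]), e) for e in candidates]
    # S106: pick the node holding the most similar task.
    best_score, best_entry = max(scored, key=lambda pair: pair[0])
    return best_entry["node"], best_score


history = [
    {"node": "node-1", "vector": [1.0, 0.0], "status": "processed"},
    {"node": "node-2", "vector": [0.0, 1.0], "status": "processing"},
]
node_id, score = select_assignee([0.0, 1.0], history)
```

In this example node-2 holds the more similar task but is busy, so only the idle node-1 is considered, which mirrors the branch at step S102.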

Next, the task assignment module 131 requests task processing from the task processing node 200 selected as the assignment destination node, that is, it sends the new task to that node (step S107). The assignment destination task processing node 200 then executes the task, and when the task processing ends, it transmits the task processing result and a task completion notification to the task management node 100.

Next, the task management module 111 updates the task assignment history information 112 (step S108). Specifically, the task management module 111 adds an entry corresponding to the new task to the task assignment history information 112. In the added entry, it sets the identifier of the new task in the task 301 field, the identifier of the assignment destination node in the execution node ID 302 field, the vector information obtained in step S101 in the analysis result 303 field, and "processing", which indicates that the task is being executed, in the processing status 304 field. It also stores the number of assigned tasks compared (the number of similarity rates calculated) in the number of comparisons 305 field, the average similarity rate with the compared assigned tasks in the average similarity rate 306 field, and 0 in the adoption count 307 field. In addition, for each entry of an assigned task that was compared, the task management module 111 adds 1 to the number of comparisons 305 and stores in the average similarity rate 306 field the latest average similarity rate, calculated from the number of comparisons, the average similarity rate before the update, and the newly calculated similarity rate. The task management module 111 also adds 1 to the adoption count 307 of the entry corresponding to the task processing node 200 selected as the assignment destination node.
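The update of an existing entry in step S108 amounts to an incremental mean. The sketch below assumes the usual running-average formula, new average = (old average × old count + new similarity) / (old count + 1), which is consistent with the three inputs the text names; the function names and dictionary keys are hypothetical.

```python
from typing import Dict, Union

Number = Union[int, float]


def updated_average(old_average: float, old_count: int, new_similarity: float) -> float:
    """Running mean after one more similarity rate has been observed."""
    return (old_average * old_count + new_similarity) / (old_count + 1)


def record_comparison(entry: Dict[str, Number], similarity: float, adopted: bool) -> None:
    """Update fields 305 to 307 of an assigned task's entry after a comparison."""
    entry["average_similarity"] = updated_average(
        entry["average_similarity"], entry["comparison_count"], similarity)
    entry["comparison_count"] += 1
    if adopted:
        # This entry's node was chosen as the assignment destination.
        entry["adoption_count"] += 1


entry = {"comparison_count": 3, "average_similarity": 0.5, "adoption_count": 0}
record_comparison(entry, 0.9, adopted=True)
```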

Thereafter, the task management module 111 receives the task completion notification and the task processing result from the task processing node 200 (S109), updates to "processed" the processing status 304 of the entry of the task assignment history information 112 corresponding to the task whose completion was notified (S110), and notifies the client 50 that issued the task processing request of the task processing result (S111).

Next, the task execution process performed by the task processing node 200 will be described.

FIG. 5 is a flowchart of the task execution process according to the first embodiment.

The task execution process is started when the task processing node 200 receives a new task from the task management node 100.

When the task execution process is started, the task processing module 211 of the task processing node 200 analyzes the task and identifies the data necessary for task processing (acquisition target data) (S201).

Next, the task processing module 211 repeatedly executes the processing of loop A (steps S203 to S208), taking each item of acquisition target data in turn as the processing target.

In loop A, the task processing module 211 asks the cache management module 212 whether the acquisition target data being processed (processing target data) exists in the cache 215 (step S203). If it exists in the cache 215 (step S203: YES), the task processing module 211 acquires the cache data 221 corresponding to the processing target data from the cache 215 via the cache management module 212 (step S205).

On the other hand, if it does not exist in the cache 215 (step S203: NO), the task processing module 211 acquires the data 231 corresponding to the processing target data, via the data management module 213, from one of the storage devices 203 of the task processing nodes 200 (step S204).

Next, the task processing module 211 asks the cache management module 212 whether the cache 215 has enough free space to store the data 231 (step S206).

If the cache 215 does not have enough free space to store the data 231 (step S206: NO), the task processing module 211 notifies the cache management module 212 of the cache capacity shortage and causes it to execute the in-cache organizing process, which reorganizes the contents of the cache 215 to increase free space (step S207). The method used for the in-cache organizing process depends on the information in the cache organizing method 405 of the cache management information 214. The in-cache organizing process will be described later with reference to FIGS. 6 and 8.

If there is free space (step S206: YES), or after the in-cache organizing process (step S207) has been executed, the task processing module 211 writes the data 231 to the cache 215 and updates the cache management information 214 (S208). Specifically, the task processing module 211 stores the identifier of the new task in the cached task 406 of the entry of the cache management information 214 corresponding to the task processing node 200, adds the identifier of the acquired data to the cache data list 407, and changes the used cache capacity 403 to the capacity at that point.

When loop A has been executed for all of the acquisition target data, the task processing module 211 exits loop A and executes the task (step S209).

Next, when execution of the task has finished, the task processing module 211 notifies the task management node 100 of the task execution result (step S210) and ends the process.
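Loop A of the task execution process (steps S203 to S208) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `fetch_task_data`, the callbacks `load` and `make_room`, and the representation of the cache 215 as a dictionary of byte strings are hypothetical and are not taken from the specification.

```python
from typing import Callable, Dict, Iterable


def fetch_task_data(
    needed: Iterable[str],
    cache: Dict[str, bytes],
    capacity: int,
    load: Callable[[str], bytes],
    make_room: Callable[[Dict[str, bytes], int], None],
) -> Dict[str, bytes]:
    """Collect every item of acquisition target data, using the cache where possible.

    load(data_id) stands in for acquisition through the data management module.
    make_room(cache, required) stands in for the in-cache organizing process.
    """
    fetched: Dict[str, bytes] = {}
    for data_id in needed:                          # loop A
        if data_id in cache:                        # S203: YES
            fetched[data_id] = cache[data_id]       # S205: read from the cache
            continue
        data = load(data_id)                        # S204: read from a storage device
        used = sum(len(value) for value in cache.values())
        if capacity - used < len(data):             # S206: NO, not enough free space
            make_room(cache, len(data))             # S207: in-cache organizing process
        cache[data_id] = data                       # S208: write the data to the cache
        fetched[data_id] = data
    return fetched
```

In this sketch, data already handed to the task stays available in `fetched` even if `make_room` later evicts it from the cache.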

Next, the task-unit in-cache organizing process in the task processing node 200 will be described. The task-unit in-cache organizing process is an example of the in-cache organizing process in step S207 of FIG. 5.

FIG. 6 is a flowchart of the task-unit in-cache organizing process according to the first embodiment.

The task-unit in-cache organizing process is started when the cache management module 212 receives a notification of cache capacity shortage from the task processing module 211.

First, the cache management module 212 notifies the task management node 100 of the identifier of its own task processing node 200 and asks the task management node 100 for the ranking of deletion target tasks (deletion priority) (step S301). In response, the task management node 100 executes the deletion target task determination process (see FIG. 7), and as a result the cache management module 212 obtains the ranking of deletion target tasks.

On receiving the ranking of deletion target tasks from the task management node 100, the cache management module 212 acquires, in descending order of rank, the list of cache data to be deleted (deletion target cache data) from the cache data list 407 of the cache management information 214 corresponding to each deletion target task (S302).

Next, following the ranking, the cache management module 212 executes the processing of loop B (steps S304 and S305), taking each item of deletion target cache data corresponding to one deletion target task as the processing target.

In loop B, the cache management module 212 refers to the cache data list 407 of the cache management information 214 and determines whether another task uses the deletion target cache data being processed (processing target cache data) (step S304).

If it determines that another task uses the data (step S304: YES), deleting the processing target cache data from the cache 215 would affect that task, so the cache management module 212 does not delete the processing target cache data and instead executes loop B with the next item of deletion target cache data as the processing target.

On the other hand, if it determines that no other task uses the data (step S304: NO), deleting the processing target cache data does not affect the execution of other tasks, so the cache management module 212 deletes the processing target cache data from the cache 215 (step S305) and executes loop B with the next item of deletion target cache data as the processing target.

When loop B has been executed for all of the deletion target cache data, the cache management module 212 exits loop B and deletes the cached task 406 and the cache data list 407 corresponding to the deletion target task from the entry of the cache management information 214 (S306).

Next, the cache management module 212 determines whether the capacity shortage of the cache 215 has been resolved (step S307). If it has not been resolved (step S307: NO), the cache management module 212 executes loop B for the deletion target task of the next rank.

On the other hand, if the capacity shortage of the cache 215 has been resolved (step S307: YES), the cache management module 212 notifies the task management node 100 of the tasks whose cache data was deleted (step S308) and ends the process.

As described above, the task-unit in-cache organizing process makes it possible to secure capacity in the cache 215 appropriately. It also prevents data used by other tasks from being deleted from the cache 215.
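The task-unit in-cache organizing process (steps S302 to S308) can be sketched in Python as follows. The function name `organize_cache_by_task` and the data layout are assumptions made for illustration: `cache_lists` maps each cached task 406 to the set of data identifiers in its cache data list 407, and `cache` maps data identifiers to byte strings.

```python
from typing import Dict, List, Set


def organize_cache_by_task(
    ranking: List[str],
    cache_lists: Dict[str, Set[str]],
    cache: Dict[str, bytes],
    capacity: int,
    required: int,
) -> List[str]:
    """Evict cached data task by task, in ranking order, until `required` bytes are free.

    Returns the tasks whose cache entries were removed (the content of the S308 notification).
    """
    removed_tasks: List[str] = []
    for task_id in ranking:
        if task_id not in cache_lists:
            continue
        for data_id in list(cache_lists[task_id]):              # loop B
            used_elsewhere = any(                               # S304
                data_id in ids
                for other, ids in cache_lists.items()
                if other != task_id
            )
            if not used_elsewhere:
                cache.pop(data_id, None)                        # S305
        del cache_lists[task_id]                                # S306
        removed_tasks.append(task_id)
        used = sum(len(value) for value in cache.values())
        if capacity - used >= required:                         # S307: YES
            break
    return removed_tasks
```

Data shared with another cached task survives the eviction of one of its users, which mirrors the check in step S304.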

Next, the deletion target task determination process performed by the task management node 100 will be described.

FIG. 7 is a flowchart of the deletion target task determination process according to the first embodiment.

The deletion target task determination process is started when the identifier of a task processing node 200 and an inquiry about the ranking of deletion target tasks are received from that task processing node 200.

The task management module 111 determines the ranking of deletion target tasks (step S401). The ranking may be determined by at least one of, for example: a method based on the comparison count 305 of the task assignment history information 112 (for example, ranking tasks with fewer comparisons higher, or ranking tasks with more comparisons higher); a method based on the last execution date and time 308 of the task assignment history information 112 (ranking the most recent higher, or ranking the oldest higher); and a method that ranks higher those already executed tasks whose average similarity rate 306 in the task assignment history information 112 is small.

Next, the task management module 111 notifies the task processing node 200 of the determined ranking of deletion target tasks (step S402).

Next, on receiving from the task processing node 200 a notification of the tasks whose cache data was deleted, the task management module 111 deletes the entries of those tasks from the task assignment history information 112 (step S403) and ends the deletion target task determination process.

The deletion target task determination process makes it possible to determine the ranking of deletion target tasks appropriately and to delete, from the task assignment history information 112, the entries for tasks whose cache data has been deleted from the cache 215. Consequently, the task assignment history information 112 contains only entries for tasks whose data is cached in the cache 215 of a task processing node 200.
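The ranking of step S401 can be sketched in Python as follows. The function name `rank_deletion_targets`, the policy names, and the dictionary keys are hypothetical. Only three of the orderings mentioned in the description are shown; the choice among them is left open by the specification.

```python
from datetime import datetime
from typing import Any, Dict, List


def rank_deletion_targets(
    entries: List[Dict[str, Any]],
    node_id: str,
    policy: str = "fewest_comparisons",
) -> List[str]:
    """Return the task identifiers cached on node_id, highest deletion priority first."""
    sort_keys = {
        "fewest_comparisons": lambda e: e["comparison_count"],           # comparison count 305
        "oldest_first": lambda e: e["last_executed"],                    # last execution date and time 308
        "lowest_average_similarity": lambda e: e["average_similarity"],  # average similarity rate 306
    }
    candidates = [e for e in entries if e["execution_node"] == node_id]
    if policy == "lowest_average_similarity":
        # The description applies this ordering to already executed tasks.
        candidates = [e for e in candidates if e["processing_status"] == "processed"]
    return [e["task"] for e in sorted(candidates, key=sort_keys[policy])]
```

The opposite orderings mentioned in the description (most comparisons first, most recent first) could be obtained by passing `reverse=True` to `sorted`.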

Next, the cache-data-unit in-cache organizing process in the task processing node 200 will be described. The cache-data-unit in-cache organizing process is an example of the in-cache organizing process in step S207 of FIG. 5. Either this process or the in-cache organizing process shown in FIG. 6 may be executed, or both may be executed.

FIG. 8 is a flowchart of the cache-data-unit in-cache organizing process according to the first embodiment.

First, the cache management module 212 determines the cache data to be deleted (deletion target cache data) (step S501). The deletion target cache data may be, for example, data not shared by multiple tasks, data used by few tasks, data of large size, or data of small size, and can be determined by any method.

Next, the cache management module 212 deletes the deletion target cache data from the cache data list 407 of the cache management information 214 (step S502) and deletes the deletion target cache data present in the cache 215 (step S503).

Next, the cache management module 212 checks whether the necessary free space has been secured in the cache 215 (step S504). If it has not (step S504: NO), the cache management module 212 returns to step S501 and deletes the next deletion target cache data. If the necessary free space has been secured (step S504: YES), the cache management module 212 ends the cache-data-unit in-cache organizing process.

The cache-data-unit in-cache organizing process described above makes it possible to secure free space in the cache 215 appropriately without querying the task management node 100.
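The cache-data-unit in-cache organizing process (steps S501 to S504) can be sketched in Python as follows. The function name `organize_cache_by_data` is hypothetical, and the selection rule used here (data used by the fewest tasks first, larger data first among ties) is only one assumed combination of the criteria listed for step S501, which the description leaves open.

```python
from typing import Dict, List, Set


def organize_cache_by_data(
    cache: Dict[str, bytes],
    cache_lists: Dict[str, Set[str]],
    capacity: int,
    required: int,
) -> List[str]:
    """Evict individual cached items until `required` bytes are free; return the evicted ids."""

    def free_space() -> int:
        return capacity - sum(len(value) for value in cache.values())

    def user_count(data_id: str) -> int:
        return sum(data_id in ids for ids in cache_lists.values())

    deleted: List[str] = []
    while free_space() < required and cache:                    # S504
        # S501: choose the victim (assumed rule: fewest users, then largest size).
        victim = min(cache, key=lambda d: (user_count(d), -len(cache[d])))
        for ids in cache_lists.values():                        # S502: update cache data list 407
            ids.discard(victim)
        del cache[victim]                                       # S503: remove from the cache 215
        deleted.append(victim)
    return deleted
```

Because the decision uses only information held on the task processing node, no round trip to the task management node is involved.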

As described above, in the computer system 10 according to the first embodiment, the task processing node 200 to which a task highly similar to the new task was assigned is chosen as the assignment destination of the new task. The data needed to execute the new task is therefore likely to be present in the cache 215 of that task processing node 200. This reduces both reading data from the storage device 203 of that task processing node 200 into the memory 202 and acquiring data from the storage device 203 of another task processing node 200 through inter-node communication, so the volume of inter-node communication can be reduced. Because inter-node communication is reduced, the time spent receiving data falls, which shortens the task execution time and the time the user waits for the task processing result.

Next, a computer system according to a second embodiment of the present invention will be described. The following description focuses on differences from the computer system according to the first embodiment; description of common points is omitted or simplified.

FIG. 9 is an overall configuration diagram of the computer system according to the second embodiment.

In the computer system 10A according to the second embodiment, one or more data management nodes 500, each including components corresponding to the storage device 203 and the data management module 213 of the task processing node 200 in the computer system 10 according to the first embodiment, are provided separately from the task processing nodes 400 and connected to the network 300. The task processing node 400 shown in FIG. 9 has the configuration of the task processing node 200 according to the first embodiment with the storage device 203 and the data management module 213 removed; however, the present invention is not limited to this, and the task processing node 400 may have the same configuration as the task processing node 200.

The data management node 500 includes a CPU 501, a memory 502, a storage device 203, and a network interface 503.

The network interface 503 is an interface such as a wired LAN card or a wireless LAN card, and communicates with other devices (for example, the client 50, the task management node 100, the task processing node 400, the data management node 500, and the like) via the network 300.

The CPU 501 executes various processes by executing programs (modules) stored in the memory 502.

The memory 502 is, for example, a RAM, and stores programs executed by the CPU 501 and necessary information.

The memory 502 stores the data management module 213.

The data management module 213 is a program module that causes the CPU 501 to manage the placement of the data 231 in the storage devices 203 managed by the plurality of data management nodes 500.

In the computer system 10A according to the second embodiment, the node that executes each process may differ from that in the computer system 10 because of the difference in configuration, but processes similar to those in FIGS. 4 to 8 can be executed.

The second embodiment provides the same effects as the first embodiment. In addition, because the node that performs task processing and cache management (task processing node 400) is provided separately from the node that manages data (data management node 500), it is possible, for example, to expand only the task processing nodes 400 or only the data management nodes 500, so the required performance can be achieved at low cost.

The present invention is not limited to the embodiments described above and can be modified and implemented as appropriate without departing from the spirit of the present invention.

For example, in the task management process according to the above embodiments, when determining the node to which a task is assigned, if there are free nodes, the node is chosen from among the free nodes, as shown in steps S102 and S105 of FIG. 4. However, the present invention is not limited to this, and the node to which the task is assigned may be chosen from among all nodes regardless of whether there are free nodes.

Further, in the above embodiments, the node (task processing node) caching the data of the task with the highest similarity is determined as the node to which the task is assigned, but the present invention is not limited to this. For example, among nodes whose number of assigned tasks is within a predetermined range, the node caching the data of the task with the highest similarity may be determined as the assignment destination, or a node caching the data of a task whose similarity is equal to or greater than a predetermined value may be determined as the assignment destination. In short, it suffices that the node to which the task is assigned is determined based on the similarity.
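The selection variants described above can be sketched in Python as follows. The function name `choose_node` and its parameters are hypothetical. The sketch assumes that the "predetermined range" for the number of assigned tasks is expressed as an upper bound (`max_tasks`) and that `candidates` pairs each node with the similarity of the most similar task cached on it.

```python
from typing import Dict, List, Optional, Tuple


def choose_node(
    candidates: List[Tuple[str, float]],
    load: Dict[str, int],
    max_tasks: Optional[int] = None,
    min_similarity: Optional[float] = None,
) -> Optional[str]:
    """Pick the assignment destination from (node_id, similarity) pairs.

    max_tasks: optional upper bound on tasks already assigned to a node (assumption).
    min_similarity: optional threshold that the similarity must reach.
    Returns None when no candidate satisfies the constraints.
    """
    eligible = [
        (node_id, similarity)
        for node_id, similarity in candidates
        if (max_tasks is None or load.get(node_id, 0) <= max_tasks)
        and (min_similarity is None or similarity >= min_similarity)
    ]
    if not eligible:
        return None
    # Among the eligible nodes, prefer the one with the highest similarity.
    return max(eligible, key=lambda pair: pair[1])[0]
```

With both optional parameters left as `None`, the function reduces to the behavior of the embodiments, in which the node with the highest similarity is chosen.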

10, 10A: computer system; 100: task management node; 101: CPU; 102: memory; 112: task assignment history information; 200, 400: task processing node; 201: CPU; 202: memory; 203: storage device; 214: cache management information; 215: cache; 300: network; 500: data management node

Claims

A task management system comprising: a plurality of task processing nodes capable of executing tasks; and a task management node that determines a task processing node to which a new task is to be assigned, wherein
each of the plurality of task processing nodes has a memory capable of caching data used by an assignment task, which is a task assigned to that node,
the task management node stores task assignment information including a correspondence between the assignment task and the task processing node to which the assignment task is assigned and in which data used by the assignment task is cached, and
a processor of the task management node:
determines a similarity between the new task and the assignment task;
determines, based on the similarity, a task processing node to which the new task is to be assigned from among the task processing nodes included in the task assignment information; and
assigns the new task to the determined task processing node.

The task management system according to claim 1, wherein
the task management node stores the assignment task in association with an analysis result obtained by analyzing the assignment task by a predetermined analysis method, and
the processor of the task management node:
analyzes the new task by the predetermined analysis method; and
determines the similarity based on an analysis result of the new task and the analysis result of the assignment task.

The task management system according to claim 2, wherein the assignment task and the new task are tasks related to natural language processing, and
the predetermined analysis method is a method of vectorizing words or sentences of the assignment task or the new task.

The task management system according to any one of the preceding claims, wherein a processor of the task processing node
acquires, from a predetermined storage device, data that is to be used in the new task assigned to its own task processing node and is not cached in the memory, and caches the data in the memory.

The task management system according to claim 4, wherein
the task processing node stores information on a correspondence between the assignment task and the cached data among the data used by the assignment task,
the processor of the task management node
determines, according to a predetermined condition, a priority for deleting the cached data of the assignment tasks and notifies the task processing node of the priority, and
the processor of the task processing node,
when the memory has no free space for caching the data acquired from the storage device, deletes the data used by the assignment tasks from the memory in accordance with the priority.

The task management system according to claim 5, wherein the processor of the task processing node
deletes from the memory data that is used by the assignment task and is not used by any other assignment task.

The task management system according to claim 5 or 6, wherein the processor of the task management node
determines the priority based on at least one of: the last execution date and time of the assignment task; the similarity between the assignment task and one or more new tasks; the number of times the similarity with a new task has been calculated; and the number of times a new task has been assigned to the execution node that executed the assignment task.

The task management system according to any one of claims 5 to 7, wherein the processor of the task management node
deletes the information identifying the task processing node that executed the assignment task when all data used by the assignment task has been deleted from the memory.

The task management system according to any one of claims 4 to 8, wherein
the task processing node stores information on a correspondence between the assignment task and the cached data among the data used by the assignment task, and
the processor of the task processing node,
when the memory has no free space for caching the data acquired from the storage device, deletes from the memory data that is not used by a plurality of the assignment tasks.

The task management system according to any one of claims 1 to 9, wherein the predetermined storage device
is provided in at least one of the plurality of task processing nodes.

The task management system according to any one of claims 1 to 9, wherein the predetermined storage device
is provided in a data management node configured separately from the task processing node.

A task management method performed by a task management system comprising a plurality of task processing nodes capable of executing tasks and a task management node that determines a task processing node to which a new task is to be assigned, wherein
each of the plurality of task processing nodes has a memory capable of caching data used by an assignment task, which is a task assigned to that node, the method comprising:
storing task assignment information including a correspondence between the assignment task and the task processing node to which the assignment task is assigned and in which data used by the assignment task is cached;
determining a similarity between the new task and the assignment task;
determining, based on the similarity, a task processing node to which the new task is to be assigned from among the task processing nodes included in the task assignment information; and
assigning the new task to the determined task processing node.

A task management program for causing a computer constituting a task management node to determine, from among a plurality of task processing nodes capable of executing tasks, a task processing node to which a new task is to be assigned, wherein
each of the plurality of task processing nodes has a memory capable of caching data used by an assignment task, which is a task assigned to that node,
the program causing the computer to:
store task assignment information including a correspondence between the assignment task and the task processing node to which the assignment task is assigned and in which data used by the assignment task is cached;
determine a similarity between the new task and the assignment task;
determine, based on the similarity, a task processing node to which the new task is to be assigned from among the task processing nodes included in the task assignment information; and
assign the new task to the determined task processing node.