JP7376450B2

JP7376450B2 - A method for constructing a trained model and a design support device using the trained model

Info

Publication number: JP7376450B2
Application number: JP2020164148A
Authority: JP
Inventors: 重人安原; 英之岡田; 毅吉本; 高幸岡田; 禎後藤; 昇福田; 航司山田
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2023-11-08
Anticipated expiration: 2040-09-29
Also published as: JP2022056238A

Description

The present disclosure relates to a method for constructing a trained model, in particular a trained model used by a design support device that provides design support, and to a design support device using that trained model.

A design support system for vehicle design has been developed that, when the design of one element is changed, automatically changes the specification values of other related elements based on the specification values entered for that change (for example, Patent Document 1).

A design support device for building design is also known that accepts image information and linguistic information from the building owner, performs computation using a trained model, and outputs the design conditions of the building to be designed (for example, Patent Document 2).

Patent Document 1: Japanese Patent Application Publication No. 2009-37509
Patent Document 2: Japanese Patent Application Publication No. 2019-200721

Designing a vehicle requires many activities, such as setting targets and examining load inputs. In the design support device of Patent Document 1, changing one element changes the specification values of other elements, but an inexperienced user cannot learn from it which activities to perform at each stage of vehicle design. It is therefore difficult to provide sufficient design support to inexperienced users.

One conceivable approach is to configure the design support device so that it can provide design support using a trained model, as described in Patent Document 2.

The trained model could be constructed by reinforcement learning, in which a reward is assigned to each action to be taken in a state representing the design situation, and the value obtained by performing actions in sequence is maximized. In that case, the value must be set so that the actions selected to maximize it match those of a sufficiently experienced user (an expert).

However, when the number of activities selectable in each state is large, as in vehicle design, the number of rewards to be set grows, and so does the burden of setting them.

In view of the above background, an object of the present invention is to make rewards easier to set in a method for constructing a trained model using reinforcement learning and in a design support device using that trained model.

To solve the above problem, one aspect of the present invention is a method for constructing a trained model (50) by assigning a reward (R) to each combination of a state (S), which is determined depending on whether design activities have been executed, and an action (A), which is an activity selectable in that state, and maximizing the value. The method includes a step (ST4) of assigning the reward based on rules (16) input in advance, and a step (ST5) of performing reinforcement learning based on the assigned reward. Each rule includes information on a route source activity, a route destination activity performed after the route source activity, and a thickness indicating the importance of performing the route destination activity after the route source activity. In the step of assigning the reward, when the action taken immediately before reaching the state matches the route source activity and the action matches the route destination activity, the reward is set based on the thickness.
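The reward rule in this aspect can be sketched as a small lookup. This is an illustration only: the `Rule` type, the `reward_for` function, the exact-match comparison, the use of the thickness directly as the reward, and the default of 0.0 for unmatched combinations are all assumptions, since the text only says the reward is set "based on" the thickness.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One rule: after `source`, an expert would perform `destination`."""
    source: str        # route source activity
    destination: str   # route destination activity
    thickness: float   # importance of performing destination after source


def reward_for(rules: List[Rule], last_activity: str, action: str,
               default: float = 0.0) -> float:
    """Reward for taking `action` in a state reached by `last_activity`.

    Assumption: the reward equals the thickness of the first matching rule,
    and combinations that match no rule receive `default`.
    """
    for rule in rules:
        if rule.source == last_activity and rule.destination == action:
            return rule.thickness
    return default


# Hypothetical rules; the activity names and thickness values are invented.
rules = [
    Rule("activity 1", "activity 2", 3.0),
    Rule("activity 2", "activity 3", 1.0),
]
```

With these hypothetical rules, `reward_for(rules, "activity 1", "activity 2")` returns the thickness 3.0, while a combination that matches no rule falls back to the default.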

According to this aspect, inputting rules based on an expert's experience and knowledge makes it possible to set rewards that reflect the expert's behavior easily and appropriately.

In the above aspect, the state may be defined by a combination of whether each of a plurality of the activities has been executed (T, F) and the activity executed immediately before.

According to this aspect, the activity immediately before the state and the action can be used to determine easily whether a state-action combination matches a rule that follows the expert's design procedure.

In the above aspect, the state may be defined by a combination of whether each of a plurality of the activities has been executed, the activity executed immediately before, and a number (a, b, u) corresponding to a design condition.

According to this aspect, design support suited to the design conditions can be provided.

In the above aspect, the numbers corresponding to the design conditions may include a number (u) indicating that the design condition is undetermined.

According to this aspect, design support can be provided even when the design conditions are undetermined.
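A state that carries a design-condition number can be sketched as a small immutable record. The `State` type, its field names, and the `with_condition` helper are hypothetical; only the condition labels a, b, and u come from the text.

```python
from typing import NamedTuple, Tuple

UNDETERMINED = "u"  # label for "design condition not yet decided"


class State(NamedTuple):
    """A state: execution flags, the last activity, and a condition number."""
    executed: Tuple[bool, ...]   # T/F flag per activity
    last_activity: int           # number of the activity executed just before
    condition: str = UNDETERMINED


def with_condition(state: State, condition: str) -> State:
    """Return the same state under a different design condition."""
    return state._replace(condition=condition)


# A state whose condition is still undetermined, then fixed to condition "a".
s = State(executed=(True, False), last_activity=2)
s_a = with_condition(s, "a")
```

Keeping the condition inside the state means the same execution flags can map to different recommended actions under different conditions, which is one plausible reading of the aspect described above.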

In the above aspect, the method may include a step of generating a state space (P), which is the set of the states, using a knowledge graph (21) that includes a plurality of nodes (22) corresponding to the design activities and edges (23) connecting the nodes based on the relationships between the activities. In this step, the activities used to define the states are extracted from the activities corresponding to the nodes by matching them against the rules.

According to this aspect, states can be generated using a knowledge graph in which design-related information is stored, and design support can be provided.

In the above aspect, the knowledge graph may include both design information and data on design procedures.

According to this aspect, because the knowledge graph includes both data on design information and data on design procedures, the knowledge graph is more useful.

In the above aspect, the nodes may be recorded in the knowledge graph in separate layers, and the rules may include that the thickness is a predetermined value when the node corresponding to the route source activity is located in a layer above the node corresponding to the route destination activity.

According to this aspect, activities are indicated in an order that descends from activities corresponding to nodes in upper layers of the knowledge graph toward activities corresponding to nodes in lower layers, so the user can understand the instructions more easily.

In the above aspect, the rules may be such that no reward is assigned when the difference in layer between the node corresponding to the route source activity and the node corresponding to the route destination activity is equal to or greater than a predetermined value.

According to this aspect, the flow of activities is prevented from becoming hard for the user to follow because of a large difference in layer between the node corresponding to the route source activity and the node corresponding to the route destination activity.
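The two layer-based rules can be combined in a short sketch. The function name, the convention that layer 0 is the top layer, and the default values of `base_thickness` and `max_gap` are assumptions. The text does not say what happens when the source node is not above the destination node; the sketch returns `None` in that case, meaning only that this particular rule does not apply.

```python
from typing import Optional


def thickness_from_layers(source_layer: int, dest_layer: int,
                          base_thickness: float = 1.0,
                          max_gap: int = 2) -> Optional[float]:
    """Thickness for a source/destination pair based on their layers.

    Layers are counted from the top (0 = top layer). Returns None when
    no reward should be derived from this rule.
    """
    gap = dest_layer - source_layer
    if gap <= 0:
        # Source is not above the destination: the rule does not apply.
        return None
    if gap >= max_gap:
        # Layer difference is at or above the threshold: no reward.
        return None
    return base_thickness
```

With the default `max_gap` of 2, only a step to the next layer down receives the predetermined thickness.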

In the above aspect, the rules may be input as text, and in the step of generating the state space, the activities used to define the states may be extracted from the activities corresponding to the nodes by matching the words contained in the text against the activities corresponding to the nodes.

According to this aspect, the activities corresponding to the nodes can be matched against the rules easily.

Another aspect is a design support device (1, 101) that provides design support based on a trained model constructed by the above method. The device executes a step (ST24) of setting an initial state based on information input by the user, and a step (ST25) of producing output that instructs the user to perform, in sequence from the initial state, the actions that maximize the value.

According to this aspect, the design support device can be configured to provide design support based on a learning model that reflects the expert's behavior, obtained by inputting rules based on the expert's experience and knowledge.
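Steps ST24 and ST25 can be pictured as a greedy walk over a table of learned values. This is a sketch under stated assumptions: the nested-dictionary value table, the `transition` function, the state encoding (execution flags plus last activity), and all numeric values are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, Hashable, List

ACTIVITIES = (2, 3)  # hypothetical activity numbers tracked by the flags


def transition(state, action):
    """Mark `action` as executed and record it as the last activity."""
    flags, _last = state
    idx = ACTIVITIES.index(action)
    new_flags = tuple(True if i == idx else f for i, f in enumerate(flags))
    return (new_flags, action)


def recommend_sequence(q_table: Dict[Hashable, Dict[int, float]],
                       step: Callable, initial_state,
                       max_steps: int = 100) -> List[int]:
    """From the initial state, repeatedly pick the highest-valued action."""
    state, plan = initial_state, []
    for _ in range(max_steps):
        options = q_table.get(state, {})
        if not options:
            break  # no further actions are available in this state
        action = max(options, key=options.get)
        plan.append(action)
        state = step(state, action)
    return plan


# Hypothetical learned values for each (state, action) combination.
q_table = {
    ((False, False), 1): {2: 5.0, 3: 1.0},
    ((True, False), 2): {3: 2.0},
    ((False, True), 3): {2: 0.0},
}
plan = recommend_sequence(q_table, transition, ((False, False), 1))
```

Here `plan` is the ordered list of activities that would be presented to the user, starting from the state in which only activity 1 has been executed.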

Another aspect is a design support device (101) that provides design support based on a trained model constructed by the above method. The device executes a step (ST24) of setting an initial state based on information input by the user, and a step (ST25) of producing output that instructs the user to perform, in sequence from the initial state, the actions that maximize the value. In the output step, when an input concerning the design condition is received, the device transitions the state to one that matches the input condition and then produces the output instructing the user to perform, in sequence, the actions that maximize the value.

According to this aspect, the design support device can be configured to provide design support based on a learning model that reflects the expert's behavior. In addition, even when the design conditions are changed, the design support device can provide design support using a learning model that matches the changed conditions.

With the above configuration, rewards can be set easily in a method for constructing a trained model using reinforcement learning and in a design support device using that trained model.

Block diagram showing the hardware configuration of a design support device that carries out the trained model construction method according to the first embodiment
Functional block diagram of the design support device
Diagram showing an example of a knowledge graph
Diagram showing an example of a superordinate concept model
Flowchart of the trained model construction process
Explanatory diagram of (A) the extracted graph and (B) the state space
Diagram showing the reward table
Flowchart of the reward setting process
Diagram showing the Q table
Flowchart of the support process
Explanatory diagram of the learned state transitions
Diagram showing an example of a superordinate concept model according to the second embodiment
Explanatory diagram of the states and state transitions according to the second embodiment

The method for constructing a trained model according to the present invention is carried out by a design support device that provides design support using the trained model. An embodiment in which the design support device according to the present invention is applied to vehicle design support is described below with reference to the drawings.

Design support here means receiving input from a user engaged in design and indicating to the user the steps (activities) to be performed to carry out the design.

<<First embodiment>>
As shown in FIG. 1, the design support device 1 includes a computer with a known hardware configuration, having a processor 2, a RAM 3, a ROM 4, a storage 5, an input device 6, and an output device 7. The storage 5 may be any known storage device for holding information, such as an HDD or an SSD. The input device 6 may be a keyboard, a mouse, a microphone, or the like, and the output device 7 may be a monitor, a speaker, or the like.

As shown in FIG. 2, the design support device 1 includes, as functional units, a storage unit 10, an input unit 11, an output unit 12, a model construction unit 13, and a support processing unit 14.

The storage unit 10 includes the storage 5 and stores information related to design support. The storage unit 10 stores (holds) at least a knowledge database 15 and a superordinate concept model 16.

The knowledge database 15 is a database in which design knowledge is recorded hierarchically, and it includes a knowledge graph 21. As shown in FIG. 3, the knowledge graph 21 has a tree structure of a predetermined height. In this embodiment, the knowledge graph 21 is built by combining a database of design information (SysML) with a database of design procedures (GSN). The knowledge graph 21 holds, in classified form, the specifications of the portions and parts of the vehicle to be designed, their target values, design constraints, development strategies (procedures), evidence, and the like. Because the knowledge graph 21 contains both design information and design-procedure information, it is more useful than it would be if it contained only one of them.

The knowledge graph 21 consists of a plurality of nodes 22 and a plurality of edges 23, each of which connects two nodes 22. Each node 22 corresponds to one activity required for design. An activity here is a step that must be executed in order to carry out the design. As shown in FIG. 3, a node 22 corresponds, for example, to an activity such as "set the management target for the upper part of the door" or "examine the load input for part α".

In the knowledge graph 21, the nodes 22 are recorded in a plurality of layers according to the abstraction level (concept) of the corresponding activity. That is, the higher-level the concept of an activity, the higher the layer of the corresponding node 22. For example, a node 22 corresponding to an upstream activity in the design process (for example, target setting) is located in a higher layer than a node 22 corresponding to a downstream activity (for example, the examination needed to reach the target). More specifically, as shown in FIG. 3, the node 22 for vehicle design is located in a higher layer than the nodes 22 for examining inputs, examining strength, and examining plate thickness.

In addition, for the same activity applied to members in a containment relationship, the node 22 describing the activity for the contained member is located in a lower layer than the node 22 describing the activity for the containing member. Likewise, a node 22 describing the design of a member (for example, the vehicle body) is located in a higher layer than a node 22 describing the examination of a portion of that member (for example, part α).

Each edge 23 represents a relationship between nodes 22 and connects nodes 22 that describe mutually related (that is, correlated) activities. In this embodiment, an edge 23 mainly connects two nodes 22 in different layers. An edge 23 connects two nodes 22 when, for example, the activities they describe are in a parent-child, containment, dependency, or causal relationship.
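The layered graph described above can be sketched as a small data structure. The class and method names, the convention that layer 0 is the top layer, and the sample activities are assumptions made for illustration; the patent does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Node:
    activity: str  # the design activity this node stands for
    layer: int     # 0 = top layer; larger numbers are lower layers


@dataclass
class KnowledgeGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Set[FrozenSet[str]] = field(default_factory=set)

    def add_node(self, activity: str, layer: int) -> None:
        self.nodes[activity] = Node(activity, layer)

    def add_edge(self, a: str, b: str) -> None:
        """Connect two related activities; both must already be nodes."""
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both activities must be added as nodes first")
        self.edges.add(frozenset((a, b)))

    def neighbors(self, activity: str) -> List[str]:
        """Activities directly connected to `activity`, sorted by name."""
        return sorted(other for edge in self.edges if activity in edge
                      for other in edge if other != activity)


# Hypothetical fragment loosely modeled on the examples in the text.
graph = KnowledgeGraph()
graph.add_node("design the vehicle body", 0)
graph.add_node("examine the load input for part α", 1)
graph.add_node("examine the plate thickness", 1)
graph.add_edge("design the vehicle body", "examine the load input for part α")
graph.add_edge("design the vehicle body", "examine the plate thickness")
```

Storing the layer on each node makes layer-based checks, such as whether one activity sits above another, a simple integer comparison.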

The superordinate concept model 16 models the criteria (tacit knowledge) by which an expert decides the order of design work, and it describes dependencies based on design conditions (conditions of the project as a whole and conditions arising from design results). The superordinate concept model 16 is built by extracting the know-how that experts hold. It may be built from material extracted from actual business scenarios, analyses of past problems, interviews, and the causal relationships of dynamically occurring events.

In this embodiment, as shown in FIG. 4, the superordinate concept model 16 includes a plurality of superordinate concept units 30. Each superordinate concept unit 30 includes a text obtained from an expert, one or more pairs of activities associated with that text (hereinafter, two paired activities are called an activity pair), and a thickness assigned to each activity pair. Each activity pair consists of an activity (the route source activity) and an activity performed immediately after it (the route destination activity). The superordinate concept model 16 expresses the rule that an expert, having performed the route source activity, would then perform the route destination activity, and the thickness indicates the importance of performing the route destination activity after the route source activity. In this embodiment, a larger thickness value means greater importance. The superordinate concept model 16 is stored in the storage unit 10 as a table containing a plurality of superordinate concept units 30. The superordinate concept model 16 is not limited to this form, however, and may further include preconditions for performing an activity pair (for example, the development model).
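One way to picture a superordinate concept unit is as a record holding the expert's text and its activity pairs. The type names, the `next_by_importance` helper, and the sample text and thickness values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ActivityPair:
    source: str        # route source activity
    destination: str   # route destination activity
    thickness: float   # larger value means greater importance


@dataclass(frozen=True)
class ConceptUnit:
    text: str                         # text obtained from the expert
    pairs: Tuple[ActivityPair, ...]   # one or more activity pairs


def next_by_importance(units: List[ConceptUnit], source: str) -> Optional[str]:
    """Destination of the thickest pair starting at `source`, if any."""
    candidates = [p for u in units for p in u.pairs if p.source == source]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.thickness).destination


# Invented example content for illustration only.
units = [
    ConceptUnit(
        text="After setting the target, examine the load input first.",
        pairs=(
            ActivityPair("set target", "examine load input", 3.0),
            ActivityPair("set target", "examine plate thickness", 1.0),
        ),
    ),
]
```

The helper only illustrates how thickness ranks alternatives; in the method described here the thickness feeds into rewards rather than being used for direct lookup.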

The superordinate concept model 16 is stored in the storage unit 10 in advance, before the learning model is constructed. It may be built by an expert entering it into the design support device 1 and then stored in the storage unit 10. Alternatively, the design support device 1 may include a concept construction unit that builds the superordinate concept model 16 from text obtained from the expert; the concept construction unit builds a plurality of superordinate concept units 30, which are stored in the storage unit 10 as the superordinate concept model 16.

The input unit 11 includes the input device 6 and acquires input information entered by the user. In this embodiment, the input unit 11 can acquire linguistic information about the design from the user.

The output unit 12 includes the output device 7 and outputs output information to the user. In this embodiment, the output unit 12 conveys the activity the user should perform by outputting its content as audio, images, text, or the like.

The model construction unit 13 is implemented as software executed by the processor 2. The model construction unit 13 carries out the method for constructing a trained model by performing a model construction process. In this embodiment, in the model construction process, the model construction unit 13 constructs the trained model by so-called reinforcement learning: it assigns a reward R (see FIG. 7) to each combination of a state S, which is determined depending at least on whether design activities have been executed, and an action A, which is an activity selectable in the state S, and maximizes the value.
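This passage does not name a specific learning algorithm. Because the figure list mentions a Q table, the sketch below assumes tabular Q-learning as one possible way to maximize value from assigned rewards; the function name, the learning rate `alpha`, and the discount factor `gamma` are assumptions, not values from the patent.

```python
from typing import Dict, Hashable

QTable = Dict[Hashable, Dict[Hashable, float]]


def q_update(q: QTable, state, action, reward: float, next_state,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply one standard tabular Q-learning update and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Best value reachable from the next state (0.0 if nothing is known yet).
    best_next = max(q.get(next_state, {}).values(), default=0.0)
    old = q.setdefault(state, {}).get(action, 0.0)
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
    return q[state][action]
```

Repeating this update over many simulated sequences of activities would propagate the rule-based rewards backward, so that early activities leading to well-rewarded sequences acquire high values.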

The support processing unit 14 is implemented as software executed by the processor 2. The support processing unit 14 provides design support using the trained model constructed by the model construction unit 13.

Next, the model construction process executed by the model construction unit 13 is described in detail with reference to FIG. 5.

In step ST1, the first step of the model construction process, the model construction unit 13 executes a selection process that selects, from the superordinate concept model 16, the superordinate concept units 30 needed for the design. More specifically, in the selection process the model construction unit 13 outputs a list of the superordinate concept units 30 contained in the superordinate concept model 16 to the output unit 12. The expert checks this output and makes an entry on the input unit 11. The model construction unit 13 thereby receives, from the input unit 11, the selection of the superordinate concept units 30 suited to the vehicle to be designed. When the selection has been received, the model construction unit 13 executes step ST2.

In step ST2, the model construction unit 13 executes an extraction process. In the extraction process, the model construction unit 13 extracts from the knowledge graph 21 the region that matches the superordinate concept units 30 selected in step ST1 and saves it in the storage unit 10 as an extracted graph 32.

In this embodiment, the model construction unit 13 first extracts, from the nodes 22 of the knowledge graph 21, those whose corresponding activity matches a route source activity or a route destination activity contained in the selected superordinate concept units 30. The model construction unit 13 may judge that the activity corresponding to a node 22 matches a route source activity or a route destination activity when the words in the activity are identical or similar to the words in that route source activity or route destination activity. To make this similarity judgment, the storage unit 10 may store a predetermined database that includes a synonym dictionary or the like, and the model construction unit 13 may refer to this database to judge whether words are similar. The model construction unit 13 then extracts from the knowledge graph 21 the region containing the matching nodes 22 (see the region enclosed by the broken line in FIG. 3) as the region matching the superordinate concept units 30, and saves it in the storage unit 10 as the extracted graph 32 (see FIG. 6(A)). When the extracted graph 32 has been saved, the model construction unit 13 executes step ST3.
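The word-level matching described above can be sketched with a synonym map. This is a simplification under stated assumptions: whitespace tokenization, a flat dictionary that maps each synonym to a canonical word, and a requirement that the two word sets be equal after canonicalization. The patent leaves the similarity test and the dictionary format open, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def canonical_words(text: str, synonyms: Dict[str, str]) -> Set[str]:
    """Lowercase, split on whitespace, and map each word to its canonical form."""
    return {synonyms.get(word, word) for word in text.lower().split()}


def activities_match(node_activity: str, rule_activity: str,
                     synonyms: Dict[str, str]) -> bool:
    """True when both activities use the same or synonymous words."""
    return (canonical_words(node_activity, synonyms)
            == canonical_words(rule_activity, synonyms))


def extract_matching_nodes(node_activities: List[str],
                           rule_activities: List[str],
                           synonyms: Dict[str, str]) -> List[str]:
    """Node activities that match any route source or destination activity."""
    return [node for node in node_activities
            if any(activities_match(node, rule, synonyms)
                   for rule in rule_activities)]


# Invented sample data for illustration.
synonyms = {"examine": "consider", "study": "consider"}
nodes = ["consider load input", "set management target", "consider plate thickness"]
rule_activities = ["examine load input", "study plate thickness"]
matched = extract_matching_nodes(nodes, rule_activities, synonyms)
```

A production system would likely need morphological analysis for Japanese text rather than whitespace splitting; that step is outside this sketch.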

In step ST3, the model construction unit 13 extracts from the extracted graph 32 the nodes 22 that match the superordinate concept units 30 and performs a state space generation process based on the extracted nodes 22. A state S here is determined depending at least on the combination indicating whether each activity included in the extracted graph 32 has been executed, and may additionally be based on the history of activities. In this embodiment, as shown in FIG. 6, a state S is defined by the combination of whether each activity has been executed and the activity executed immediately before that combination was reached, and the state space P is defined as the set containing all states S.

In this embodiment, the model construction unit 13 extracts from the extracted graph 32 the nodes 22 that match the superordinate concept units 30 and computes the combinations of logical values indicating whether each of the activities corresponding to the extracted nodes 22 has been executed (T) or not executed (F). The activity corresponding to the most upstream node 22 of the extracted graph 32 (activity 1 in FIG. 4), however, is always treated as executed (T). The model construction unit 13 then obtains all states S by pairing each combination of executed and unexecuted activities with each activity marked executed (T) in that combination, taken one at a time as the activity performed immediately before the combination was reached, and assembles these states into the state space P.

As shown in FIG. 6(B), when activities 1 to 3 are extracted, the combinations of logical values indicating whether activities 2 and 3 have been executed are (activity 2, activity 3) = (F, F), (T, F), (F, T), and (T, T). Here (F, F) means that neither activity has been executed, (T, F) that activity 2 has been executed and activity 3 has not, (F, T) that activity 2 has not been executed and activity 3 has, and (T, T) that both have been executed. Every state S is expressed by pairing each of these combinations with an activity that can have been executed immediately before the combination was reached. More specifically, as shown in FIG. 6(B), there are five states S: (activity 2, activity 3, immediately preceding activity) = (F, F, 1), (T, F, 2), (F, T, 3), (T, T, 2), and (T, T, 3). The last element in parentheses is the number of the immediately preceding activity.

図６（Ｂ）に示すように、標記を簡略化するため、適宜、（活動２、活動３，直前の活動）＝（Ｆ，Ｆ，１）、（Ｔ，Ｆ，２）、（Ｆ，Ｔ，３）、（Ｔ，Ｔ，２）、（Ｔ，Ｔ，３）の各状態Ｓをそれぞれ、Ｓ_００，１、Ｓ_１０，２、Ｓ_０１，３、Ｓ_１１，２、Ｓ_１１，３と記載する。ここで、Ｓ_ｉｊ，ｋにおいて、ｉは活動２の実行（１）、未実行（０）を示し、ｊは活動３の実行（１）、未実行（０）を示し、ｋは直前の活動の番号を示している。 As shown in FIG. 6(B), to simplify the notation, the states S given by (activity 2, activity 3, immediately preceding activity) = (F, F, 1), (T, F, 2), (F, T, 3), (T, T, 2), and (T, T, 3) are written, where appropriate, as S_{00,1}, S_{10,2}, S_{01,3}, S_{11,2}, and S_{11,3}, respectively. In S_{ij,k}, i indicates whether activity 2 has been executed (1) or not (0), j indicates whether activity 3 has been executed (1) or not (0), and k is the number of the immediately preceding activity.

状態空間Ｐの構成が完了すると、モデル構築部１３は状態空間Ｐを記憶部１０に記憶させ、その後、ステップＳＴ４を実行する。 When the configuration of the state space P is completed, the model construction unit 13 stores the state space P in the storage unit 10, and then executes step ST4.
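
The state space construction of step ST3 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `build_state_space` and the tuple representation `(flags, last_activity)` are assumptions made for the example, and activities are identified only by their numbers.

```python
from itertools import product

def build_state_space(root, others):
    """Enumerate states as (flags, last_activity) tuples.

    flags[n] is True when others[n] has been executed. The root (most
    upstream) activity is always treated as executed, so it has no flag.
    last_activity is the activity performed immediately before reaching
    that combination of flags.
    """
    states = []
    for flags in product([False, True], repeat=len(others)):
        executed = [act for act, done in zip(others, flags) if done]
        # If nothing but the root has been executed, the root itself is
        # the immediately preceding activity.
        for last in (executed or [root]):
            states.append((flags, last))
    return states

# Activities 1 to 3 as in FIG. 6(B): activity 1 is the root.
states = build_state_space(root=1, others=[2, 3])
```

For activities 1 to 3 this yields the five states (F, F, 1), (T, F, 2), (F, T, 3), (T, T, 2), and (T, T, 3) described above.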

モデル構築部１３はステップＳＴ４において、報酬設定処理を行う。報酬設定処理は、抽出グラフ３２、及び、ステップＳＴ１において予め入力された規則（上位概念ユニット３０）に基づいて、状態空間Ｐに含まれる各状態Ｓと、その状態Ｓの下で選択可能な行動Ａとの組み合わせに対して報酬Ｒ（より詳細には即時報酬）を付与する処理である。具体的には、モデル構築部１３は状態空間Ｐの全ての状態Ｓと、その状態Ｓから状態空間Ｐに含まれる状態Ｓへ移る全ての行動Ａとの組み合わせそれぞれについて報酬設定処理を行って報酬テーブル４０（図７参照）を構成し、記憶部１０に記憶させる。図７に示すように、報酬テーブル４０には、各状態Ｓに対して各行動Ａにより得られる報酬Ｒが記載されている。 The model construction unit 13 performs remuneration setting processing in step ST4. The reward setting process is performed based on the extraction graph 32 and the rules (superordinate concept unit 30) input in advance in step ST1, for each state S included in the state space P and the actions that can be selected under the state S. This is a process of giving reward R (more specifically, immediate reward) to the combination with A. Specifically, the model construction unit 13 performs reward setting processing for each combination of all the states S in the state space P and all the actions A that move from the state S to the states S included in the state space P. A table 40 (see FIG. 7) is constructed and stored in the storage unit 10. As shown in FIG. 7, the reward table 40 describes the reward R obtained by each action A for each state S.

図８を参照して、報酬設定処理の詳細を説明する。以下、説明の便宜上、報酬Ｒを設定する状態Ｓ及び行動Ａにおいて、当該状態Ｓを遷移元と記載し、遷移元において行動Ａを行うことによって遷移する先の状態Ｓを遷移先と記載する。すなわち、図９に示すように、遷移元（遷移先）は状態空間Ｐに含まれる状態Ｓの一つであり、状態空間Ｐに含まれる全ての状態Ｓが遷移元（遷移先）になり得る。行動Ａは遷移元から遷移先への遷移の可否や遷移における活動の有無に依らず、遷移元から遷移先に遷移するために要する活動として定義され、いずれの活動も実行しないことをも含む。 Details of the reward setting process will be described with reference to FIG. 8. For convenience, for a state S and an action A for which the reward R is being set, that state S is called the transition source, and the state S reached by performing the action A at the transition source is called the transition destination. That is, as shown in FIG. 9, the transition source (transition destination) is one of the states S included in the state space P, and every state S included in the state space P can be a transition source (transition destination). The action A is defined as the activity required to move from the transition source to the transition destination, regardless of whether the transition is actually possible or whether any activity takes place in it, and it includes performing no activity at all.

報酬設定処理の最初のステップＳＴ１１において、モデル構築部１３は、遷移元と遷移先とが同一かを判定する。同一である場合には、モデル構築部１３はステップＳＴ１２を、異なる場合はステップＳＴ１３を実行する。 In the first step ST11 of the reward setting process, the model construction unit 13 determines whether the transition source and transition destination are the same. If they are the same, the model construction unit 13 executes step ST12, and if they are different, executes step ST13.

モデル構築部１３は、ステップＳＴ１２において、遷移元において、全ての活動が実行済の場合（図７では、遷移元及び遷移先が共にＳ_１１，２、又は、共にＳ_１１，３が該当）は報酬Ｒを零に設定し、それ以外の場合（遷移元において、少なくとも一つの活動が実行済でない場合）には報酬Ｒを負の値（本実施形態では、－１）に設定する（図７の最も薄い網掛けがされている部分参照）。設定が完了すると、モデル構築部１３は、報酬設定処理（すなわち、ＳＴ４）を終える。 In step ST12, the model construction unit 13 sets the reward R to zero when all activities have been executed at the transition source (in FIG. 7, the cases where the transition source and transition destination are both S_{11,2} or both S_{11,3}), and otherwise (when at least one activity has not yet been executed at the transition source) sets the reward R to a negative value (-1 in this embodiment); see the most lightly shaded cells in FIG. 7. When the setting is complete, the model construction unit 13 ends the reward setting process (that is, ST4).

モデル構築部１３は、ステップＳＴ１３において、遷移元と遷移先とが一つの活動を実行することによって遷移可能であるかを判定する。遷移元と遷移先が遷移可能でない場合には、遷移元から遷移先に遷移するためには２つ以上の活動を実行する必要がある場合（例えば、遷移元がＳ_００，１であり、遷移先がＳ_１１，２の場合）、及び、遷移元の直前の活動と、遷移後の直前の活動とが等しい場合（例えば、遷移元がＳ_１０，２であり、遷移先がＳ_１１，２の場合）が含まれる。モデル構築部１３は、遷移元と遷移先とが一つの活動を実行することによって遷移可能であると判定した場合には、ステップＳＴ１４を、遷移不可能であると判定した場合には、ステップＳＴ１５を実行する。 In step ST13, the model construction unit 13 determines whether the transition destination can be reached from the transition source by executing a single activity. Cases in which the transition is not possible include those where two or more activities would have to be executed to move from the transition source to the transition destination (for example, transition source S_{00,1} and transition destination S_{11,2}), and those where the immediately preceding activity of the transition source equals the immediately preceding activity after the transition (for example, transition source S_{10,2} and transition destination S_{11,2}). If the model construction unit 13 determines that the transition is possible by executing a single activity, it executes step ST14; if it determines that the transition is not possible, it executes step ST15.

モデル構築部１３は、ステップＳＴ１４において、遷移元及び遷移先をそれぞれ抽出された上位概念ユニット３０それぞれに対して照合し、遷移元及び遷移先が上位概念ユニット３０の示す条件に適合するかを判定する。より詳細には、モデル構築部１３は、上位概念ユニット３０の経路元活動が遷移元の直前の活動に適合し、且つ、経路先活動が遷移先の直前の活動に適合するときに、遷移元及び遷移先が上位概念ユニット３０の示す条件に合致すると判定する。モデル構築部１３は、経路元活動（経路先活動）に記載された単語と、遷移元（遷移先）の直前の活動に含まれる単語との類否判定を行い、両者が類似である場合に、経路元活動と遷移元の直前の活動とが適合すると判定するとよい。 In step ST14, the model construction unit 13 checks the transition source and the transition destination against each of the extracted superordinate concept units 30 and determines whether they satisfy the condition indicated by that superordinate concept unit 30. More specifically, the model construction unit 13 determines that the transition source and transition destination satisfy the condition when the route source activity of the superordinate concept unit 30 matches the immediately preceding activity of the transition source and the route destination activity matches the immediately preceding activity of the transition destination. The model construction unit 13 may judge the similarity between the words of the route source activity (route destination activity) and the words contained in the immediately preceding activity of the transition source (transition destination), and treat the two activities as matching when the words are similar.

例えば、図６（Ｂ）に示す状態空間Ｐにおいて、図４に示す上位概念ユニット３０の示す条件を満たす遷移元は直前の活動が活動１となる状態Ｓであり、上位概念ユニット３０の示す条件を満たす遷移先は直前の活動が活動２（上段）又は活動３（下段）となる状態Ｓである（図７の最も濃い網掛け部分を参照）。よって、遷移元及び遷移先が上位概念ユニット３０の示す条件に合致する組み合わせは、遷移元がＳ_００，１、且つ、遷移先がＳ_１０，２である場合（太さ５）と、遷移元がＳ_００，１、且つ、遷移先がＳ_０１，３である場合（太さ４）になる。 For example, in the state space P shown in FIG. 6(B), a transition source that satisfies the conditions of the superordinate concept units 30 shown in FIG. 4 is a state S whose immediately preceding activity is activity 1, and a transition destination that satisfies them is a state S whose immediately preceding activity is activity 2 (upper row) or activity 3 (lower row); see the most darkly shaded cells in FIG. 7. The combinations of transition source and transition destination that satisfy the conditions are therefore transition source S_{00,1} with transition destination S_{10,2} (thickness 5), and transition source S_{00,1} with transition destination S_{01,3} (thickness 4).

その後、モデル構築部１３は、遷移元及び遷移先がともに適合する上位概念ユニット３０に含まれる太さに基づいて、報酬Ｒを設定する。より具体的には、モデル構築部１３は遷移元及び遷移先がともに適合する上位概念ユニット３０の太さの値を取得し、その値それぞれに所定の値（本実施形態では１）を積算することによって換算し、その和を取って、報酬Ｒを算出する。モデル構築部１３は、遷移元及び遷移先がともに適合する上位概念ユニット３０がない場合（図７の中程度の濃さの網掛け部分を参照）は、報酬Ｒを零に設定する。報酬Ｒの設定が完了すると、モデル構築部１３は、報酬設定処理（ＳＴ４）を終える。 The model construction unit 13 then sets the reward R based on the thickness recorded in each superordinate concept unit 30 that both the transition source and the transition destination satisfy. More specifically, it obtains the thickness value of every such superordinate concept unit 30, converts each value by multiplying it by a predetermined coefficient (1 in this embodiment), and takes the sum of the results as the reward R. When no superordinate concept unit 30 is satisfied by both the transition source and the transition destination (see the medium-shaded cells in FIG. 7), the model construction unit 13 sets the reward R to zero. When the reward R has been set, the model construction unit 13 ends the reward setting process (ST4).

モデル構築部１３はステップＳＴ１５において、報酬Ｒを負の値（本実施形態では、－１）に設定する。設定が完了すると、モデル構築部１３は報酬設定処理（ＳＴ４）を終える。 In step ST15, the model construction unit 13 sets the reward R to a negative value (-1 in this embodiment). When the settings are completed, the model construction unit 13 ends the reward setting process (ST4).
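
The branching of steps ST11 to ST15 can be illustrated with the following Python sketch for the three-activity example. It rests on several assumptions that are not taken from the embodiment: states are written as tuples (i, j, k) mirroring S_{ij,k}, rules are (route source activity, route destination activity, thickness) triples, and rule matching is simplified to equality of activity numbers instead of the word-similarity judgment described for ST14. The names `RULES`, `SCALE`, `reachable_in_one_activity`, and `reward` are hypothetical.

```python
# States (i, j, k): i/j = activity 2/3 executed (1) or not (0),
# k = number of the immediately preceding activity.
STATES = [(0, 0, 1), (1, 0, 2), (0, 1, 3), (1, 1, 2), (1, 1, 3)]
# Rules as (source activity, destination activity, thickness).
RULES = [(1, 2, 5), (1, 3, 4)]
SCALE = 1  # coefficient applied to each thickness (1 in the embodiment)

def reachable_in_one_activity(src, dst):
    """ST13: dst adds exactly one executed activity, and that activity
    is the immediately preceding activity of dst."""
    src_done = {2: src[0], 3: src[1]}
    dst_done = {2: dst[0], 3: dst[1]}
    # An executed activity cannot become unexecuted again.
    if any(src_done[a] and not dst_done[a] for a in src_done):
        return False
    newly_done = [a for a in src_done if dst_done[a] and not src_done[a]]
    return newly_done == [dst[2]]

def reward(src, dst):
    if src == dst:                                # ST11 -> ST12
        return 0 if src[0] == 1 and src[1] == 1 else -1
    if not reachable_in_one_activity(src, dst):   # ST13 -> ST15
        return -1
    # ST14: sum the scaled thickness of every matching rule (0 if none).
    return sum(t * SCALE for a, b, t in RULES if a == src[2] and b == dst[2])

reward_table = {(s, d): reward(s, d) for s in STATES for d in STATES}
```

With these inputs the table gives 5 for S_{00,1} to S_{10,2}, 4 for S_{00,1} to S_{01,3}, 0 for a feasible transition with no matching rule such as S_{10,2} to S_{11,3}, and -1 for infeasible pairs such as S_{00,1} to S_{11,2}.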

モデル構築部１３は報酬設定処理（ＳＴ４）が完了すると、図５に示すように、ステップＳＴ５において、付与された報酬Ｒに基づいて強化学習を行う学習処理を実行する。より詳細には、学習処理において、モデル構築部１３は報酬テーブル４０を用いて、公知のアルゴリズムにより強化学習を行って、学習モデルを示すＱテーブル５０（図９参照）を生成する。Ｑテーブル５０は、状態Ｓにあるときに、行動Ａ（遷移）を行うことの価値、すなわち、行動価値関数Ｑ（Ｓ，Ａ）（Ｓは状態、Ａは行動）を示している。但し、状態Ｓの間の遷移が実質的に不可能である場合（例えば、遷移元がＳ_００，１であり、遷移先がＳ_１１，２の場合）は報酬Ｒが負に設定されているため、強化学習によってその価値が０となり、その行動Ａ（遷移）が行われない。Ｑテーブル５０の生成に用いられるアルゴリズムはいかなるものであってもよく、例えば、動的計画法、モンテカルロ法、ＴＤ法（より具体的には、Ｑ学習や、ＳＡＲＳＡ等）のいずれに基づくものであってよい。Ｑテーブル５０の生成が完了すると、モデル構築部１３はモデル構築処理を終える。 When the reward setting process (ST4) is complete, the model construction unit 13 executes, in step ST5 as shown in FIG. 5, a learning process that performs reinforcement learning based on the assigned rewards R. More specifically, in the learning process the model construction unit 13 uses the reward table 40 to perform reinforcement learning with a known algorithm and generates a Q table 50 (see FIG. 9) representing the learning model. The Q table 50 gives the value of performing action A (a transition) while in state S, that is, the action-value function Q(S, A), where S is the state and A is the action. When a transition between states S is practically impossible (for example, transition source S_{00,1} and transition destination S_{11,2}), the reward R is set to a negative value, so reinforcement learning drives the value of that action to 0 and the action A (transition) is not performed. Any algorithm may be used to generate the Q table 50; for example, it may be based on dynamic programming, the Monte Carlo method, or the TD method (more specifically, Q-learning, SARSA, and the like). When the Q table 50 has been generated, the model construction unit 13 ends the model construction process.
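
As one possible illustration of step ST5, the sketch below derives a Q table from a reward table by repeated Bellman updates, which is a dynamic-programming approach and only one of the algorithm families the text allows. The state labels, the discount factor `gamma`, the number of sweeps, and in particular the modelling choice that an action with a negative reward leaves the state unchanged are assumptions made for this example. In this sketch infeasible transitions simply end up with lower values than feasible ones.

```python
STATES = ["S00_1", "S10_2", "S01_3", "S11_2", "S11_3"]

# Reward table for the three-activity example: -1 by default,
# with the non-negative entries listed explicitly.
REWARDS = {(s, d): -1 for s in STATES for d in STATES}
REWARDS.update({
    ("S00_1", "S10_2"): 5,
    ("S00_1", "S01_3"): 4,
    ("S10_2", "S11_3"): 0,
    ("S01_3", "S11_2"): 0,
    ("S11_2", "S11_2"): 0,
    ("S11_3", "S11_3"): 0,
})

def q_value_iteration(rewards, states, gamma=0.9, sweeps=200):
    """Compute Q(s, a) where action a means 'move to state a'."""
    q = {key: 0.0 for key in rewards}
    for _ in range(sweeps):
        new_q = {}
        for (src, dst), r in rewards.items():
            # Assumption: a penalised (r < 0) action does not move the state.
            nxt = dst if r >= 0 else src
            new_q[(src, dst)] = r + gamma * max(q[(nxt, a)] for a in states)
        q = new_q
    return q

def best_action(q, state, states):
    """Return the destination state with the highest Q value."""
    return max(states, key=lambda dst: q[(state, dst)])

q_table = q_value_iteration(REWARDS, STATES)
```

Following `best_action` from `"S00_1"` selects `"S10_2"` and then `"S11_3"`, so the greedy policy executes activity 2 and then activity 3.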

次に、図１０を参照して、支援処理部１４によって実行される支援処理の詳細について説明する。支援処理部１４は支援処理の最初のステップＳＴ２１において、ユーザに支援すべき設計の工程等の情報を入力させるべく出力部１２に出力を行い、入力部１１から支援すべき工程に係る情報をテキストにより取得する。取得が完了すると、支援処理部１４はステップＳＴ２２を実行する。 Next, the support process executed by the support processing unit 14 will be described in detail with reference to FIG. 10. In the first step ST21 of the support process, the support processing unit 14 outputs to the output unit 12 a prompt asking the user to enter information such as the design process for which support is needed, and obtains that information as text from the input unit 11. When the information has been obtained, the support processing unit 14 executes step ST22.

支援処理部１４はステップＳＴ２２において、抽出グラフ３２のノード２２に対応する活動から、取得した情報と最も適合するものを抽出する。抽出が完了すると、支援処理部１４はステップＳＴ２３を実行する。 In step ST22, the support processing unit 14 extracts, from the activities corresponding to the nodes 22 of the extraction graph 32, those that most match the acquired information. When the extraction is completed, the support processing unit 14 executes step ST23.

支援処理部１４はステップＳＴ２３において、ＳＴ２２において抽出された活動を行うようにユーザに指示する出力を出力部１２に行う。出力が完了すると、支援処理部１４はステップＳＴ２４を実行する。 In step ST23, the support processing unit 14 outputs to the output unit 12 an instruction to the user to perform the activity extracted in ST22. When the output is completed, the support processing section 14 executes step ST24.

支援処理部１４はステップＳＴ２４において、ステップＳＴ２２において抽出された活動に基づいて、状態空間Ｐから一つの状態を抽出し、学習モデルに基づいた出力を行うための初期状態に設定する。本実施形態では、支援処理部１４はＳＴ２２において抽出された活動に対応するノード２２が、抽出グラフ３２の最も上流に位置するノード２２である場合には、その活動のみが実行済となった状態Ｓを初期状態に設定する。支援処理部１４は、ＳＴ２２において抽出された活動に対応するノード２２が抽出グラフ３２の最も上流に位置するノード２２でない場合には、最も上流に位置するノード２２に対応する活動、及び、ステップＳＴ２２において抽出された活動が実行済となった状態Ｓを初期状態に設定する。初期状態の設定が完了すると、支援処理部１４はステップＳＴ２５を実行する。 In step ST24, the support processing unit 14 extracts one state from the state space P based on the activity extracted in step ST22 and sets it as the initial state for producing output based on the learning model. In this embodiment, when the node 22 corresponding to the activity extracted in ST22 is the most upstream node 22 of the extraction graph 32, the support processing unit 14 sets as the initial state the state S in which only that activity has been executed. When that node 22 is not the most upstream node 22 of the extraction graph 32, the support processing unit 14 sets as the initial state the state S in which the activity corresponding to the most upstream node 22 and the activity extracted in step ST22 have been executed. When the initial state has been set, the support processing unit 14 executes step ST25.

支援処理部１４はステップＳＴ２５において、初期状態から学習モデル（Ｑテーブル５０）に基づいて、現在の状態Ｓに対応する最も価値のある行動Ａを選択して状態Ｓを遷移させる工程を繰り返し行う。その際、支援処理部１４は、状態遷移ごとに行動Ａに対応する活動を行わせるべくユーザに指示する出力を出力部１２に行う。支援処理部１４は初期状態において実行済となっている活動を除き、全ての活動が完了すると、支援処理を終える。 In step ST25, the support processing unit 14 repeatedly performs the process of selecting the most valuable action A corresponding to the current state S and transitioning the state S based on the learning model (Q table 50) from the initial state. At this time, the support processing unit 14 outputs an output to the output unit 12 instructing the user to perform the activity corresponding to the action A for each state transition. The support processing unit 14 ends the support processing when all activities are completed except for the activities that have already been executed in the initial state.
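
The loop of step ST25 amounts to a greedy walk over the Q table. The following Python sketch shows the idea under stated assumptions: the Q table is a nested dictionary keyed by state tuples (i, j, k), the numeric Q values are made up for illustration, and the function name `guide` is hypothetical.

```python
def guide(q_table, initial_state):
    """Follow the highest-valued transition from each state and return
    the list of activity numbers the user would be instructed to perform."""
    state, instructions = initial_state, []
    # A state (i, j, k) is complete when activities 2 and 3 are both executed.
    while not (state[0] == 1 and state[1] == 1):
        candidates = q_table[state]
        nxt = max(candidates, key=candidates.get)
        if nxt == state:
            break  # no progress possible; stop instead of looping forever
        instructions.append(nxt[2])  # k = the activity that was just executed
        state = nxt
    return instructions

# Illustrative Q values only (not taken from FIG. 9).
Q_TABLE = {
    (0, 0, 1): {(1, 0, 2): 5.0, (0, 1, 3): 4.0, (0, 0, 1): 3.5},
    (1, 0, 2): {(1, 1, 3): 0.0, (1, 0, 2): -1.0},
    (0, 1, 3): {(1, 1, 2): 0.0, (0, 1, 3): -1.0},
}

plan = guide(Q_TABLE, (0, 0, 1))
```

Starting from S_{00,1}, the walk instructs activity 2 and then activity 3, and stops once both are marked as executed.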

次に、設計支援装置１の動作について説明する。設計対象となる車両が決定すると、エキスパートは上位概念モデル１６から設計対象となる車両に適合する上位概念ユニット３０を選択することによって、設計を行う際の規則を入力する（ＳＴ１）。図４には、選択された上位概念ユニット３０の例が示されている。その後、モデル構築部１３は、上位概念ユニット３０に含まれる経路元活動及び経路先活動と、ナレッジグラフ２１のノード２２に対応する活動とを照合（図３の「マッチ」を参照）することによって、ナレッジグラフ２１の一部の領域（図３の破線で囲まれた部分、及び、図６（Ａ）を参照）を抽出グラフ３２として抽出する（ＳＴ２）。 Next, the operation of the design support device 1 will be described. Once the vehicle to be designed has been decided, the expert enters the rules for the design by selecting, from the superordinate concept model 16, the superordinate concept units 30 suited to that vehicle (ST1). FIG. 4 shows an example of the selected superordinate concept units 30. The model construction unit 13 then compares the route source activities and route destination activities contained in the superordinate concept units 30 with the activities corresponding to the nodes 22 of the knowledge graph 21 (see "match" in FIG. 3), and extracts part of the knowledge graph 21 (the area enclosed by the broken line in FIG. 3; see also FIG. 6(A)) as the extraction graph 32 (ST2).

次に、モデル構築部１３は抽出グラフ３２を用いて状態空間Ｐ（図６（Ｂ）を参照）を生成する（ＳＴ３）。その後、モデル構築部１３は、図７に示すように、モデル構築部１３は、状態Ｓのそれぞれと、状態Ｓの遷移を引き起こす行動Ａとの組み合わせに対して、それぞれ報酬Ｒを設定し、報酬テーブル４０（図７参照）を作成する（ＳＴ４）。報酬テーブル４０の設定が完了すると、モデル構築部１３は強化学習を行い、学習モデルであるＱテーブル５０（図９参照）を生成する（ＳＴ５）。これにより、学習モデルの構築が完了する。 Next, the model construction unit 13 generates a state space P (see FIG. 6(B)) using the extracted graph 32 (ST3). Thereafter, as shown in FIG. 7, the model construction unit 13 sets a reward R for each combination of each state S and an action A that causes a transition of the state S, and A table 40 (see FIG. 7) is created (ST4). When the setting of the reward table 40 is completed, the model construction unit 13 performs reinforcement learning and generates a Q table 50 (see FIG. 9), which is a learning model (ST5). This completes the construction of the learning model.

その後、出力部１２の出力に従い、ユーザが入力部１１に支援すべき工程に係る情報を入力すると、支援処理部１４はその情報を取得し（ＳＴ２１）、抽出グラフ３２のノード２２に対応する活動から、取得した情報に最も適合したものを抽出する（ＳＴ２２）。 Thereafter, when the user, following the output of the output unit 12, enters information on the process to be supported into the input unit 11, the support processing unit 14 obtains that information (ST21) and extracts, from the activities corresponding to the nodes 22 of the extraction graph 32, the one that best matches it (ST22).

より具体的には、ユーザが「扉上部の設計を行いたい」と入力を行うと、支援処理部１４は「扉上部」の単語を抽出し、抽出グラフ３２（図６（Ａ）参照）の中から、適合する活動として活動１「扉上部の管理目標を設定する」を抽出する。 More specifically, when the user enters "I want to design the upper part of the door," the support processing unit 14 extracts the phrase "upper part of the door" and, from the extraction graph 32 (see FIG. 6(A)), extracts activity 1, "set management targets for the upper part of the door," as the matching activity.

その後、支援処理部１４は、活動１の実行、すなわち扉上部の管理目標の設定を指示する出力を出力部１２に行う（ＳＴ２３）。次に、支援処理部１４は、初期状態として活動１のみが実行されている状態Ｓ_００，１を設定し（ＳＴ２４）、学習モデル（Ｑテーブル５０）に基づいて、価値の高い行動Ａ（遷移）を順次選択し、その行動Ａを行わせるべく、ユーザに実行を指示する（ＳＴ２５）。 The support processing unit 14 then outputs to the output unit 12 an instruction to execute activity 1, that is, to set the management targets for the upper part of the door (ST23). Next, the support processing unit 14 sets, as the initial state, the state S_{00,1} in which only activity 1 has been executed (ST24), and, based on the learning model (Q table 50), successively selects high-value actions A (transitions) and instructs the user to carry them out (ST25).

図１１には、強化学習によって得られたＱテーブル５０に対応する状態遷移図が示されている。支援処理部１４は初期状態として状態Ｓ_００，１を設定し、Ｑテーブル５０を参照して、価値の高い、活動２の実行、すなわち、部位αの荷重入力の検討をユーザに指示する。これにより、状態Ｓは状態Ｓ_００，１から状態Ｓ_１０，２に遷移する。 FIG. 11 shows a state transition diagram corresponding to the Q table 50 obtained by reinforcement learning. The support processing unit 14 sets state S_{00,1} as the initial state, refers to the Q table 50, and instructs the user to execute the high-value activity 2, that is, to examine the load input for part α. The state S thereby transitions from S_{00,1} to S_{10,2}.

次に、支援処理部１４は状態Ｓ_１０，２において、価値の高い活動３の実行、すなわち、部位βの耐力の検討をユーザに指示する。これにより、状態Ｓは状態Ｓ_１０，２から状態Ｓ_１１，３に遷移する。その後、支援処理部１４は、全ての活動が実行済であると判定し、支援処理を終える。 Next, in state S_{10,2}, the support processing unit 14 instructs the user to execute the high-value activity 3, that is, to examine the load-bearing strength of part β. The state S thereby transitions from S_{10,2} to S_{11,3}. The support processing unit 14 then determines that all activities have been executed and ends the support process.

次に、設計支援装置１、及び、モデル構築部１３によって実行される学習済モデルの構築方法の効果について説明する。 Next, the effects of the learned model construction method executed by the design support device 1 and the model construction unit 13 will be described.

設計支援装置１は、エキスパートの経験や知識に基づいて入力された規則（上位概念モデル１６）に基づいて報酬Ｒを設定し、学習モデルを構築する。設計支援装置１は、その学習モデルに基づいて活動指示を行う。このように、設計支援装置１は、エキスパートから入力された規則に基づき、自律的に学習モデルを構築し、その学習結果をもとにユーザに活動指示を行う推論エージェントとして機能する。 The design support device 1 sets a reward R based on a rule (superordinate concept model 16) input based on the experience and knowledge of an expert, and constructs a learning model. The design support device 1 issues activity instructions based on the learning model. In this way, the design support device 1 functions as an inference agent that autonomously constructs a learning model based on rules input from an expert and instructs the user to perform activities based on the learning results.

設計支援装置１による活動指示は、エキスパートから入力された規則（上位概念モデル１６、上位概念ユニット３０）に基づいて学習モデルを用いて行われる。そのため、設計支援装置１はエキスパートの行動に即した活動指示を行うことができる。更に、経験の少ない設計初心者であっても、数多の選択肢の中から価値の高い最適な活動を選択して実行することができるため、設計初心者がエキスパートに指示を仰ぐ機会を低減することができる。これにより、エキスパートの負担（例えば、設計初心者へ個別教育や業務ＯＪＴ等）を低減することができる。 Activity instructions from the design support device 1 are generated with a learning model built from rules (superordinate concept model 16, superordinate concept units 30) entered by an expert. The design support device 1 can therefore give activity instructions that follow the way an expert works. Moreover, even a design beginner with little experience can pick out and carry out the high-value activity from among many options, which reduces the occasions on which beginners need to ask an expert for instructions. This in turn reduces the burden on experts (for example, individual training of design beginners and on-the-job training).

設計支援装置１では、エキスパートが上位概念モデル１６から適した上位概念ユニット３０を選択すると、状態Ｓと、状態Ｓの間を遷移する行動Ａとの組み合わせそれぞれに報酬Ｒが設定される。そのため、エキスパートが状態Ｓと行動Ａとの組み合わせそれぞれに報酬Ｒを設定することを要せず、報酬Ｒの設定が容易である。 In the design support device 1, when the expert selects a suitable superordinate concept unit 30 from the superordinate concept model 16, a reward R is set for each combination of a state S and an action A that transitions between states S. Therefore, it is not necessary for the expert to set the reward R for each combination of the state S and the action A, and the reward R can be easily set.

状態Ｓはその状態Ｓに至る直前の活動によっても定義される。そのため、状態Ｓに至る直前の活動と、その後行い得る行動Ａとの組み合わせがエキスパートの設計手順に沿った規則に合致しているかを容易に判定することができる。また、状態Ｓ及び行動Ａがその規則に合致するかは、状態Ｓに至る直前の活動と、その後行い得る行動Ａに対応する活動を示す単語と、上位概念モデル１６の経路元活動及び経路先活動に係る単語とを比較することによって行われる。このように、単語を比較することによって、状態Ｓ及び行動Ａが規則に合致するかの判定を行うことができるため、その判定を機械的に行うことができ、その判定が容易である。 A state S is also defined by the activities immediately preceding that state S. Therefore, it is possible to easily determine whether the combination of the activity immediately before reaching the state S and the action A that can be performed thereafter matches the rules according to the expert's design procedure. In addition, whether state S and action A match the rules is determined by the activity immediately before reaching state S, the word indicating the activity corresponding to action A that can be performed after that, the route source activity of the superordinate concept model 16, and the route destination. This is done by comparing the words related to the activity. In this way, by comparing the words, it is possible to determine whether the state S and the action A match the rules, so the determination can be made mechanically and is easy.

＜＜第２実施形態＞＞ <<Second embodiment>>
第２実施形態に係る学習済モデルの構築方法、及び、その方法を実施する設計支援装置１０１は、第１実施形態と比べて上位概念モデル１６の構成と、状態Ｓの定義と、学習モデル構築処理のＳＴ３～ＳＴ５と、支援処理のＳＴ２３、ＳＴ２４とが異なる。第２実施形態のその他の構成は第１実施形態と同様であるので、説明を省略する。 The method for constructing a learned model according to the second embodiment, and the design support device 101 that carries out the method, differ from the first embodiment in the configuration of the superordinate concept model 16, the definition of the state S, steps ST3 to ST5 of the learning model construction process, and steps ST23 and ST24 of the support process. The rest of the configuration of the second embodiment is the same as in the first embodiment, and its description is omitted.

図１２に示すように、第２実施形態に係る上位概念モデル１６には、第１実施形態に係る上位概念モデル１６に加えて設計条件が含まれる。設計条件は、エキスパートから入力されたテキストから抽出されてもよく、また、エキスパートから直接入力されてもよい。設計条件には、条件の番号（図１２ではａ，ｂ）、又は、条件が定まっていないことを示す番号（図１２ではｕ）が付されている。上位概念モデル１６には、条件の番号ごとに、活動対（経路元活動、及び経路先活動）と、対応する太さとが記録されている。 As shown in FIG. 12, the superordinate concept model 16 according to the second embodiment includes design conditions in addition to the contents of the superordinate concept model 16 of the first embodiment. The design conditions may be extracted from text entered by the expert or may be entered directly by the expert. Each design condition is assigned a condition number (a or b in FIG. 12) or a number indicating that the condition is undetermined (u in FIG. 12). For each condition number, the superordinate concept model 16 records activity pairs (route source activity and route destination activity) and the corresponding thickness.

モデル構築部１３は状態空間作成処理（図５のＳＴ３）において第１実施形態と同様に、抽出グラフ３２から上位概念ユニット３０に適合するノード２２を抽出し、抽出されたノード２２に基づいて状態空間生成処理を行う。但し、第２実施形態に係る状態Ｓは、図１２に示すように、ノード２２に対応する各活動の実行の有無と、その組み合わせに至る直前に行われた活動の番号と、設計条件の番号との組み合わせによって定義される。 In the state space creation process (ST3 in FIG. 5), the model construction unit 13 extracts from the extraction graph 32 the nodes 22 that match the superordinate concept units 30, as in the first embodiment, and performs the state space generation process based on the extracted nodes 22. However, as shown in FIG. 12, a state S in the second embodiment is defined by the combination of whether each activity corresponding to a node 22 has been executed, the number of the activity performed immediately before reaching that combination, and the number of the design condition.

図６（Ａ）に示すように抽出グラフ３２に３つの活動（活動１～３）が含まれ、図１２に示すように、上位概念ユニット３０に２つの設計条件（ａ，ｂ，ｕ）が含まれる場合には、状態空間Ｐには１５の状態Ｓが含まれる。図１３では、各状態Ｓは、Ｓ_{ｉｊ，ｋ，ｌ}（ｉは活動２の実行（１）、未実行（０）を示し、ｊは活動３の実行（１）、未実行（０）を示し、ｋは直前の活動を示し、ｌは設計条件を示す）によって表現されている。 When the extraction graph 32 contains three activities (activities 1 to 3) as shown in FIG. 6(A), and the superordinate concept units 30 contain two design conditions a and b together with the undetermined marker u, that is, three condition numbers (a, b, u), as shown in FIG. 12, the state space P contains 15 states S (5 activity states × 3 condition numbers). In FIG. 13, each state S is written S_{ij,k,l}, where i indicates whether activity 2 has been executed (1) or not (0), j indicates whether activity 3 has been executed (1) or not (0), k indicates the immediately preceding activity, and l indicates the design condition.
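
The count of 15 states follows from pairing each of the five activity states with each of the three condition numbers. A short Python sketch, with the tuple layout (i, j, k, l) and the helper name `label` chosen for this example only:

```python
from itertools import product

# The five activity states (i, j, k) of the first embodiment.
BASE_STATES = [(0, 0, 1), (1, 0, 2), (0, 1, 3), (1, 1, 2), (1, 1, 3)]
# Design condition numbers; "u" marks an undetermined condition.
CONDITIONS = ["a", "b", "u"]

# Each extended state is (i, j, k, l) with l the design condition number.
extended_states = [base + (cond,) for base, cond in product(BASE_STATES, CONDITIONS)]

def label(state):
    """Format a state tuple in the S_{ij,k,l} notation."""
    return "S_{%d%d,%d,%s}" % state
```

For instance, `label((0, 0, 1, "u"))` gives the string `S_{00,1,u}`.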

状態空間Ｐの生成が完了すると、モデル構築部１３は報酬設定処理を行う。第２実施形態において、モデル構築部１３は、設計条件ごとに報酬設定処理（ＳＴ４）を行い、学習モデル生成処理（ＳＴ５）を行なって、設計条件ごとに学習モデルを生成する。 When the generation of the state space P is completed, the model construction unit 13 performs reward setting processing. In the second embodiment, the model construction unit 13 performs a reward setting process (ST4) for each design condition, performs a learning model generation process (ST5), and generates a learning model for each design condition.

支援処理部１４は、支援処理の最初のステップＳＴ２１において、第１実施形態と同様に、支援すべき工程に係る情報を取得する。その後、支援処理部１４は、ステップＳＴ２２において、第１実施形態と同様に、抽出グラフ３２のノード２２に対応する活動から、取得した情報と最も適合するものを抽出する。その後、ステップＳＴ２３において、第１実施形態と同様に、支援処理部１４は、ＳＴ２２において抽出された活動を行うようにユーザに指示する出力を出力部１２に行う。 In the first step ST21 of the support process, the support processing unit 14 acquires information regarding the process to be supported, similarly to the first embodiment. Thereafter, in step ST22, the support processing unit 14 extracts the activity that most matches the acquired information from the activities corresponding to the node 22 of the extraction graph 32, as in the first embodiment. Thereafter, in step ST23, similarly to the first embodiment, the support processing unit 14 outputs to the output unit 12 an instruction to the user to perform the activity extracted in ST22.

次に、支援処理部１４はステップＳＴ２３において、ステップＳＴ２２において抽出された活動に基づいて、状態空間Ｐから一つの状態Ｓを抽出し、学習モデルに基づいた出力を行うための初期状態に設定する。本実施形態では、まず、支援処理部１４はＳＴ２１において取得した情報に基づいて、支援すべき工程の設計条件を取得し、設計条件の番号を設定する。但し、設計条件が取得できなかったとき、又は、条件が未定であると判定したときには、支援処理部１４は設計条件の番号に、条件が未定であることを示す番号（本実施形態ではｕ）を設定する。 Next, in step ST23, the support processing unit 14 extracts one state S from the state space P based on the activity extracted in step ST22, and sets it as an initial state for outputting based on the learning model. . In this embodiment, the support processing unit 14 first obtains the design conditions of the process to be supported based on the information obtained in ST21, and sets the number of the design conditions. However, when the design condition could not be acquired or when it is determined that the condition is undetermined, the support processing unit 14 adds a number (u in this embodiment) indicating that the condition is undetermined to the design condition number. Set.

次に、支援処理部１４はステップＳＴ２４において、ＳＴ２２において抽出された活動に対応するノード２２が、抽出グラフ３２の最も上流に位置するノード２２である場合には、設定した設計条件の番号に該当し、且つ、その活動のみが実行済となった状態Ｓを初期状態に設定する。支援処理部１４は、ＳＴ２２において抽出された活動に対応するノード２２が抽出グラフ３２の最も上流に位置するノード２２でない場合には、設定した設計条件の番号に該当し、且つ、最も上流に位置するノード２２に対応する活動、及び、ステップＳＴ２２において抽出された活動が実行済となった状態Ｓを初期状態に設定する。但し、ＳＴ２１において取得した情報に設計条件に係る情報が含まれている場合には、支援処理部１４は、対応する設計条件の番号として、条件が未定であることを示す番号（本実施形態ではｕ）を用いる。初期状態の設定が完了すると、支援処理部１４はステップＳＴ２５を実行する。 Next, in step ST24, the support processing unit 14 determines that if the node 22 corresponding to the activity extracted in ST22 is the node 22 located most upstream in the extraction graph 32, it corresponds to the number of the set design condition. Then, a state S in which only that activity has been executed is set as the initial state. If the node 22 corresponding to the activity extracted in ST22 is not the node 22 located at the most upstream position in the extraction graph 32, the support processing unit 14 selects a node 22 that corresponds to the set design condition number and is located at the most upstream position. The state S in which the activity corresponding to the node 22 and the activity extracted in step ST22 have been executed is set to the initial state. However, if the information acquired in ST21 includes information related to design conditions, the support processing unit 14 sets a number indicating that the condition is undetermined (in this embodiment, as the number of the corresponding design condition). Use u). When the initial state setting is completed, the support processing unit 14 executes step ST25.

支援処理部１４はステップＳＴ２５において、第１実施形態と同様に、初期状態から学習モデル（Ｑテーブル５０）に基づいて、現在の状態Ｓに対応する最も価値のある行動Ａを選択して状態Ｓを遷移させる工程を繰り返し行う。その際、支援処理部１４は、状態遷移ごとに行動Ａに対応する活動を行うようにユーザに指示する出力を出力部１２に行う。但し、入力部１１において、設計条件の変更を示す入力を受け付けると、支援処理部１４は状態Ｓを設計条件のみを変更した状態Ｓを遷移させる。その後、支援処理部１４は、現在の状態Ｓに対応する最も価値のある行動Ａを選択して状態遷移を行い、順次、行動Ａに対応する活動を行うようにユーザに指示する出力を出力部１２に行う。全ての活動が実行済となると、支援処理部１４は、支援処理を終了する。 In step ST25, as in the first embodiment, the support processing unit 14 repeatedly selects, starting from the initial state and based on the learning model (Q table 50), the most valuable action A for the current state S and transitions the state S accordingly. For each state transition, the support processing unit 14 outputs to the output unit 12 an instruction telling the user to perform the activity corresponding to action A. However, when the input unit 11 receives an input indicating a change of design condition, the support processing unit 14 transitions the state S to the state S that differs only in the design condition. The support processing unit 14 then selects the most valuable action A for the current state S, performs the state transition, and successively outputs to the output unit 12 instructions telling the user to perform the activity corresponding to each action A. When all activities have been executed, the support processing unit 14 ends the support process.

このように構成した設計支援装置１０１の動作、及び、効果について説明する。 The operation and effects of the design support apparatus 101 configured in this way will be explained.

設計を開始した段階では、設計対象に求められる設計条件が定まっていないことがある。その場合には、設計条件の番号に、条件が未定であることを示す番号（本実施形態ではｕ）が設定される。その後、ＳＴ２２において抽出された活動に活動１に合致すると、状態ＳはＳ_{００，１，u}となり、活動１が実行される。 At the start of design, the design conditions required of the design target may not yet be fixed. In that case, the design condition number is set to the number indicating that the condition is undetermined (u in this embodiment). If the activity extracted in ST22 then matches activity 1, the state S becomes S_{00,1,u} and activity 1 is executed.

活動１を実行した直後に、ユーザから設計条件がａとなったことを示す入力があった場合には、状態Ｓは、Ｓ_{００，１，ｕ}から、Ｓ_{００，１，ａ}に遷移する（図１３の破線矢印を参照）。その後、支援処理部１４は学習モデル（Ｑテーブル５０）に基づいて、現在の状態Ｓに対応する最も価値のある行動Ａを選択して状態遷移を行い、順次、行動Ａに対応する活動を行うようにユーザに指示する出力を出力部１２に行う。 If, immediately after activity 1 is executed, the user provides an input indicating that the design condition has become a, the state S transitions from S_{00,1,u} to S_{00,1,a} (see the dashed arrow in FIG. 13). The support processing unit 14 then selects, based on the learning model (Q table 50), the most valuable action A for the current state S, performs the state transition, and successively outputs to the output unit 12 instructions telling the user to perform the activity corresponding to each action A.
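
The condition switch and the subsequent greedy step can be sketched as follows. The sketch assumes one Q table per design condition, keyed by the activity part (i, j, k) of the state; the Q values are invented for illustration, and the names `change_condition` and `next_state` are hypothetical.

```python
def change_condition(state, new_condition):
    """Keep the activity part (i, j, k) and replace only the design condition."""
    i, j, k, _ = state
    return (i, j, k, new_condition)

def next_state(q_tables, state):
    """Pick the highest-valued transition using the Q table of the
    state's own design condition."""
    condition = state[3]
    candidates = q_tables[condition][state[:3]]
    best = max(candidates, key=candidates.get)
    return best + (condition,)

# Illustrative per-condition Q tables (values are made up).
Q_TABLES = {
    "a": {(0, 0, 1): {(1, 0, 2): 5.0, (0, 1, 3): 4.0}},
    "b": {(0, 0, 1): {(1, 0, 2): 2.0, (0, 1, 3): 6.0}},
}

state = (0, 0, 1, "u")                     # S_{00,1,u}
state_a = change_condition(state, "a")     # S_{00,1,a}
after_a = next_state(Q_TABLES, state_a)
after_b = next_state(Q_TABLES, change_condition(state, "b"))
```

With these made-up values, condition a leads to activity 2 next and condition b leads to activity 3 next, which shows how the same activity history can produce different instructions once the design condition is known.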

In this way, the state S is defined by a combination that includes whether or not each of a plurality of design-related activities has been executed, the number of the activity performed immediately before reaching that combination, and the design condition. As a result, when the design conditions change and an input concerning them is made, the state S can be transitioned to one that matches the input design conditions. The design support device 1 then uses the learned model that matches the accepted design conditions to instruct the user to perform, in sequence, the actions that maximize value. The design support device 1 can therefore provide design support in accordance with the design conditions.
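The three-part state and the condition-only transition can be sketched as a small immutable record. The class name `DesignState`, its field names, and the choice to keep the execution flags as an opaque string are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, replace

UNDETERMINED = "u"  # marker for a design condition that is not yet fixed


@dataclass(frozen=True)
class DesignState:
    executed: str        # execution flags, kept as an opaque string such as "00"
    last_activity: int   # number of the activity performed immediately before
    condition: str = UNDETERMINED

    def with_condition(self, condition: str) -> "DesignState":
        """Return a state that differs from this one only in the design condition."""
        return replace(self, condition=condition)

    def label(self) -> str:
        """Format the state in the subscript notation used in the text."""
        return f"S_{{{self.executed},{self.last_activity},{self.condition}}}"


before = DesignState("00", 1)
after = before.with_condition("a")
```

Because the record is frozen, a condition change always produces a new state and leaves the execution history untouched, which mirrors the transition from S_{00,1,u} to S_{00,1,a}.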

In particular, in this embodiment, the numbers indicating design conditions include a number indicating that the condition is undetermined. Design support can therefore be provided even when the design conditions have not been determined in detail. Furthermore, once the design conditions are determined, the state transitions to a state S that matches them and activities are instructed in line with those conditions, so the design support follows the progress of design and development more closely.

This concludes the description of the specific embodiments. The present invention is not limited to the above embodiments and can be modified and implemented in a wide variety of ways.

In the above embodiments, the superordinate concept unit 30 relates to the order of activities, but it is not limited to this aspect. For example, the superordinate concept unit 30 may relate to the positions in the knowledge graph 21 of the nodes 22 corresponding to the route source activity and the route destination activity. More specifically, the superordinate concept model 16 may include, as a superordinate concept unit 30, one that sets the thickness to a predetermined value when, in the knowledge graph 21, the node 22 corresponding to the route source activity is located in a layer above the node 22 corresponding to the route destination activity. When the expert selects this superordinate concept unit 30, a positive reward R is given whenever the state transitions by executing the activity corresponding to a node 22 in a lower layer after the activity corresponding to a node 22 in an upper layer has been completed. Activity instructions are thereby more likely to be issued in order from activities corresponding to nodes 22 in upper layers toward activities corresponding to nodes 22 in lower layers. Consequently, an activity instructed later tends to be a subordinate concept (a lower abstraction level) of an activity instructed earlier, which makes the content of the activity instructions easier for the user to understand.

Furthermore, the superordinate concept model 16 may include, as a superordinate concept unit 30, one that does not set the reward R (sets it to zero) when the difference in hierarchy between the node 22 corresponding to the route source activity and the node 22 corresponding to the route destination activity is a predetermined value or more (for example, three or more layers). When that difference in hierarchy is large, no reward R is set for the state transition in which the route destination activity is executed immediately after the route source activity. Because no reward R is set for transitions in which the abstraction levels of the two activities differ greatly and the flow between them is hard to follow, the flow of activities is prevented from becoming difficult for the user to understand.
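The two layer-based rules above can be combined in one small reward function. The following is a sketch under stated assumptions: layers are numbered from 0 at the top of the knowledge graph, and the function name `layer_reward`, its parameters, and the default values are hypothetical.

```python
from typing import Dict


def layer_reward(layer: Dict[str, int], source: str, destination: str,
                 thickness: float = 1.0, max_gap: int = 3) -> float:
    """Reward for executing `destination` immediately after `source`.

    `layer` maps each activity to the layer of its node (0 is the top layer).
    """
    gap = layer[destination] - layer[source]
    if gap <= 0:
        return 0.0  # the source node is not above the destination node
    if gap >= max_gap:
        return 0.0  # abstraction levels are too far apart, so no reward is set
    return thickness


# Hypothetical activities A to D placed on layers 0, 1, 2 and 3.
example_layers = {"A": 0, "B": 1, "C": 2, "D": 3}
```

With these example layers, a transition from A to B or from A to C earns the thickness, while A to D (three layers apart) and any upward transition earn nothing.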

In the above embodiments, the support processing unit 14 repeatedly performs state transitions based on the learned model in step ST25 and instructs the user to perform an activity at each state transition, but it is not limited to this aspect. For example, in step ST25 the support processing unit 14 may output to the output unit 12, based on the learned model (Q table 50), the actions A selectable in the current state S together with their corresponding values, receive the action A desired by the user at the input unit 11, and repeat the state transition according to that input. In that case, when outputting the selectable actions A and their values to the output unit 12, the support processing unit 14 preferably lists the actions A in descending order of value.
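The ordered presentation of selectable actions could look like the sketch below. The dictionary form of the Q table, the function name `ranked_actions`, and the default value of 0.0 for missing entries are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def ranked_actions(q: Dict[Tuple[Hashable, int], float], state: Hashable,
                   selectable: Iterable[int]) -> List[Tuple[int, float]]:
    """Return (action, value) pairs for `state`, highest value first."""
    pairs = [(a, q.get((state, a), 0.0)) for a in selectable]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


q_demo = {("s", 1): 0.2, ("s", 2): 0.9, ("s", 3): 0.5}
menu = ranked_actions(q_demo, "s", [1, 2, 3])
```

The user would then pick one entry from `menu`, and the state transition would follow that choice instead of the greedy selection.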

In the above embodiments, the design support devices 1 and 101 provide design support by constructing a learned model based on the extracted graph extracted from the knowledge graph 21, but they are not limited to this aspect. For example, the design support devices 1 and 101 may hold a list of the activities required to carry out a design, generate the state space P by matching that list against the superordinate concept units 30 of the superordinate concept model 16, set the reward R, and then generate the learned model.
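The sequence of generating a state space from an activity list, setting rewards, and producing a Q table can be sketched as follows. This is a simplified illustration, not the disclosed procedure: the state omits the design condition, rules are reduced to (source, destination, thickness) triples, and the learning step is replaced by deterministic Q-value sweeps with an assumed discount factor `gamma`. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

State = Tuple[FrozenSet[str], Optional[str]]   # (executed activities, last activity)
Rule = Tuple[str, str, float]                  # (source, destination, thickness)


def build_states(activities: List[str]) -> List[State]:
    """Enumerate every (executed set, last activity) pair plus the empty start state."""
    states: List[State] = [(frozenset(), None)]
    for k in range(1, len(activities) + 1):
        for subset in combinations(activities, k):
            states += [(frozenset(subset), last) for last in subset]
    return states


def reward(state: State, action: str, rules: List[Rule]) -> float:
    """Sum the thickness of rules whose source is the last activity and whose
    destination is the chosen action."""
    return sum(t for src, dst, t in rules if state[1] == src and action == dst)


def build_q_table(activities: List[str], rules: List[Rule],
                  gamma: float = 0.9) -> Dict[Tuple[State, str], float]:
    """Fill a Q table by repeated sweeps over all state and action pairs."""
    states = build_states(activities)
    q: Dict[Tuple[State, str], float] = {}
    # One sweep per activity is enough, since no episode is longer than that.
    for _ in range(len(activities)):
        for done, last in states:
            for a in activities:
                if a in done:
                    continue
                nxt = (done | {a}, a)
                future = max((q.get((nxt, b), 0.0)
                              for b in activities if b not in nxt[0]), default=0.0)
                q[((done, last), a)] = reward((done, last), a, rules) + gamma * future
    return q


demo_q = build_q_table(["a", "b", "c"], [("a", "b", 1.0), ("b", "c", 2.0)])
start: State = (frozenset(), None)
```

In this example the start state values favor activity "a", because beginning there allows both rules to fire in turn.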

In the above embodiments, the design support devices 1 and 101 are each configured by a single computer, but they are not limited to this aspect. The design support device 1 may be configured by a plurality of computers that are connected via a network such as the Internet and operate in cooperation. The design support device 1 may also be configured by a so-called cloud computing system that includes a plurality of servers on a network.

In the above embodiments, the design support device 1 is used to support vehicle design, but it is not limited to this aspect. The design support device 1 may be applied, for example, to support the design of transportation equipment such as ships and aircraft, or of machinery.

1: Design support device according to the first embodiment
16: Superordinate concept model
21: Knowledge graph
22: Node
23: Edge
50: Q table (learned model)
101: Design support device according to the second embodiment
A: Action
P: State space
R: Reward
S: State

Claims

1. A learned model construction method for constructing a learned model that maximizes value by giving a reward to a combination of a state, which is determined by whether or not design-related activities have been executed, and an action, which is an activity selectable in that state, the method comprising:
a step of giving the reward based on a rule input in advance; and
a step of performing reinforcement learning based on the given reward,
wherein the rule includes information on a route source activity, a route destination activity to be performed after the route source activity, and a thickness indicating the importance of performing the route destination activity after the route source activity, and
in the step of giving the reward, when the action taken immediately before reaching the state matches the route source activity and the action matches the route destination activity, the reward is set based on the thickness.

2. The learned model construction method according to claim 1, wherein the state is defined by a combination including whether or not a plurality of the activities are executed and the activity that was executed immediately before.

3. The learned model construction method according to claim 1, wherein the state is defined by a combination including whether or not a plurality of the activities are executed, the activity that was executed immediately before, and a number corresponding to a design condition.

4. The learned model construction method according to claim 3, wherein the number corresponding to the design condition includes the number indicating that the design condition is undetermined.

5. The learned model construction method according to any one of claims 1 to 4, comprising a step of using a knowledge graph, which includes a plurality of nodes corresponding to the design-related activities and edges connecting the nodes based on the relationships between the activities, to extract, from among the activities corresponding to the nodes, the activities for defining the state by matching them against the rule, and generating a state space that is a set of the states.

6. The learned model construction method according to claim 5, wherein the knowledge graph includes design information and data related to a design procedure.

7. The learned model construction method according to claim 5, wherein in the knowledge graph the nodes are recorded in layers, and
the rule includes that the node corresponding to the route source activity is located in a layer above the node corresponding to the route destination activity.

8. The learned model construction method according to claim 7, wherein the rule includes that the reward is not given when the difference in hierarchy between the node corresponding to the route source activity and the node corresponding to the route destination activity is a predetermined value or more.

9. The learned model construction method according to any one of claims 5 to 8, wherein the rule is input as text, and
in the step of generating the state space, the activities are extracted from among the activities corresponding to the nodes by matching words included in the text against the activities corresponding to the nodes.

10. A design support device that performs design support based on a learned model constructed by the method according to any one of claims 1 to 9, the design support device executing:
a step of setting an initial state based on information input by a user; and
a step of outputting an instruction to the user to sequentially perform, from the initial state, the actions that maximize the value.

11. A design support device that performs design support based on a learned model constructed by the method according to claim 3 or 4, the design support device executing:
a step of setting an initial state based on information input by a user; and
a step of outputting an instruction to the user to sequentially perform, from the initial state, the actions that maximize the value,
wherein, in the step of outputting, when an input concerning the design condition is received, the state is transitioned to one that matches the input condition, and the output is then performed so as to instruct the user to sequentially perform the actions that maximize the value.