JP2013101511A

JP2013101511A - Compound classification device, compound classification program, and compound classification method

Info

Publication number: JP2013101511A
Application number: JP2011244975A
Authority: JP
Inventors: Noriko Ikeda; 紀子池田; Kazunari Tanaka; 一成田中
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-11-08
Filing date: 2011-11-08
Publication date: 2013-05-23
Anticipated expiration: 2031-11-08
Also published as: JP5853608B2

Abstract

PROBLEM TO BE SOLVED: To classify a compound group on the basis of mother nuclei, each representing the partial structure on which a compound is based. SOLUTION: A compound classification device 100 refers to a storage part 110 and detects, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure that serves as the mother nucleus of that compound. Specifically, utilizing the fact that, for example, the character string representing the mother nucleus comes at the rearmost part of the name in substitutive nomenclature, the compound classification device 100 determines whether the character string of t (t=1, 2, 3, ...) characters from the end of the compound name matches any of the names stored in the storage part 110. The compound classification device 100 then classifies the compound group on the basis of the detected character strings representing the mother nuclei of the respective compounds. Specifically, for example, among the first to fifth compounds to be classified, it groups together the compounds whose character strings representing the mother nucleus are identical.

Description

The present invention relates to a compound classification device, a compound classification program, and a compound classification method.

In documents such as patent documents and academic papers in fields such as chemistry and pharmacy, the compound name of a certain compound is sometimes listed together with the names of other compounds that may be used in its place. In addition, a reader may have to judge, from the plurality of compound names listed in a document, what kind of compound group is intended.

As related prior art, there is, for example, a technique that compares differing lines of text data in units of character strings, extracts the character strings that differ, ignores character strings specified by externally supplied information, recognizes the remaining lines as differences, and edits the differences and outputs them as a list (see, for example, Patent Document 1 below).

Patent Document 1: JP 7-104990 A

However, with the prior art, it is difficult to judge the similarities and differences among compounds from the compound names of a compound group listed in a document. For example, when three or more compound names are listed in a document, it is difficult to judge how those compounds resemble one another and how they differ.

To solve the above-described problem of the prior art, an object of the present invention is to provide a compound classification device, a compound classification program, and a compound classification method capable of classifying a compound group based on the mother nucleus, which represents the partial structure on which a compound is based.

To solve the above-described problem and achieve the object, according to one aspect of the present invention, there are proposed a compound classification device, a compound classification program, and a compound classification method that refer to a storage unit storing the names of partial structures serving as mother nuclei of compounds; detect, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound; classify the compound group based on the detected character strings representing the mother nuclei of the respective compounds; and output the classification result.

According to one aspect of the present invention, a compound group can be classified based on the mother nucleus, which represents the partial structure on which a compound is based.

FIG. 1 is an explanatory diagram illustrating an example of a compound classification method according to an embodiment.
FIG. 2 is an explanatory diagram illustrating a system configuration example of the system 200.
FIG. 3 is a block diagram illustrating a hardware configuration example of the compound classification device 100.
FIG. 4 is an explanatory diagram illustrating an example of the contents stored in the structure analysis rule DB 220.
FIG. 5 is an explanatory diagram illustrating an example of the contents stored in the structural formula DB 230.
FIG. 6 is an explanatory diagram illustrating an example of the contents stored in the basic structure extraction rule DB 240.
FIG. 7 is a block diagram illustrating a functional configuration example of the compound classification device 100.
FIG. 8 is an explanatory diagram (part 1) illustrating a transition example of the contents stored in the division table 800.
FIG. 9 is an explanatory diagram (part 2) illustrating a transition example of the contents stored in the division table 800.
FIG. 10 is an explanatory diagram (part 3) illustrating a transition example of the contents stored in the division table 800.
FIG. 11 is an explanatory diagram (part 1) illustrating a transition example of the contents stored in the mother nucleus comparison table.
FIG. 12 is an explanatory diagram (part 2) illustrating a transition example of the contents stored in the mother nucleus comparison table.
FIG. 13 is an explanatory diagram (part 3) illustrating a transition example of the contents stored in the mother nucleus comparison table.
FIG. 14 is an explanatory diagram (part 4) illustrating a transition example of the contents stored in the mother nucleus comparison table.
FIG. 15 is an explanatory diagram (part 5) illustrating a transition example of the contents stored in the mother nucleus comparison table.
FIG. 16 is an explanatory diagram (part 6) illustrating a transition example of the contents stored in the mother nucleus comparison table.
FIG. 17 is an explanatory diagram (part 1) illustrating a transition example of the contents stored in the substituent comparison table.
FIG. 18 is an explanatory diagram (part 2) illustrating a transition example of the contents stored in the substituent comparison table.
FIG. 19 is an explanatory diagram (part 3) illustrating a transition example of the contents stored in the substituent comparison table.
FIG. 20 is an explanatory diagram (part 4) illustrating a transition example of the contents stored in the substituent comparison table.
FIG. 21 is an explanatory diagram (part 5) illustrating a transition example of the contents stored in the substituent comparison table.
FIG. 22 is an explanatory diagram (part 1) illustrating a specific example of the comparison list.
FIG. 23 is an explanatory diagram (part 2) illustrating a specific example of the comparison list.
FIG. 24 is an explanatory diagram illustrating an example of processing the comparison list.
FIG. 25 is a flowchart illustrating an example of the compound classification processing procedure of the compound classification device 100.
FIG. 26 is a flowchart illustrating an example of a specific processing procedure of the compound name division processing.
FIG. 27 is a flowchart (part 1) illustrating an example of a specific processing procedure of the mother nucleus division processing.
FIG. 28 is a flowchart (part 2) illustrating an example of a specific processing procedure of the mother nucleus division processing.
FIG. 29 is a flowchart illustrating an example of a specific processing procedure of the substituent division processing.
FIG. 30 is a flowchart illustrating an example of a specific processing procedure of the mother nucleus comparison table creation processing.
FIG. 31 is a flowchart illustrating an example of a specific processing procedure of the substituent comparison table creation processing.

Embodiments of a compound classification device, a compound classification program, and a compound classification method according to the present invention are described in detail below with reference to the accompanying drawings.

(Nomenclature of compounds)
First, the nomenclature of the compounds used in the present embodiment will be described. Here, a compound is a chemical substance made up of two or more kinds of elements. Compounds are classified into, for example, organic compounds and inorganic compounds.

Organic compound is a general term for compounds that have carbon atoms in the basic skeleton of their structure. Organic compounds can be classified according to differences in molecular structure into, for example, linear hydrocarbons, aromatic hydrocarbons, and alicyclic hydrocarbons. Note that elements other than carbon that can form a skeleton include silicon and oxygen; molecules with such skeletons are called inorganic molecules.

An inorganic compound is a compound other than an organic compound, that is, a compound composed of elements other than carbon. However, some carbon-containing substances, for example, allotropes of carbon (such as graphite and diamond) and carbon dioxide, are classified as inorganic compounds. In the following description, organic compounds are used as the example.

Organic compounds take various structures depending on, for example, the length and branching of the carbon skeleton. The carbon skeleton is the portion of an organic compound in which carbon atoms are bonded to each other. The length of the carbon skeleton is expressed by the number of carbon atoms. In addition, organic compounds form a variety of functional groups in which nitrogen (N), oxygen (O), sulfur (S), phosphorus (P), halogens (F, Cl, Br, I), and the like are bonded to carbon. A functional group is an atomic group that determines the approximate properties of an organic compound.

Here, the compound name of an organic compound is assigned, for example, according to the nomenclature defined by IUPAC (International Union of Pure and Applied Chemistry). The nomenclatures defined by IUPAC include, for example, substitutive nomenclature, radicofunctional nomenclature, additive nomenclature, subtractive nomenclature, conjunctive nomenclature, and replacement nomenclature.

In the present embodiment, it is assumed that the compound names of organic compounds are assigned according to the substitutive nomenclature defined by IUPAC. In substitutive nomenclature, the compound name of an organic compound is expressed, for example, in the format "bonding position-prefix-(beginning + stem + ending)".

In substitutive nomenclature, (beginning + stem + ending) is called the "mother nucleus", and the prefix is called the "substituent". That is, in substitutive nomenclature, the compound name of an organic compound is written according to a rule such as "substituent + mother nucleus". The mother nucleus and the substituent are atomic groups, each representing a partial structure of the compound.

The mother nucleus is the partial structure on which an organic compound is based. A substituent is a partial structure used in the systematic classification and naming of organic compounds, and is a concept used in a pair with the mother nucleus. The mother nucleus and the substituent are in a parent-child relationship in which the mother nucleus is the "parent" and the substituent is the "child". The bonding position indicates which numbered carbon of the mother nucleus the substituent is bonded to. However, the bonding position is sometimes omitted.

A substituent that itself contains another substituent is called a "composite substituent". A composite substituent contains a substituent and a mother nucleus. That is, the compound name of an organic compound may contain a multi-generation parent-child relationship in which another parent-child relationship exists within a child. In the compound name of an organic compound, the character string representing a composite substituent is enclosed in, for example, parentheses or brackets.
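Because composite substituents are marked by enclosing brackets, they can be located with a simple depth counter. The following is a minimal sketch under that assumption only; the function name `find_composite_substituents` is hypothetical, and the sketch does not check that bracket types pair up correctly.

```python
from typing import List


def find_composite_substituents(name: str) -> List[str]:
    """Return the outermost bracketed substrings of a compound name."""
    openers = {"(", "["}
    closers = {")", "]"}
    found: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(name):
        if ch in openers:
            if depth == 0:
                start = i + 1          # first character inside the bracket
            depth += 1
        elif ch in closers and depth > 0:
            depth -= 1
            if depth == 0:             # outermost bracket just closed
                found.append(name[start:i])
    return found


print(find_composite_substituents("2-(3-methyl-4-hydroxyphenyl)propane"))
```

A nested name yields only its outermost composite substituent; applying the function again to that result would expose the next generation.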

In the present embodiment, the parent-child relationship of each generation may be referred to as "one hierarchy", and the parent-child relationships of a plurality of generations as a "hierarchical structure". The highest hierarchy is denoted the "first hierarchy", and lower hierarchies are denoted in order the "second hierarchy", "third hierarchy", ..., "n-th hierarchy" (n: a natural number of 1 or more). An arbitrary hierarchy among the first to n-th hierarchies is denoted the "i-th hierarchy" (i = 1, 2, ..., n).

The i-th hierarchy includes one mother nucleus and one or more substituents. Here, the one or more substituents included in the i-th hierarchy are denoted the "first substituent", "second substituent", ..., "m-th substituent" (m: a natural number of 1 or more). An arbitrary substituent among the first to m-th substituents is denoted the "j-th substituent" (j = 1, 2, ..., m).

The order in which the numbers (1, 2, ..., m) are assigned to the one or more substituents in the i-th hierarchy is arbitrary. For example, numbers may be assigned in alphabetical order of the substituent names, or in ascending order of the number of the mother nucleus carbon to which each substituent is bonded. In the following description, the substituents of the i-th hierarchy are numbered first substituent, second substituent, ..., m-th substituent in order from the beginning of the compound name.

Here, the compound name "2-(3-methyl-4-hydroxyphenyl)propane" is used as an example. In this compound name, the mother nucleus of the first hierarchy is "propane", the first substituent is "3-methyl-4-hydroxyphenyl", and the bonding position of the first substituent is "2".

The first substituent is a composite substituent enclosed in parentheses, so this compound name has a second hierarchy. Specifically, in the second hierarchy the mother nucleus is "phenyl", the first substituent is "methyl", the bonding position of the first substituent is "3", the second substituent is "hydroxy", and the bonding position of the second substituent is "4". Within a composite substituent that constitutes the second hierarchy, the substituent written closer to the mother nucleus of the first hierarchy becomes the parent, that is, the mother nucleus of the second hierarchy, and a substituent written farther from that mother nucleus becomes a child, that is, a substituent of the second hierarchy.
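The two hierarchies of this example can be written down as a small nested data structure. The sketch below is illustrative only: the class names `Level` and `Substituent` and the helper `depth` are hypothetical, and the structure is filled in by hand rather than parsed from the name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Substituent:
    name: str                          # e.g. "methyl"
    position: Optional[str] = None     # bonding position; may be omitted in a name
    inner: Optional["Level"] = None    # set only for a composite substituent


@dataclass
class Level:
    mother_nucleus: str                # the "parent" of this hierarchy
    substituents: List[Substituent] = field(default_factory=list)  # the "children"


def depth(level: Level) -> int:
    """Number of hierarchies at and below this level."""
    nested = [depth(s.inner) for s in level.substituents if s.inner is not None]
    return 1 + max(nested, default=0)


# 2-(3-methyl-4-hydroxyphenyl)propane, built by hand from the description
second = Level("phenyl", [Substituent("methyl", "3"), Substituent("hydroxy", "4")])
first = Level("propane", [Substituent("3-methyl-4-hydroxyphenyl", "2", inner=second)])
print(depth(first))  # prints 2
```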

(One example of the compound classification method)
Next, an example of the compound classification method according to the present embodiment will be described. FIG. 1 is an explanatory diagram illustrating an example of a compound classification method according to an embodiment. In FIG. 1, the compound classification device 100 is a computer having a function of classifying a compound group to be classified.

The compound group to be classified is, for example, a set of compounds listed in a document such as a patent document or an academic paper in a field such as chemistry or pharmacy. In such documents, a listed compound group is often a set of compounds that share some similarity. In this compound classification method, the mother nucleus, which represents the partial structure on which a compound is based, is identified from the compound name of each compound in the group to be classified, and the compound group is classified based on the mother nucleus of each compound.

Hereinafter, an example of the compound classification processing of the compound classification device 100 will be described, taking "first to fifth compounds" as the compound group to be classified.

(1) The compound classification device 100 refers to the storage unit 110 and detects, from the compound name of each compound in the compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound. The storage unit 110 is a storage device accessible to the compound classification device 100 and stores mother nucleus names, that is, the names of partial structures serving as mother nuclei of compounds.

Here, the compound name of the first compound is "AAAXXX". The compound name of the second compound is "BBBYYY". The compound name of the third compound is "CCCXXX". The compound name of the fourth compound is "DDDYYY". The compound name of the fifth compound is "EEEXXX".

Specifically, for example, utilizing the fact that in substitutive nomenclature the character string representing the mother nucleus comes at the very end of the name, the compound classification device 100 determines whether the character string of t (t = 1, 2, 3, ...) characters from the end of a compound name matches a mother nucleus name stored in the storage unit 110. The compound classification device 100 then detects a character string that matches a stored mother nucleus name as the character string representing the mother nucleus of the compound.

Here, among the first to fifth compounds, the three-character string "XXX" at the end of the compound names of the first, third, and fifth compounds matches the mother nucleus name "XXX" stored in the storage unit 110. The character string "XXX" is therefore detected as representing the mother nucleus of the first, third, and fifth compounds. Likewise, the three-character string "YYY" at the end of the compound names of the second and fourth compounds matches the mother nucleus name "YYY" stored in the storage unit 110. The character string "YYY" is therefore detected as representing the mother nucleus of the second and fourth compounds.
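The suffix check described above can be sketched as a loop over t, the number of trailing characters compared. This is a minimal illustration, not the patented implementation: the function name `detect_mother_nucleus` is hypothetical, a plain set stands in for the storage unit 110, and keeping the longest match when several suffixes are stored is an assumption, since the text does not say which match wins.

```python
from typing import Optional, Set


def detect_mother_nucleus(compound_name: str, nucleus_names: Set[str]) -> Optional[str]:
    """Return the stored mother nucleus name that ends compound_name, or None."""
    match = None
    for t in range(1, len(compound_name) + 1):
        tail = compound_name[-t:]      # last t characters of the name
        if tail in nucleus_names:
            match = tail               # assumption: a longer match replaces a shorter one
    return match


stored = {"XXX", "YYY"}                # stands in for the storage unit 110
for name in ["AAAXXX", "BBBYYY", "CCCXXX", "DDDYYY", "EEEXXX"]:
    print(name, detect_mother_nucleus(name, stored))
```

Preferring the longest suffix matters for real names: if both "hexane" and "cyclohexane" were stored, the shortest match would misread "methylcyclohexane".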

(2) The compound classification device 100 classifies the compound group to be classified based on the detected character strings representing the mother nuclei of the respective compounds. Specifically, for example, the compound classification device 100 classifies the first to fifth compounds by grouping together the compounds whose character strings representing the mother nucleus are identical.

Here, the first to fifth compounds are classified into group 1, which contains the first, third, and fifth compounds, and group 2, which contains the second and fourth compounds. Group 1 is the set of compounds whose character string representing the mother nucleus is "XXX". Group 2 is the set of compounds whose character string representing the mother nucleus is "YYY".
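The grouping step amounts to collecting compounds under their detected mother nucleus string. The sketch below assumes the detection result is already available as a mapping from compound name to mother nucleus string; the function name `classify_by_nucleus` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def classify_by_nucleus(nucleus_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group compound names by their detected mother nucleus string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for compound, nucleus in nucleus_of.items():
        groups[nucleus].append(compound)
    return dict(groups)


detected = {
    "AAAXXX": "XXX",
    "BBBYYY": "YYY",
    "CCCXXX": "XXX",
    "DDDYYY": "YYY",
    "EEEXXX": "XXX",
}
for nucleus, members in classify_by_nucleus(detected).items():
    print(nucleus, members)
```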

In this way, the compound classification device 100 according to the present embodiment can classify the first to fifth compounds into groups of compounds that share the same mother nucleus, which represents the partial structure on which a compound is based. This makes it possible to identify, among the first to fifth compounds, the sets of compounds having the same mother nucleus. As a result, for example, it becomes easier to judge the similarities and differences among those of the first to fifth compounds that share a mother nucleus.

Although details will be described later, the compound classification device 100 may also classify the first to fifth compounds by comparing the character string representing the mother nucleus of a specific compound among them with the character strings representing the mother nuclei of the other compounds. This makes it possible to identify, among the first to fifth compounds, the set of compounds that have the same mother nucleus as the specific compound, and makes it easier to judge the similarities and differences between the specific compound and those compounds.
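Comparison against one specific compound can be sketched as a filter over the other compounds. As before, the mapping from compound name to mother nucleus string is assumed to have been produced by an earlier detection step, and the function name `same_nucleus_as` is hypothetical.

```python
from typing import Dict, List


def same_nucleus_as(target: str, nucleus_of: Dict[str, str]) -> List[str]:
    """Return the other compounds whose mother nucleus string equals the target's."""
    target_nucleus = nucleus_of[target]
    return [
        compound
        for compound, nucleus in nucleus_of.items()
        if compound != target and nucleus == target_nucleus
    ]


detected = {
    "AAAXXX": "XXX",
    "BBBYYY": "YYY",
    "CCCXXX": "XXX",
    "DDDYYY": "YYY",
    "EEEXXX": "XXX",
}
print(same_nucleus_as("AAAXXX", detected))
```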

(System configuration example of the system 200)
Next, a system configuration example of the system 200 according to the embodiment will be described. FIG. 2 is an explanatory diagram illustrating a system configuration example of the system 200. In FIG. 2, the system 200 includes the compound classification device 100 and a plurality of client devices 201 (three in the drawing).

In the system 200, the compound classification device 100 and the client devices 201 are connected via a wired or wireless network 210. The network 210 is, for example, the Internet, a LAN (Local Area Network), or a WAN (Wide Area Network).

Here, the compound classification device 100 has a structure analysis rule DB (database) 220, a structural formula DB 230, and a basic structure extraction rule DB 240. The structure analysis rule DB 220, the structural formula DB 230, and the basic structure extraction rule DB 240 are described in detail later with reference to FIGS. 4 to 6.

The client device 201 is, for example, a PC (personal computer) or a notebook PC used by a user of the system 200. For example, in the system 200, when text data representing the compound names of a compound group to be classified is transmitted from the client device 201 to the compound classification device 100, the classification result for the compound group is transmitted from the compound classification device 100 to the client device 201.

(Hardware configuration example of the compound classification device 100)
FIG. 3 is a block diagram illustrating a hardware configuration example of the compound classification device 100. In FIG. 3, the compound classification device 100 includes a CPU (Central Processing Unit) 301, a ROM (Read-Only Memory) 302, a RAM (Random Access Memory) 303, a magnetic disk drive 304, a magnetic disk 305, an optical disk drive 306, an optical disk 307, an I/F (Interface) 308, a display 309, a keyboard 310, and a mouse 311. The components are connected to one another by a bus 300.

Here, the CPU 301 governs overall control of the compound classification device 100. The ROM 302 stores programs such as a boot program. The RAM 303 is used as a work area for the CPU 301. The magnetic disk drive 304 controls the reading and writing of data on the magnetic disk 305 under the control of the CPU 301. The magnetic disk 305 stores data written under the control of the magnetic disk drive 304.

The optical disk drive 306 controls the reading and writing of data on the optical disk 307 under the control of the CPU 301. The optical disk 307 stores data written under the control of the optical disk drive 306, and allows a computer to read the data stored on the optical disk 307.

The I/F 308 is connected to the network 210 through a communication line, and is connected via the network 210 to other computers, for example, the client devices 201. The I/F 308 serves as the interface between the network 210 and the inside of the device, and controls the input and output of data from and to other computers. For example, a modem or a LAN adapter can be employed as the I/F 308.

The display 309 displays a cursor, icons, and tool boxes, as well as data such as documents, images, and function information. For example, a CRT, a TFT liquid crystal display, or a plasma display can be employed as the display 309.

The keyboard 310 includes keys for inputting characters, numbers, various instructions, and the like, and is used to input data. It may instead be a touch-panel input pad, a numeric keypad, or the like. The mouse 311 is used to move the cursor, select a range, move a window, change its size, and so on. A trackball, a joystick, or the like may be used instead, as long as it provides the same functions as a pointing device.

Note that, among the components described above, the compound classification device 100 need not include, for example, the optical disk drive 306, the optical disk 307, the display 309, the keyboard 310, or the mouse 311. The client device 201 can be realized by a hardware configuration similar to that of the compound classification device 100 described above.

(Storage contents of the various DBs 220, 230, and 240)
Next, the contents stored in the various DBs 220, 230, and 240 will be described. The various DBs 220, 230, and 240 are stored, for example, in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307 illustrated in FIG. 3.

FIG. 4 is an explanatory diagram showing an example of the contents stored in the structure analysis rule DB 220. In FIG. 4, the structure analysis rule DB 220 has fields for a rule ID, a rule name, rule content, and a supplementary note. Setting information in each field stores rule information (for example, rule information 400-1 to 400-8) as records.

Here, the rule ID is an identifier of a structure analysis rule. A structure analysis rule defines a convention for analyzing the structure of a compound. The rule name is the name of the structure analysis rule. The rule content is the content of the structure analysis rule. The supplementary note supplements the rule content.

Taking rule information 400-1 as an example, it shows the rule name "organic compound" of rule 1 and the rule content "The parent-child relationship is fundamental. The parent is the mother nucleus and the child is a substituent." From the rule information 400-1, the compound classification device 100 can recognize that organic compounds are fundamentally organized as a parent-child relationship, in which the parent is the mother nucleus and the child is a substituent.

Taking rule information 400-2 as an example, it shows the rule name "mother nucleus" of rule 2, the rule content "Composed of prefix + stem + suffix. The carbon chain is the mother nucleus of the first hierarchy level.", and the supplementary note "For the mother nucleus carbon chain, refer to the structural formula DB." From the rule information 400-2, the compound classification device 100 can recognize that a mother nucleus name is composed of a prefix, a stem, and a suffix, that the carbon chain is the mother nucleus of the first hierarchy level, and that the mother nucleus carbon chain can be identified by referring to the structural formula DB 230.

FIG. 5 is an explanatory diagram showing an example of the contents stored in the structural formula DB 230. In FIG. 5, the structural formula DB 230 has fields for a compound ID, a compound type, a ring flag, a compound name, a structural formula, a structural formula without interatomic bonds, and remarks. Setting information in each field stores the structural formula information 510-1 to 510-K and 520-1 to 520-P for each compound as records. Specifically, the structural formula information 510-1 to 510-K is the structural formula information of compounds representing mother nuclei, and the structural formula information 520-1 to 520-P is the structural formula information of compounds representing substituents.

Here, the compound ID is an identifier of a compound representing a mother nucleus or a substituent. In the following description, an arbitrary mother nucleus among the mother nuclei B1 to BK may be written as "mother nucleus Bk" (k = 1, 2, ..., K). Likewise, an arbitrary substituent among the substituents C1 to CP may be written as "substituent Cp" (p = 1, 2, ..., P).

The compound type is the type of the compound representing the mother nucleus or the substituent. The ring flag indicates whether the compound representing the mother nucleus or the substituent has a ring structure: it is "Yes" for a ring structure and "No" otherwise. The compound name is the name of the compound representing the mother nucleus or the substituent.

The structural formula is the structural formula of the compound representing the mother nucleus or the substituent. A structural formula is a chemical formula that depicts how the elements in a compound are bonded. Each carbon atom in the structural formula is given a carbon number. The structural formula without interatomic bonds is the structural formula with the bond symbols between atoms removed. The remarks are supplementary information about the structural formula. The remarks field holds, for example, an abbreviated notation of the structural formula.

Taking the structural formula information 510-1 as an example, it shows, for the compound representing the mother nucleus B1, the compound type "linear hydrocarbon", the ring flag "No", the compound name "methane", the structural formula "CH4", and the structural formula without interatomic bonds "CH4". The carbon atom "C" in the structural formula "CH4" is given the carbon number "1".

Taking the structural formula information 520-1 as an example, it shows, for the compound representing the substituent C1, the ring flag "No", the compound name "methyl", the structural formula "CH3-", and the structural formula without interatomic bonds "CH3-". The carbon atom "C" in the structural formula "CH3-" is given the carbon number "1".
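As a rough illustration of the record layout described for the structural formula DB 230, the two example records could be modeled as follows. This is only a sketch: the class name, the field names, and the use of a Python dataclass are assumptions made here for illustration and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralFormulaRecord:
    """One record of the structural formula DB 230 (field names are illustrative)."""
    compound_id: str       # e.g. "B1" for a mother nucleus, "C1" for a substituent
    compound_type: str     # e.g. "linear hydrocarbon"
    is_ring: bool          # ring flag: True for "Yes", False for "No"
    name: str              # compound name
    formula: str           # structural formula
    formula_no_bonds: str  # structural formula without interatomic bonds
    remarks: str = ""      # optional supplementary information


# Structural formula information 510-1 (mother nucleus B1).
METHANE = StructuralFormulaRecord(
    "B1", "linear hydrocarbon", False, "methane", "CH4", "CH4")

# Structural formula information 520-1 (substituent C1). The text gives no
# compound type for this record, so it is left empty here.
METHYL = StructuralFormulaRecord("C1", "", False, "methyl", "CH3-", "CH3-")
```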

Note that, in the data structure of the structural formula DB 230 shown in FIG. 5, the structural formula information 510-1 to 510-K of the mother nuclei B1 to BK is shown separately from the structural formula information 520-1 to 520-P of the substituents C1 to CP, but the data structure is not limited to this. For example, from the second hierarchy level onward the substituents C1 to CP can also serve as mother nuclei, so the structural formula DB 230 may manage structural formula information per compound without distinguishing between mother nuclei and substituents.

FIG. 6 is an explanatory diagram showing an example of the contents stored in the basic structure extraction rule DB 240. In FIG. 6, the basic structure extraction rule DB 240 has fields for a rule ID and rule content. Setting information in each field stores basic structure extraction rule information (for example, basic structure extraction rule information 600-1 to 600-5) as records.

Here, the rule ID is an identifier of a basic structure extraction rule. A basic structure extraction rule defines a convention for extracting, from an electronic document, the compound name of the compound that serves as the basic structure. The compound that serves as the basic structure is, for example, the compound with the most basic structure among a group of compounds listed in an electronic document such as a patent document or an academic paper in chemistry or pharmacy. The rule content is the content of the basic structure extraction rule.

Taking the basic structure extraction rule information 600-1 as an example, it shows the rule content of rule 1: "Among the compounds in a patent specification, there may be a compound expressed as 'XXX is particularly preferable.'" From the basic structure extraction rule information 600-1, the compound classification device 100 can recognize that the "XXX" in the expression "XXX is particularly preferable." in a patent specification is the compound name of the compound that serves as the basic structure.

Taking the basic structure extraction rule information 600-5 as an example, it shows the rule content of rule 5: "The compounds in a patent specification are split at each '、' within the compound group, and the leading XXX is extracted." From the basic structure extraction rule information 600-5, the compound classification device 100 can recognize that the compounds in a patent specification are listed separated by "、" (the Japanese comma).

(Functional configuration example of the compound classification device 100)
Next, a functional configuration example of the compound classification device 100 will be described. FIG. 7 is a block diagram showing a functional configuration example of the compound classification device 100. In FIG. 7, the compound classification device 100 includes a reception unit 701, a detection unit 702, an extraction unit 703, a specification unit 704, a classification unit 705, a comparison unit 706, a calculation unit 707, a determination unit 708, a setting unit 709, a creation unit 710, and an output unit 711. The reception unit 701 through the output unit 711 are functions that constitute a control unit. Specifically, their functions are realized, for example, by causing the CPU 301 to execute a program stored in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307 shown in FIG. 3, or by the I/F 308. The processing results of each functional unit are stored in a storage device such as the RAM 303, the magnetic disk 305, or the optical disk 307.

The reception unit 701 has a function of receiving the compound name of each compound in the compound group to be classified. Specifically, for example, the reception unit 701 may receive the compound names through user input operations using the keyboard 310 or the mouse 311 shown in FIG. 3. Alternatively, the reception unit 701 may receive the compound names by receiving, from the client device 201, text data representing the compound name of each compound in the compound group to be classified.

The reception unit 701 may also receive a designation of the compound that serves as the basic structure among the compound group to be classified. Specifically, for example, the reception unit 701 may receive this designation through user input operations using the keyboard 310 or the mouse 311. Alternatively, the reception unit 701 may receive the designation by receiving, from the client device 201, text data representing the compound name of the compound that serves as the basic structure among the compound group to be classified.

The received compound names of the compounds in the compound group to be classified are stored, for example, in the division table 800 shown in FIG. 8, described later.

The compound classification device 100 may also refer to the basic structure extraction rule DB 240 shown in FIG. 6 and detect, from an electronic document, the compound name of each compound in the compound group to be classified. In this case, the reception unit 701 may receive the compound names detected from the electronic document.

The compound classification device 100 may also, for example, refer to the basic structure extraction rule DB 240 and detect, from the electronic document in which the compound group to be classified was found, the compound name of the compound that serves as the basic structure among that compound group. In this case, the reception unit 701 may receive the compound name of the basic-structure compound detected from the electronic document.

Here, the electronic document is a technical document such as a patent document or an academic paper. The electronic document is, for example, input to the compound classification device 100 and stored in a storage device such as the RAM 303, the magnetic disk 305, or the optical disk 307. As an example, suppose that a patent specification contains the passage: "Representative examples of the AAAs represented by the general formula (I) include ###, $$$, @@@, and the like. Particularly, $$$ is preferable."

In this case, the compound classification device 100 detects "###", "$$$", and "@@@", which are separated by "、" in the patent specification, as the compound group to be classified. The compound classification device 100 also detects "$$$" in the patent specification as the compound name of the compound that serves as the basic structure among the compound group to be classified.
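The detection in this example can be sketched roughly as follows. The sketch works on an English rendering of the passage, so the patterns "include ... and the like" and "Particularly, ... is preferable" and the ASCII comma delimiter are assumptions made for illustration; the rules in the basic structure extraction rule DB 240 are written against the Japanese expressions, and the function name is hypothetical.

```python
import re


def extract_compound_group(text):
    """Return (compound_names, basic_structure_name) found in `text`.

    Loosely mirrors rule 5 (split the listed names at the delimiter) and
    rule 1 (the name in "particularly XXX is preferable" is the basic
    structure). The English patterns are illustrative assumptions.
    """
    names = []
    listed = re.search(r"include\s+(.+?),?\s+and the like", text)
    if listed:
        names = [part.strip() for part in listed.group(1).split(",") if part.strip()]
    preferred = re.search(r"[Pp]articularly,?\s+(.+?)\s+is preferable", text)
    basic = preferred.group(1).strip() if preferred else None
    return names, basic


sample = ("Representative examples of the AAAs include ###, $$$, @@@, and the like. "
          "Particularly, $$$ is preferable.")
names, basic = extract_compound_group(sample)
```

Here `names` holds the three placeholder names and `basic` holds the placeholder that follows "Particularly".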

In the following description, the compound group to be classified may be written as "compound group M1 to MR" (R is a natural number of 2 or more). An arbitrary compound in the compound group M1 to MR may be written as "compound Mr" (r = 1, 2, ..., R), and the compound name of the compound Mr may be written as "compound name Nr".

The detection unit 702 has a function of referring to the structural formula DB 230 and detecting, in the compound name of each compound in the compound group M1 to MR, the character string that represents the name of the partial structure serving as the mother nucleus of that compound. Here, a character string is a sequence of one or more consecutive characters in the compound name.

As described above, in substitutive nomenclature the character string representing the mother nucleus of a compound is written at the very end of the compound name. The detection unit 702 can use this property of substitutive nomenclature, for example, to detect the character string representing the mother nucleus of the compound Mr in its compound name Nr.

Specifically, for example, the detection unit 702 selects the compound name of a mother nucleus Bk from the structural formula DB 230. Next, the detection unit 702 determines the number of characters t in the selected compound name of the mother nucleus Bk. The detection unit 702 then determines whether the last t characters of the compound name Nr of the compound Mr match the compound name of the mother nucleus Bk. If they match, the detection unit 702 detects the last t characters of the compound name Nr as the character string representing the mother nucleus of the compound Mr.

As another detection method, for example, the detection unit 702 sets t = 1 and takes the last t characters of the compound name Nr of the compound Mr. The detection unit 702 then searches the structural formula DB 230 for a mother nucleus Bk whose compound name matches that t-character string. If a mother nucleus Bk is found, the detection unit 702 detects the last t characters of the compound name Nr as the character string representing the mother nucleus of the compound Mr. If no mother nucleus Bk is found, the detection unit 702 increments t, takes the last t characters of the compound name Nr, and repeats the process. If t exceeds the maximum number of characters among the compound names of the mother nuclei registered in the structural formula DB 230, no character string representing the mother nucleus of the compound Mr is detected.
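The second detection method (growing the suffix one character at a time) can be sketched as follows. This is a minimal illustration, assuming the mother nucleus names from the structural formula DB 230 are available as a plain list of strings; the function name and the English sample names are hypothetical.

```python
def detect_mother_nucleus(compound_name, nucleus_names):
    """Return the shortest suffix of `compound_name` that equals a registered
    mother nucleus name, or None if there is no such suffix."""
    known = set(nucleus_names)
    # Upper bound on t: the longest registered mother nucleus name.
    max_len = max((len(name) for name in known), default=0)
    t = 1
    # The extra bound on len(compound_name) is an addition made here so that
    # t never exceeds the length of the name being examined.
    while t <= max_len and t <= len(compound_name):
        suffix = compound_name[-t:]  # last t characters of the compound name
        if suffix in known:
            return suffix
        t += 1
    return None


NUCLEI = ["methane", "ethane", "propane"]
```

With these sample names, `detect_mother_nucleus("2-methylpropane", NUCLEI)` returns `"propane"`. Because t starts at 1, the shortest matching suffix is returned first, so with English names `"methane"` is reported as `"ethane"`. An implementation working on English text may therefore prefer to scan from the longest candidate down, which is a design variation and not something the description states.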

The detected character string representing the mother nucleus of the compound Mr is stored, for example, in the mother nucleus comparison table 1100 shown in FIG. 11, described later.

The extraction unit 703 has a function of extracting, from the part of each compound name in the compound group M1 to MR that remains after removing the character string representing the mother nucleus, the character strings that represent the names of the partial structures serving as the substituents of that compound. The extraction unit 703 may also extract, from the same remaining character string, the character strings that represent the positions at which the substituents are bonded to the mother nucleus of each compound.

Here, in substitutive nomenclature, a substituent of a compound is written, for example, in the form "bonding position-substituent". The extraction unit 703 therefore first splits the part of the compound name Nr that remains after removing the character string representing the mother nucleus of the compound Mr into "number-character string" pairs. A portion enclosed in parentheses is treated as a single character string. The extraction unit 703 then extracts the character string of each pair, in order from the beginning, as the names of the first to m-th substituents, and extracts the number of each pair, in order from the beginning, as the bonding positions of the first to m-th substituents.

When the character string representing the j-th substituent contains a multiplying prefix, the character string representing the position on the mother nucleus to which the j-th substituent is bonded may take a form such as "number,number-character string", in which the numbers before the hyphen (-) are separated by a comma (,). Here, a multiplying prefix is a prefix placed before the name of a substituent to indicate the number of such substituents.

For example, "di" is a multiplying prefix indicating that there are two substituents, and "tri" is a multiplying prefix indicating that there are three. In this case, the extraction unit 703 splits the part of the compound name Nr that remains after removing the character string representing the mother nucleus so that, for example, "number,number-character string" forms one pair. That is, the extraction unit 703 extracts "number,number-" as the character string representing the position on the mother nucleus to which the j-th substituent is bonded.
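The splitting into "number-character string" pairs, including comma-separated positions and parenthesized substituent names, can be sketched with a regular expression. This is an illustrative sketch only: it assumes English-language names made of ASCII letters, and the pattern and function name are not taken from the specification.

```python
import re

# One pair: one or more comma-separated numbers, a hyphen, then either a
# parenthesized group (kept as a single string) or a run of letters.
_PAIR = re.compile(r"(\d+(?:,\d+)*)-(\([^()]*\)|[A-Za-z]+)")


def split_substituents(remainder):
    """Split the part of a compound name that precedes the mother nucleus
    into (bonding position, substituent name) pairs, in order of appearance."""
    return _PAIR.findall(remainder)
```

For example, `split_substituents("3-ethyl-2-methyl")` yields the pairs `("3", "ethyl")` and `("2", "methyl")`, and `"2,3-dimethyl"` yields the single pair `("2,3", "dimethyl")`, which still carries the multiplying prefix.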

When the character string representing a substituent contains a multiplying prefix, the extraction unit 703 may expand the character string representing the position on the mother nucleus to which the substituent is bonded and the character string representing the substituent. Here, expansion means decomposing a set of substituents that were combined using a multiplying prefix into the individual substituents.

Specifically, for example, the extraction unit 703 converts the "," in each "number," contained in the character string representing the bonding position on the mother nucleus into "-". The extraction unit 703 then splits the converted character string at each "number-" and prepends "-" to the second and subsequent "number-" segments.

As a result, the first "number-" is the position on the mother nucleus to which the first substituent is bonded, and each of the second and subsequent "-number-" segments is the position on the mother nucleus to which the corresponding second or subsequent substituent is bonded. The extraction unit 703 also deletes the multiplying prefix from the character string representing the substituent and inserts the resulting character string between each pair of consecutive hyphens ("--"). That is, the character string representing each substituent is the pre-expansion character string with the multiplying prefix removed. Expansion examples will be described later with reference to FIGS. 9 and 10.
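The effect of the expansion can be sketched as follows. Rather than reproducing the hyphen manipulation step by step, this sketch produces the equivalent list of (position, substituent) pairs and then joins them. The prefix table covers only the two prefixes named in the text, the function names are hypothetical, and the exact layout shown in FIGS. 9 and 10 is not reproduced here.

```python
# Multiplying prefixes named in the description and the counts they indicate.
MULTIPLYING_PREFIXES = {"di": 2, "tri": 3}


def expand_substituent(positions, name):
    """Expand e.g. ("2,3", "dimethyl") into [("2", "methyl"), ("3", "methyl")].

    The prefix is stripped only when the number of positions matches the
    count it indicates; otherwise the pair is returned unchanged.
    """
    locants = positions.split(",")
    for prefix, count in MULTIPLYING_PREFIXES.items():
        if name.startswith(prefix) and len(locants) == count:
            base = name[len(prefix):]  # substituent name without the prefix
            return [(locant, base) for locant in locants]
    return [(positions, name)]


def to_expanded_string(pairs):
    """Join expanded pairs back into a "number-name-number-name" string."""
    return "-".join(f"{locant}-{name}" for locant, name in pairs)
```

For example, `to_expanded_string(expand_substituent("2,3", "dimethyl"))` gives `"2-methyl-3-methyl"`.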

The extracted character strings representing the substituents of the compound Mr are stored, for example, in the substituent comparison table 1700 shown in FIG. 17, described later. The extracted character strings representing the bonding position of each substituent on the mother nucleus of the compound Mr are stored, for example, in the mother nucleus comparison table 1100 and the substituent comparison table 1700.

The specification unit 704 has a function of referring to the structural formula DB 230 and specifying the structure type of the mother nucleus corresponding to the detected character string representing the mother nucleus of each compound. Specifically, for example, the specification unit 704 identifies, among the structural formula information 510-1 to 510-K in the structural formula DB 230, the structural formula information 510-k whose compound name field holds the character string representing the mother nucleus of the compound Mr. The specification unit 704 then reads the compound type set in the compound type field of the structural formula information 510-k. In this way, the structure type of the compound representing the mother nucleus of the compound Mr can be specified. The specified structure type of the mother nucleus of the compound Mr is stored, for example, in the mother nucleus comparison table 1100.

The specification unit 704 also has a function of referring to the structural formula DB 230 and specifying the number of atoms of a specific element contained in the structural formula of the mother nucleus corresponding to the detected character string representing the mother nucleus of each compound. Here, the specific element is, for example, carbon, nitrogen, oxygen, sulfur, phosphorus, or a halogen. The element symbol of the specific element is stored in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307.

In the following description, carbon is used as an example of the specific element. Specifically, for example, the specification unit 704 identifies, among the structural formula information 510-1 to 510-K in the structural formula DB 230, the structural formula information 510-k whose compound name field holds the character string representing the mother nucleus of the compound Mr. Next, the specification unit 704 reads the structural formula set in the structural formula field of the structural formula information 510-k. The specification unit 704 then counts the carbon atoms contained in that structural formula. In this way, the number of carbon atoms in the structural formula of the compound representing the mother nucleus of the compound Mr can be specified. The specified carbon number of the mother nucleus of the compound Mr is stored, for example, in the mother nucleus comparison table 1100.
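Counting the atoms of a specific element in a structural formula can be sketched as follows, assuming the formula is stored as a flat text string such as "CH4" or "CH3-CH2-CH3". That storage format, the function name, and the handling of numeric subscripts are assumptions for illustration; the sketch does not handle parenthesized groups with multipliers such as "(CH3)3CH".

```python
import re


def count_element(formula, symbol="C"):
    """Count atoms of `symbol` in a flat structural formula string.

    The negative lookahead keeps "C" from matching the first letter of a
    two-letter symbol such as "Cl"; a trailing number is read as a subscript.
    """
    pattern = re.escape(symbol) + r"(?![a-z])(\d*)"
    total = 0
    for match in re.finditer(pattern, formula):
        total += int(match.group(1) or 1)  # no subscript means one atom
    return total
```

For example, `count_element("CH3-CH2-CH3")` gives 3, and `count_element("CH3-CHCl-CH3")` also gives 3 because the "C" in "Cl" is not counted as carbon.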

The specification unit 704 may also refer to the structural formula DB 230 and specify the number of carbon atoms contained in the structural formula of the substituent corresponding to each extracted character string representing a substituent of each compound. Specifically, for example, the specification unit 704 identifies, among the structural formula information 520-1 to 520-P in the structural formula DB 230, the structural formula information 520-p whose compound name field holds the character string representing the j-th substituent of the compound Mr. Next, the specification unit 704 reads the structural formula set in the structural formula field of the structural formula information 520-p. The specification unit 704 then counts the carbon atoms contained in that structural formula. In this way, the number of carbon atoms in the structural formula of the compound representing the j-th substituent of the compound Mr can be specified. The specified carbon number of the j-th substituent of the compound Mr is stored, for example, in the substituent comparison table 1700.

The specification unit 704 may specify the number of substituents of each compound in the compound group M1 to MR based on the extraction results. For example, when the first to m-th substituents are extracted as the substituents bonded to the mother nucleus of the compound Mr, the specification unit 704 specifies "m" as the number of substituents of the compound Mr. The specified number of substituents of the compound Mr is stored, for example, in the mother nucleus comparison table 1100.

The classification unit 705 has a function of classifying the compound group M1 to MR. Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into sets of compounds that share a common feature.

The classification unit 705 has a function of classifying the compound group M1 to MR based on the detected character string representing the mother nucleus of each compound. Specifically, for example, the classification unit 705 classifies the compound group M1 to MR into sets of compounds whose mother nucleus character strings are identical. In this way, compounds that share the same mother nucleus, that is, the partial structure on which the compound is based, can be grouped together.
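Grouping by identical mother nucleus strings amounts to a simple keyed grouping, which can be sketched as follows. The input is assumed to be a mapping from each compound name to its detected mother nucleus string; the function name and the sample compound names are illustrative only.

```python
from collections import defaultdict


def classify_by_mother_nucleus(nucleus_by_name):
    """Group compound names by their detected mother nucleus string."""
    groups = defaultdict(list)
    for compound_name, nucleus in nucleus_by_name.items():
        groups[nucleus].append(compound_name)
    return dict(groups)


detected = {
    "2-methylpropane": "propane",
    "2-chloropropane": "propane",
    "2-methylbutane": "butane",
}
groups = classify_by_mother_nucleus(detected)
```

Here the two compounds whose names end in "propane" fall into one set and the remaining compound forms its own set.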

The following describes (Classification Example 1) to (Classification Example 6), which show specific processing by which the classification unit 705 further subdivides a group of compounds whose character strings representing the mother nucleus are identical.

(Classification Example 1)
The classification unit 705 may further classify the compound group M1 to MR based on the extracted character string representing the substituent of each compound. Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus and the same character string representing the substituent. In this way, compounds that have the same mother nucleus and the same substituent, that is, the partial structure used in the systematics and naming of the compound, can be grouped together. The substituent whose character string is compared is, for example, the j-th substituent of each compound.

(Classification Example 2)
The classification unit 705 may further classify the compound group M1 to MR based on the specified number of substituents of each compound. Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus and the same number of substituents. In this way, compounds that have the same mother nucleus and the same number of substituents can be grouped together.

The classification unit 705 may also classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus and whose numbers of substituents differ by no more than a predetermined number α. In this way, compounds that have the same mother nucleus and whose numbers of substituents differ by at most α can be grouped together. The predetermined number α is set to, for example, α = 1 or α = 2. The predetermined number α may, for example, be set in advance and stored in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307.

(Classification Example 3)
The classification unit 705 may further classify the compound group M1 to MR based on the character string representing the bonding position of the substituent of each compound. Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus and the same character strings representing the bonding positions of the substituents bonded to the mother nucleus. In this way, compounds that have the same mother nucleus and the same bonding position for each substituent can be grouped together.

(Classification Example 4)
The classification unit 705 may further classify the compound group M1 to MR based on the specified type of mother nucleus structure of each compound. Here, the type of mother nucleus structure indicates the type of molecular structure of the compound that the mother nucleus represents. Types of mother nucleus structure include, for example, straight-chain hydrocarbons, aromatic hydrocarbons, and alicyclic hydrocarbons.

Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into a first set of compounds whose character strings representing the mother nucleus are identical, and a second set of compounds whose character strings representing the mother nucleus differ but whose types of mother nucleus structure are identical. In this way, compounds that have the same mother nucleus can be grouped together, and compounds that have different mother nuclei but the same type of mother nucleus structure can also be grouped together. Note that compounds having the same mother nucleus also have the same type of mother nucleus structure.

(Classification Example 5)
The classification unit 705 may further classify the compound group M1 to MR based on the specified number of carbon atoms contained in the structural formula of the mother nucleus of each compound. Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into a first set of compounds whose character strings representing the mother nucleus are identical, and a second set of compounds whose character strings representing the mother nucleus differ but whose mother nucleus structural formulas contain the same number of carbon atoms. In this way, compounds that have the same mother nucleus can be grouped together, and compounds that have different mother nuclei but the same number of carbon atoms in the mother nucleus can also be grouped together. Note that compounds having the same mother nucleus also have the same number of carbon atoms in the mother nucleus.

The classification unit 705 may also classify the compound group M1 to MR into a first set of compounds whose character strings representing the mother nucleus are identical, and a second set of compounds whose character strings representing the mother nucleus differ but whose numbers of carbon atoms in the mother nucleus structural formula differ by no more than a predetermined number β. In this way, compounds that have the same mother nucleus can be grouped together, and compounds that have different mother nuclei but whose mother nucleus carbon counts differ by at most β can also be grouped together. The predetermined number β is set to, for example, β = 3 or β = 5. The predetermined number β may, for example, be set in advance and stored in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307.

(Classification Example 6)
The classification unit 705 may further classify the compound group M1 to MR based on the specified number of carbon atoms contained in the structural formula of the substituent of each compound. Specifically, for example, the classification unit 705 may classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus and the same number of carbon atoms in the structural formula of the substituent. In this way, compounds that have the same mother nucleus and the same number of carbon atoms in the substituent can be grouped together. The substituent whose carbon count is compared is, for example, the j-th substituent of each compound.

The classification unit 705 may also classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus and whose numbers of carbon atoms in the structural formula of the substituent differ by no more than a predetermined number γ. In this way, compounds that have the same mother nucleus and whose substituent carbon counts differ by at most γ can be grouped together. The predetermined number γ is set to, for example, γ = 3 or γ = 5. The predetermined number γ may, for example, be set in advance and stored in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307.

The classification unit 705 may also classify the compound group M1 to MR by combining two or more of (Classification Example 1) to (Classification Example 6) described above. For example, by combining (Classification Example 1) and (Classification Example 2), the classification unit 705 may classify the compound group M1 to MR into sets of compounds that have the same character string representing the mother nucleus, the same character string representing the substituent, and the same number of substituents.

In this way, compounds that have the same mother nucleus, the same substituent, and the same number of substituents can be grouped together. Likewise, compounds that have the same mother nucleus and the same substituent (at least one substituent in common) but different numbers of substituents can be grouped together, as can compounds that have the same mother nucleus and the same number of substituents but different substituents.
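The combination of classification examples can be illustrated, without limitation, by grouping on a composite key. In the sketch below, the record fields `nucleus`, `first_substituent`, and `n_substituents`, the function name `classify`, and the sample records are all hypothetical; the compared substituent is assumed to be the first substituent of each compound.

```python
from collections import defaultdict

def classify(records, keys):
    """Group compound names by the tuple of values found under `keys`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["name"])
    return dict(groups)

# Hypothetical feature records, one per compound.
records = [
    {"name": "2-methylpentane", "nucleus": "pentane",
     "first_substituent": "methyl", "n_substituents": 1},
    {"name": "3-methylpentane", "nucleus": "pentane",
     "first_substituent": "methyl", "n_substituents": 1},
    {"name": "2,3-dimethylpentane", "nucleus": "pentane",
     "first_substituent": "methyl", "n_substituents": 2},
    {"name": "3-ethylpentane", "nucleus": "pentane",
     "first_substituent": "ethyl", "n_substituents": 1},
]

# Combination of Classification Examples 1 and 2.
combined = classify(records, ("nucleus", "first_substituent", "n_substituents"))
# Classification Example 2 alone.
by_count = classify(records, ("nucleus", "n_substituents"))
```

Passing a different tuple of keys selects a different combination of criteria without changing the grouping logic.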

The above description covers the case where a group of compounds having the same character string representing the mother nucleus is further subdivided, but the classification is not limited to this. For example, the classification unit 705 may classify the compound group M1 to MR into sets of compounds that share at least one of the following: the character string representing the mother nucleus, the type of mother nucleus structure, the number of carbon atoms in the mother nucleus, the character string representing the substituent, the number of substituents, the bonding position of the substituent, and the number of carbon atoms in the substituent.

Next, a case is described in which the compound group M1 to MR is classified into a set that collects a specific compound in the compound group M1 to MR together with the other compounds that share a feature with the specific compound. Here, the specific compound is, for example, the compound in the compound group M1 to MR that serves as the basic structure. The compound serving as the basic structure is specified, for example, from the result received by the reception unit 701.

The comparison unit 706 has a function of comparing the character string representing the mother nucleus of the specific compound in the compound group M1 to MR with the character string representing the mother nucleus of each other compound in the compound group M1 to MR. In this case, the classification unit 705 may classify the compound group M1 to MR based on the comparison result.

Specifically, for example, the classification unit 705 classifies the compound group M1 to MR into a set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a set of compounds whose character string representing the mother nucleus differs from that of the specific compound. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure can be grouped together.

The following describes (Classification Example 7) to (Classification Example 12), which show specific processing by which the classification unit 705 further subdivides the group of compounds whose character string representing the mother nucleus is the same as that of the specific compound.

(Classification Example 7)
The comparison unit 706 may further compare the character string representing the substituent of the specific compound with the character string representing the substituent of each other compound. In this case, the classification unit 705 may, for example, classify the compound group M1 to MR into a first set of compounds that have the same character string representing the mother nucleus and the same character string representing the substituent as the specific compound, and a second set of compounds other than those in the first set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus and the same substituent as the compound serving as the basic structure can be grouped together.

The substituents whose character strings are compared are, for example, the j-th substituent of the specific compound and the j-th substituent of the other compound. When the j-th substituent of the specific compound is a composite substituent, the comparison unit 706 may instead compare the character string representing the j-th substituent of the specific compound with the character string representing whichever of the first to m-th substituents of the other compound is a composite substituent. If two or more of the first to m-th substituents of the other compound are composite substituents, the comparison target may be the one whose character string is most similar to the character string representing the j-th substituent of the specific compound.
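As a rough illustration, the selection of the most similar composite substituent might look like the sketch below. How string similarity is measured is not stated above; the use of `difflib.SequenceMatcher` from the Python standard library is an assumption made here, as are the function name and the simplified rule that a composite substituent begins with an opening parenthesis or bracket.

```python
from difflib import SequenceMatcher

def pick_comparison_target(reference_substituent, other_substituents):
    """Return the composite substituent of the other compound that is most
    similar to the reference substituent, or None if there is none."""
    # Simplified composite test (assumption): the string starts with a bracket.
    composites = [s for s in other_substituents if s.startswith(("(", "["))]
    if not composites:
        return None
    return max(
        composites,
        key=lambda s: SequenceMatcher(None, reference_substituent, s).ratio(),
    )

target = pick_comparison_target(
    "(1-methylethyl)",
    ["chloro", "(2-hydroxyphenyl)", "(1-methylpropyl)"],
)
```

Any other string similarity measure could be substituted for the `ratio()` call without affecting the surrounding selection logic.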

The classification unit 705 may also, for example, classify the compounds of the second set into a third set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a fourth set of compounds other than those in the third set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure but a different substituent can be grouped together.

(Classification Example 8)
The comparison unit 706 may further compare the number of substituents of the specific compound with the number of substituents of each other compound. In this case, the classification unit 705 may, for example, classify the compound group M1 to MR into a first set of compounds that have the same character string representing the mother nucleus and the same number of substituents as the specific compound, and a second set of compounds other than those in the first set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus and the same number of substituents as the compound serving as the basic structure can be grouped together.

The classification unit 705 may also, for example, classify the compounds of the second set into a third set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a fourth set of compounds other than those in the third set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure but a different number of substituents can be grouped together.

(Classification Example 9)
The comparison unit 706 may further compare the character string representing the bonding position of the substituent of the specific compound with the character string representing the bonding position of the substituent of each other compound. In this case, the classification unit 705 may, for example, classify the compound group M1 to MR into a first set of compounds that have the same character string representing the mother nucleus and the same substituent bonding position as the specific compound, and a second set of compounds other than those in the first set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus and the same substituent bonding position as the compound serving as the basic structure can be grouped together.

The classification unit 705 may also classify the compounds of the second set into a third set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a fourth set of compounds other than those in the third set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure but a different substituent bonding position can be grouped together.

(Classification Example 10)
The comparison unit 706 may further compare the type of mother nucleus structure of the specific compound with the type of mother nucleus structure of each other compound. In this case, the classification unit 705 may, for example, classify the compound group M1 to MR into a first set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a second set of compounds whose character string representing the mother nucleus differs from that of the specific compound but whose type of mother nucleus structure is the same as that of the specific compound.

In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure can be grouped together, and the compounds that have a different mother nucleus but the same type of mother nucleus structure as the compound serving as the basic structure can also be grouped together.
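For illustration, the two-set classification relative to a specific compound might be sketched as follows. The record layout, the function name, and the choice to place the specific compound itself in the first set are assumptions made for this sketch.

```python
def classify_by_structure_type(records, reference):
    """Split compounds into (first_set, second_set) relative to `reference`.

    first_set:  same mother nucleus string as the reference compound.
    second_set: different mother nucleus string, same structure type.
    Compounds matching neither criterion are left out of both sets.
    """
    ref = records[reference]
    first_set, second_set = [], []
    for name, rec in records.items():
        if rec["nucleus"] == ref["nucleus"]:
            first_set.append(name)
        elif rec["structure_type"] == ref["structure_type"]:
            second_set.append(name)
    return first_set, second_set

# Hypothetical feature records keyed by compound name.
records = {
    "methylbenzene": {"nucleus": "benzene", "structure_type": "aromatic hydrocarbon"},
    "chlorobenzene": {"nucleus": "benzene", "structure_type": "aromatic hydrocarbon"},
    "1-methylnaphthalene": {"nucleus": "naphthalene", "structure_type": "aromatic hydrocarbon"},
    "2-methylpentane": {"nucleus": "pentane", "structure_type": "straight-chain hydrocarbon"},
}
first_set, second_set = classify_by_structure_type(records, "methylbenzene")
```

Replacing `structure_type` with a carbon count field would give the analogous classification by number of carbon atoms in the mother nucleus.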

(Classification Example 11)
The comparison unit 706 may further compare the number of carbon atoms contained in the structural formula of the mother nucleus of the specific compound with the number of carbon atoms contained in the structural formula of the mother nucleus of each other compound. In this case, the classification unit 705 may, for example, classify the compound group M1 to MR into a first set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a second set of compounds whose character string representing the mother nucleus differs from that of the specific compound but whose number of carbon atoms in the mother nucleus is the same as that of the specific compound.

In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure can be grouped together, and the compounds that have a different mother nucleus but the same number of carbon atoms in the mother nucleus as the compound serving as the basic structure can also be grouped together.

(Classification Example 12)
The comparison unit 706 may further compare the number of carbon atoms contained in the structural formula of the substituent of the specific compound with the number of carbon atoms contained in the structural formula of the substituent of each other compound. In this case, the classification unit 705 may, for example, classify the compound group M1 to MR into a first set of compounds that have the same character string representing the mother nucleus and the same number of carbon atoms in the substituent as the specific compound, and a second set of compounds other than those in the first set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus and the same number of carbon atoms in the substituent as the compound serving as the basic structure can be grouped together.

The classification unit 705 may also classify the compounds of the second set into a third set of compounds whose character string representing the mother nucleus is the same as that of the specific compound, and a fourth set of compounds other than those in the third set. In this way, the compounds in the compound group M1 to MR that have the same mother nucleus as the compound serving as the basic structure but a different number of carbon atoms in the substituent can be grouped together.

The calculation unit 707 may calculate, based on the comparison result, a similarity that indicates how similar each other compound is to the specific compound. Specifically, for example, when another compound has the same value as the specific compound for a given item, the calculation unit 707 may calculate the similarity of that compound by adding a predetermined value to its similarity.

Here, the items are, for example, the character string representing the mother nucleus, the character string representing the substituent, the number of substituents, the bonding position of the substituent, the type of mother nucleus structure, the number of carbon atoms in the mother nucleus, and the number of carbon atoms in the substituent. The initial value of the similarity of each other compound is, for example, "0". The predetermined value may be common to all items or may be set individually for each item.

More specifically, for example, the calculation unit 707 adds "3" to the similarity of another compound when the character string representing its mother nucleus is the same as that of the specific compound, and adds "1" to the similarity of another compound when its type of mother nucleus structure is the same as that of the specific compound. The predetermined values are set in advance and stored in a storage device such as the ROM 302, the RAM 303, the magnetic disk 305, or the optical disk 307.
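The additive scoring described above can be illustrated by the following sketch. The weights 3 and 1 are the example values given above; the record field names, the `WEIGHTS` table, and the function name are hypothetical.

```python
# Per-item predetermined values (the two example values from the text).
WEIGHTS = {"nucleus": 3, "structure_type": 1}

def similarity(reference, other, weights=WEIGHTS):
    """Start from 0 and add an item's weight whenever both compounds
    carry that item and the values are equal."""
    score = 0
    for item, weight in weights.items():
        if item in reference and item in other and reference[item] == other[item]:
            score += weight
    return score

reference = {"nucleus": "benzene", "structure_type": "aromatic hydrocarbon"}
same_nucleus = {"nucleus": "benzene", "structure_type": "aromatic hydrocarbon"}
same_type = {"nucleus": "naphthalene", "structure_type": "aromatic hydrocarbon"}
unrelated = {"nucleus": "pentane", "structure_type": "straight-chain hydrocarbon"}
```

Further items such as the number of substituents or the bonding position could be scored by adding entries to the weight table.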

The determination unit 708 has a function of determining, based on the extracted character string representing the j-th substituent of the compound Mr, whether the j-th substituent of the compound Mr is a composite substituent that contains another substituent. As described above, in the compound name of an organic compound, a character string representing a composite substituent is enclosed in, for example, parentheses or brackets. The determination unit 708 can therefore determine whether the j-th substituent is a composite substituent by, for example, determining whether the character string representing the j-th substituent of the compound Mr is enclosed in parentheses or brackets.
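A minimal sketch of this bracket test is given below for illustration. The choice of "(" ")" and "[" "]" as the enclosing characters and the function name are assumptions; the sketch additionally checks that the opening character is closed only at the very end, so that a string such as "(a)(b)" is not treated as a single enclosed substituent.

```python
# Assumed enclosing characters for composite substituents.
BRACKET_PAIRS = {"(": ")", "[": "]"}

def is_composite_substituent(text):
    """Return True if the whole string is enclosed in one matching pair
    of parentheses or brackets."""
    text = text.strip()
    if len(text) < 2 or BRACKET_PAIRS.get(text[0]) != text[-1]:
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch in BRACKET_PAIRS:
            depth += 1
        elif ch in BRACKET_PAIRS.values():
            depth -= 1
        # The outermost pair must not close before the final character.
        if depth == 0 and i < len(text) - 1:
            return False
    return depth == 0
```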

The setting unit 709 has a function of setting, when the j-th substituent of the compound Mr is determined to be a composite substituent, the character string representing the j-th substituent of the compound Mr as the compound name of a compound to be classified. In this case, the detection unit 702 may refer to the structural formula DB 230 and detect, from the compound name thus set, the character string representing the name of the partial structure that serves as the mother nucleus of that compound.

As a result, a composite substituent that contains another substituent is treated as a new compound to be classified, the series of processes by the extraction unit 703, the specifying unit 704, the classification unit 705, and so on is executed recursively, and the character string representing the composite substituent can be classified.

However, from the second level onward, that is, when a character string representing a composite substituent is the classification target, the detection unit 702 selects, for example, the compound name of a substituent Cp from the structural formula DB 230. The detection unit 702 then specifies the number of characters t in the compound name of the selected substituent Cp. Next, the detection unit 702 determines whether the character string consisting of the last t characters of the compound name of the new classification target matches the compound name of the substituent Cp. If it matches, the detection unit 702 detects the last t characters of the compound name of the new classification target as the character string representing the mother nucleus of that compound.
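The suffix comparison described above might be sketched as follows, purely for illustration. The function name and the candidate list are hypothetical, and trying longer candidate names first is an assumption of this sketch (so that "methyl" is preferred over "ethyl" when both match); the order in which candidates are selected is not specified above.

```python
def detect_parent_by_suffix(name, candidates):
    """Return the candidate that equals the last t characters of `name`,
    where t is the length of the candidate, or None if none matches."""
    # Longest candidates first (assumption, see lead-in).
    for candidate in sorted(candidates, key=len, reverse=True):
        t = len(candidate)
        if t > 0 and name[-t:] == candidate:
            return candidate
    return None

# Hypothetical substituent names taken from the structural formula DB.
substituent_names = ["ethyl", "methyl", "propyl"]
```

For a composite substituent such as "(1-methylethyl)", the enclosing brackets would be removed before calling the function, giving "1-methylethyl" as the name to examine.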

The classification unit 705 may also create a mother nucleus comparison table for the compound group M1 to MR. The mother nucleus comparison table is table data for comparing the features of the mother nucleus of each compound Mr. Specifically, for example, the classification unit 705 may create, for each classified set, a mother nucleus comparison table that shows, for each compound Mr in the set, the compound name of the mother nucleus, the number of substituents, the bonding positions of the substituents, the type of mother nucleus structure, the number of carbon atoms in the mother nucleus, and so on.

In doing so, the classification unit 705 may create a mother nucleus comparison table in which the other compounds in each set are sorted in descending order of similarity to the specific compound, based on the calculated similarity of each other compound to the specific compound. Specific examples of the mother nucleus comparison table are described later with reference to FIGS. 11 to 16.
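The sorting step can be illustrated by the short sketch below. The set labels, the similarity values, and the function name are hypothetical, and the similarity of each other compound is assumed to have been calculated beforehand.

```python
def sort_sets_by_similarity(sets, similarity):
    """Within each classified set, order compounds from the highest to the
    lowest similarity to the specific compound."""
    return {
        label: sorted(members, key=lambda name: similarity[name], reverse=True)
        for label, members in sets.items()
    }

# Hypothetical classified sets and precomputed similarity values.
sets = {
    "first set": ["chlorobenzene", "1,2-dimethylbenzene"],
    "second set": ["2-methylpentane", "1-methylnaphthalene"],
}
similarity = {
    "chlorobenzene": 4,
    "1,2-dimethylbenzene": 5,
    "2-methylpentane": 0,
    "1-methylnaphthalene": 1,
}
sorted_sets = sort_sets_by_similarity(sets, similarity)
```

The same ordering could be applied when building the rows of any per-set comparison table.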

The classification unit 705 may also create a substituent comparison table for the compound group M1 to MR. The substituent comparison table is table data for comparing the features of the substituents of each compound Mr. Specifically, for example, the classification unit 705 may create, for each classified set, a substituent comparison table that shows, for each compound Mr in the set, the compound name, bonding position, number of carbon atoms, and so on of the j-th substituent.

In doing so, the classification unit 705 may create a substituent comparison table in which the other compounds in each set are sorted in descending order of similarity to the specific compound, based on the calculated similarity of each other compound to the specific compound. Specific examples of the substituent comparison table are described later with reference to FIGS. 17 to 21.

作成部７１０は、化合物群Ｍ１〜ＭＲの比較リストを作成する機能を有する。比較リストとは、各化合物Ｍｒの特徴を比較するための表データである。具体的には、例えば、作成部７１０が、母核比較テーブルおよび置換基比較テーブルを参照して、化合物群Ｍ１〜ＭＲの比較リストを作成することにしてもよい。 The creation unit 710 has a function of creating a comparison list of the compound groups M1 to MR. The comparison list is tabular data for comparing the characteristics of each compound Mr. Specifically, for example, the creation unit 710 may create a comparison list of the compound groups M1 to MR with reference to the mother nucleus comparison table and the substituent comparison table.

この際、作成部７１０が、算出された特定の化合物との類似度合いを表す他の化合物の類似度に基づいて、分類された各集合に含まれる他の化合物を特定の化合物との類似度が高い順にソートした比較リストを作成することにしてもよい。なお、比較リストの具体例については、図２２および図２３を用いて後述する。 At this time, based on the calculated similarity of each other compound, which represents its degree of similarity to the specific compound, the creation unit 710 may create a comparison list in which the other compounds included in each classified set are sorted in descending order of similarity to the specific compound. A specific example of the comparison list will be described later with reference to FIGS. 22 and 23.
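The descending sort by similarity that is described for the comparison tables and the comparison list can be sketched as follows. This is only an illustrative sketch: the function name `sort_by_similarity`, the dictionary layout, and the similarity values are assumptions made for the example, and the similarity scores are taken as already calculated.

```python
from typing import Dict, List


def sort_by_similarity(
    sets: Dict[str, List[str]], similarity: Dict[str, float]
) -> Dict[str, List[str]]:
    """Sort the compound IDs of every set in descending order of similarity.

    `sets` maps a (hypothetical) set label to the IDs of the other compounds in
    that set; `similarity` maps a compound ID to its similarity to the specific
    compound. Both layouts are assumptions for illustration.
    """
    return {
        label: sorted(ids, key=lambda cid: similarity[cid], reverse=True)
        for label, ids in sets.items()
    }


# Hypothetical scores, not taken from the figures.
example_sets = {"set-1": ["M2", "M3", "M4"]}
example_similarity = {"M2": 0.5, "M3": 0.9, "M4": 0.7}
sorted_sets = sort_by_similarity(example_sets, example_similarity)
```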

出力部７１１は、分類された分類結果を出力する機能を有する。具体的には、例えば、出力部７１１が、作成された母核比較テーブルの記憶内容や置換基比較テーブルの記憶内容を出力することにしてもよい。また、出力部７１１は、作成された比較リストを出力することにしてもよい。 The output unit 711 has a function of outputting the classified classification results. Specifically, for example, the output unit 711 may output the storage contents of the created mother nucleus comparison table and the storage contents of the substituent comparison table. The output unit 711 may output the created comparison list.

出力部７１１の出力形式としては、例えば、ディスプレイ３０９への表示、プリンタ（不図示）への印刷出力、Ｉ／Ｆ３０８による外部のコンピュータへの送信がある。外部のコンピュータは、例えば、化合物群Ｍ１〜ＭＲの各々の化合物の化合物名を表すテキストデータの送信元のクライアント装置２０１である。また、ＲＡＭ３０３、磁気ディスク３０５、光ディスク３０７などの記憶領域に記憶することとしてもよい。 Examples of the output format of the output unit 711 include display on the display 309, print output to a printer (not shown), and transmission to an external computer via the I / F 308. The external computer is, for example, the client apparatus 201 that is the transmission source of text data representing the compound name of each compound in the compound groups M1 to MR. Alternatively, the data may be stored in a storage area such as the RAM 303, the magnetic disk 305, and the optical disk 307.

（分割テーブル８００の記憶内容の変遷例） (Transition example of the storage contents of the division table 800)
つぎに、図８〜図１０を用いて、分割テーブル８００の記憶内容の変遷例について説明する。分割テーブル８００には、上記検出部７０２の検出結果および抽出部７０３の抽出結果が反映される。この結果、分割テーブル８００によれば、第ｉ階層の母核を表す文字列、第ｊ置換基を表す文字列および結合位置を判別することができる。 Next, transition examples of the storage contents of the division table 800 will be described with reference to FIGS. 8 to 10. The division table 800 reflects the detection result of the detection unit 702 and the extraction result of the extraction unit 703. As a result, the division table 800 makes it possible to determine the character string representing the mother nucleus of the i-th layer, the character string representing the j-th substituent, and the bonding position.

図８〜図１０は、分割テーブル８００の記憶内容の変遷例を示す説明図である。図８において、分割テーブル８００は、化合物ＩＤおよび化合物名のフィールドを有する。各フィールドに情報を設定することで、各化合物Ｍｒの化合物名情報がレコードとして記憶される。ここで、化合物ＩＤは、化合物Ｍｒの識別子である。化合物名は、化合物Ｍｒの名称である。 FIGS. 8 to 10 are explanatory diagrams illustrating transition examples of the storage contents of the division table 800. In FIG. 8, the division table 800 has fields for compound ID and compound name. By setting information in each field, the compound name information of each compound Mr is stored as a record. Here, the compound ID is an identifier of the compound Mr. The compound name is the name of the compound Mr.

図８の（８−１）において、化合物Ｍ１〜Ｍ１０の化合物名Ｎ１〜Ｎ１０が各フィールドに設定された結果、化合物名情報８００−１〜８００−１０がレコードとして記憶されている。化合物Ｍ１〜Ｍ１０の化合物名Ｎ１〜Ｎ１０は、受付部７０１により、分類対象となる化合物の化合物名として受け付けられたものである。 In (8-1) of FIG. 8, as a result of setting compound names N1 to N10 of compounds M1 to M10 in each field, compound name information 800-1 to 800-10 is stored as a record. The compound names N1 to N10 of the compounds M1 to M10 are received by the receiving unit 701 as the compound names of the compounds to be classified.

図８の（８−２）において、検出部７０２により、各化合物名Ｎ１〜Ｎ１０の中から各化合物Ｍ１〜Ｍ１０の第１階層の母核を表す文字列が検出された結果、各化合物名Ｎ１〜Ｎ１０に第１階層の区切り記号が挿入されている。ここで、第ｉ階層の区切り記号とは、第ｉ階層の母核を表す文字列の直前に挿入される記号であり、例えば「／ｉ／」である。第ｉ階層の区切り記号によれば、化合物名Ｎｒの中から第ｉ階層の母核を表す文字列を識別することができる。 In (8-2) of FIG. 8, the detection unit 702 has detected, in each of the compound names N1 to N10, the character string representing the first-layer mother nucleus of each of the compounds M1 to M10, and as a result a first-layer delimiter has been inserted into each of the compound names N1 to N10. Here, the delimiter of the i-th layer is a symbol inserted immediately before the character string representing the mother nucleus of the i-th layer, for example “/i/”. The delimiter of the i-th layer makes it possible to identify the character string representing the mother nucleus of the i-th layer within the compound name Nr.

例えば、化合物Ｍ１の化合物名Ｎ１「２−（３−メチル−４−ヒドロキシフェニル）プロパン」の中から、化合物Ｍ１の第１階層の母核を表す文字列「プロパン」が検出された結果、「プロパン」の直前に第１階層の区切り記号「／１／」が挿入されている。 For example, the character string “propane” representing the first-layer mother nucleus of the compound M1 has been detected in the compound name N1 “2-(3-methyl-4-hydroxyphenyl)propane” of the compound M1, and as a result the first-layer delimiter “/1/” has been inserted immediately before “propane”.

なお、化合物Ｍｒの第ｉ階層の母核を表す文字列が非検出であった場合、例えば、化合物Ｍｒの化合物名Ｎｒと関連付けて、第ｉ階層の母核を表す文字列が非検出であったことを示す不明フラグがＲＡＭ３０３、磁気ディスク３０５、光ディスク３０７などの記憶装置に記憶される。 When the character string representing the mother nucleus of the i-th layer of the compound Mr is not detected, for example, the character string representing the mother nucleus of the i-th layer is not detected in association with the compound name Nr of the compound Mr. An unknown flag indicating this is stored in a storage device such as the RAM 303, the magnetic disk 305, and the optical disk 307.
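The first-layer detection step can be illustrated with a short sketch. It assumes, following the abstract, that the mother nucleus name sits at the end of the compound name and is looked up in a stored set of names. The function name `mark_nucleus`, the romanized names, and the choice of the longest matching suffix are assumptions for illustration; a return value of `None` stands in for the unknown flag. Second-layer detection (for example “phenyl” inside the parentheses) is not covered.

```python
from typing import Optional, Set, Tuple


def mark_nucleus(
    name: str, nucleus_names: Set[str], layer: int = 1
) -> Tuple[str, Optional[str]]:
    """Insert '/<layer>/' immediately before a known nucleus name ending `name`.

    Returns (marked_name, nucleus). When no suffix matches, the name is returned
    unchanged together with None, which plays the role of the unknown flag.
    """
    best: Optional[str] = None
    # Check the last t characters for t = 1, 2, 3, ... and keep the longest hit
    # (preferring the longest match is an assumption of this sketch).
    for t in range(1, len(name) + 1):
        suffix = name[-t:]
        if suffix in nucleus_names:
            best = suffix
    if best is None:
        return name, None
    cut = len(name) - len(best)
    return f"{name[:cut]}/{layer}/{best}", best


known = {"propane", "ethane", "phenyl"}  # hypothetical stored names
marked, nucleus = mark_nucleus("2-(3-methyl-4-hydroxyphenyl)propane", known)
```

With the hypothetical name set above, `marked` is “2-(3-methyl-4-hydroxyphenyl)/1/propane”, which mirrors the example given for the compound name information 800-1.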

図９の（８−３）において、検出部７０２により、各化合物名Ｎ１〜Ｎ１０の中から各化合物Ｍ１〜Ｍ１０の第２階層の母核を表す文字列が検出された結果、各化合物名Ｎ１〜Ｎ１０に第２階層の区切り記号が挿入されている。 In (8-3) of FIG. 9, the detection unit 702 has detected, in each of the compound names N1 to N10, the character string representing the second-layer mother nucleus of each of the compounds M1 to M10, and as a result a second-layer delimiter has been inserted into each of the compound names N1 to N10.

例えば、化合物Ｍ１の化合物名Ｎ１「２−（３−メチル−４−ヒドロキシフェニル）プロパン」の中から化合物Ｍ１の第２階層の母核を表す文字列「フェニル」が検出された結果、化合物名情報８００−１の化合物名の「フェニル」の直前に第２階層の区切り記号「／２／」が挿入されている。 For example, the character string “phenyl” representing the second-layer mother nucleus of the compound M1 has been detected in the compound name N1 “2-(3-methyl-4-hydroxyphenyl)propane” of the compound M1, and as a result the second-layer delimiter “/2/” has been inserted immediately before “phenyl” in the compound name of the compound name information 800-1.

図９の（８−４）において、抽出部７０３により、化合物名Ｍ４，Ｍ７のうち倍数接頭辞を含む置換基を表す文字列の「数字，数字−」の「，」が「−」に変換され、「数字，数字−」の２番目の数字の直前に「−」が挿入されている。例えば、化合物名情報８００−４の倍数接頭辞を含む化合物名の「２，３−」の「，」が「−」に変換され、「２，３−」の２番目の数字「３」の先頭に「−」が挿入されている。 In (8-4) of FIG. 9, for the compound names of the compounds M4 and M7, the extraction unit 703 has converted the “,” in the “number,number-” part of the character string representing a substituent with a multiple prefix into “-”, and has inserted a “-” immediately before the second number of “number,number-”. For example, in the compound name of the compound name information 800-4, which includes a multiple prefix, the “,” in “2,3-” has been converted to “-”, and a “-” has been inserted immediately before the second number “3” of “2,3-”.

図１０の（８−５）において、抽出部７０３により、化合物名Ｍ４，Ｍ７の倍数接頭辞を含む置換基を表す文字列から倍数接頭辞が削除され、倍数接頭辞が削除された削除後の文字列が「−−」の間に挿入されている。例えば、化合物名情報８００−４の化合物名Ｍ４の倍数接頭辞を含む置換基を表す文字列「ジメチル」から倍数接頭辞「ジ」が削除され、倍数接頭辞が削除された削除後の文字列「メチル」が「−−」の間に挿入されている。これにより、倍数接頭辞を含む置換基の結合位置を分割することができる。 In (8-5) of FIG. 10, the extraction unit 703 has deleted the multiple prefix from the character string representing the substituent with a multiple prefix in the compound names of the compounds M4 and M7, and has inserted the resulting character string between the “--”. For example, the multiple prefix “di” has been deleted from the character string “dimethyl”, which represents the substituent with a multiple prefix in the compound name of the compound name information 800-4, and the resulting character string “methyl” has been inserted between the “--”. In this way, the bonding positions of a substituent with a multiple prefix can be separated.
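Steps (8-4) and (8-5) together turn a fragment such as “2,3-dimethyl” into “2-methyl-3-methyl”. The sketch below is an assumption-laden illustration of that rewriting, not the procedure of the extraction unit 703 itself: it uses romanized names, goes directly to the final string instead of producing the intermediate “2--3-” form, and extends the two-number case to “tri” and “tetra”. The names `MULTIPLIERS` and `expand_multiplied` are hypothetical.

```python
import re

# Hypothetical prefix table; the description itself only names "di".
MULTIPLIERS = {"di": 2, "tri": 3, "tetra": 4}

# A fragment is expected to be exactly "<locants>-<prefix><stem>", e.g. "2,3-dimethyl".
_FRAGMENT = re.compile(r"(\d+(?:,\d+)+)-(di|tri|tetra)(\w+)")


def expand_multiplied(fragment: str) -> str:
    """Give every locant its own copy of the substituent name.

    Fragments that do not match, or whose locant count disagrees with the
    prefix, are returned unchanged.
    """
    match = _FRAGMENT.fullmatch(fragment)
    if match is None:
        return fragment
    locants = match.group(1).split(",")
    prefix, stem = match.group(2), match.group(3)
    if len(locants) != MULTIPLIERS[prefix]:
        return fragment
    return "-".join(f"{locant}-{stem}" for locant in locants)


expanded = expand_multiplied("2,3-dimethyl")
```

After this rewriting each bonding position is followed by its own substituent name, so the same position and name extraction can be applied as for substituents without a multiple prefix.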

図１０の（８−６）において、抽出部７０３により、各化合物名Ｎ１〜Ｎ１０の中から各化合物Ｍ１〜Ｍ１０の第１および第２階層の置換基を表す文字列が抽出された結果、各化合物名Ｎ１〜Ｎ１０に区切り記号が挿入されている。ここで、区切り記号とは、第ｉ階層の置換基を表す文字列の直後に挿入される記号であり、例えば「／／」である。区切り記号によれば、化合物名Ｎｒの中から第ｉ階層の置換基を表す文字列を識別することができる。 In (8-6) of FIG. 10, the extraction unit 703 extracts character strings representing the first and second hierarchy substituents of the compounds M1 to M10 from the compound names N1 to N10. Delimiters are inserted in the compound names N1 to N10. Here, the delimiter symbol is a symbol inserted immediately after a character string representing a substituent in the i-th layer, and is, for example, “//”. According to the delimiter, it is possible to identify a character string representing the i-th layer substituent from the compound name Nr.

例えば、化合物Ｍ１の化合物名Ｎ１「２−（３−メチル−４−ヒドロキシフェニル）プロパン」の中から、化合物Ｍ１の第２階層の第１置換基を表す文字列「メチル」が検出された結果、「メチル」の直後に区切り記号「／／」が挿入されている。また、化合物Ｍ１の第２階層の第２置換基を表す文字列「ヒドロキシ」が検出された結果、「ヒドロキシ」の直後に区切り記号「／／」が挿入されている。 For example, the character string “methyl” representing the first substituent of the second layer of the compound M1 has been detected in the compound name N1 “2-(3-methyl-4-hydroxyphenyl)propane” of the compound M1, and as a result the delimiter “//” has been inserted immediately after “methyl”. Likewise, the character string “hydroxy” representing the second substituent of the second layer of the compound M1 has been detected, and as a result the delimiter “//” has been inserted immediately after “hydroxy”.

なお、化合物Ｍｒの第ｉ階層の第ｊ置換基を表す文字列が非抽出であった場合、例えば、化合物Ｍｒの化合物名Ｎｒと関連付けて、第ｉ階層の第ｊ置換基を表す文字列が非抽出であったことを示す不明フラグがＲＡＭ３０３、磁気ディスク３０５、光ディスク３０７などの記憶装置に記憶される。 If the character string representing the j-th substituent in the i-th layer of the compound Mr is not extracted, for example, a character string representing the j-th substituent in the i-th layer is associated with the compound name Nr of the compound Mr. An unknown flag indicating non-extraction is stored in a storage device such as the RAM 303, the magnetic disk 305, and the optical disk 307.

分割テーブル８００によれば、各化合物Ｍ１〜Ｍ１０の第１および第２階層の母核を表す文字列、第１および第２階層の第ｊ置換基を表す文字列および結合位置を判別することができる。ただし、各階層の第１置換基の直前の「数字−」は、母核に結合する第１置換基の結合位置である。また、「−数字−」は、母核に結合する第２以降の置換基の結合位置である。 According to the division table 800, it is possible to determine the character strings representing the first and second layer mother nuclei, the character strings representing the first and second layer jth substituents, and the bonding positions of the compounds M1 to M10. it can. However, the “number-” immediately before the first substituent in each hierarchy is the bonding position of the first substituent bonded to the mother nucleus. Further, “-number-” is the bonding position of the second and subsequent substituents bonded to the mother nucleus.

例えば、化合物名情報８００−１によれば、化合物Ｍ１の第１階層の母核を表す文字列「プロパン」および第１階層の複合置換基「３−メチル−４−ヒドロキシフェニル」の結合位置「２」を判別することができる。また、化合物Ｍ１の第２階層の母核を表す文字列「フェニル」、第２階層の第１置換基を表す文字列「メチル」および結合位置「３」、第２階層の第２置換基を表す文字列「ヒドロキシ」および結合位置「４」を判別することができる。 For example, according to the compound name information 800-1, it is possible to determine the character string “propane” representing the first-layer mother nucleus of the compound M1 and the bonding position “2” of the first-layer composite substituent “3-methyl-4-hydroxyphenyl”. It is also possible to determine the character string “phenyl” representing the second-layer mother nucleus of the compound M1, the character string “methyl” representing the first substituent of the second layer and its bonding position “3”, and the character string “hydroxy” representing the second substituent of the second layer and its bonding position “4”.

（母核比較テーブルの記憶内容の変遷例） (Transition example of the storage contents of the mother nucleus comparison table)
つぎに、図１１〜図１６を用いて、図１０の（８−６）に示した分割テーブル８００の記憶内容に基づく母核比較テーブルの記憶内容の変遷例について説明する。以下の説明では、化合物Ｍ１〜Ｍ１０のうち化合物Ｍ１が基本構造となる化合物として指定された場合を例に挙げて説明する。 Next, transition examples of the storage contents of the mother nucleus comparison table, based on the storage contents of the division table 800 shown in (8-6) of FIG. 10, will be described with reference to FIGS. 11 to 16. The following description takes as an example the case where, among the compounds M1 to M10, the compound M1 is designated as the compound serving as the basic structure.

図１１〜図１６は、母核比較テーブルの記憶内容の変遷例を示す説明図である。図１１において、母核比較テーブル１１００は、化合物ＩＤ、階層名、母核の化合物名、同一フラグ、結合位置、同一フラグ、置換基数、同一フラグ、母核炭素数、同一フラグ、種類および同一フラグのフィールドを有する。各フィールドに情報を設定することで、化合物Ｍ１〜Ｍ１０ごとの母核比較情報がレコードとして記憶される。 FIGS. 11 to 16 are explanatory diagrams illustrating transition examples of the storage contents of the mother nucleus comparison table. In FIG. 11, the mother nucleus comparison table 1100 has fields for compound ID, hierarchy name, mother nucleus compound name, same flag, bonding position, same flag, number of substituents, same flag, mother nucleus carbon number, same flag, type, and same flag. By setting information in each field, the mother nucleus comparison information for each of the compounds M1 to M10 is stored as a record.

ここで、化合物ＩＤは、化合物Ｍｒの識別子である。階層名は、第ｉ階層の名称である。例えば、第１階層の名称は「第１」である。母核の化合物名は、化合物Ｍｒの第ｉ階層の母核を表す化合物の名称である。同一フラグは、母核の化合物名が、基本構造となる化合物と同一か否かを示すフラグである。同一フラグは、初期状態では「０」であり、基本構造となる化合物と同一の場合に「１」が設定される。 Here, the compound ID is an identifier of the compound Mr. The hierarchy name is the name of the i-th hierarchy. For example, the name of the first hierarchy is “first”. The compound name of the mother nucleus is the name of the compound that represents the mother nucleus of the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the compound name of the mother nucleus is the same as the compound having the basic structure. The same flag is “0” in the initial state, and “1” is set when it is the same as the compound serving as the basic structure.

結合位置は、化合物Ｍｒの第ｉ階層の母核に結合する第１〜第ｍ置換基の結合位置である。同一フラグは、母核に結合する第１〜第ｍ置換基の結合位置が、基本構造となる化合物と同一か否かを示すフラグである。置換基数は、化合物Ｍｒの第ｉ階層の母核に結合する置換基の数である。同一フラグは、母核に結合する置換基の数が、基本構造となる化合物と同一か否かを示すフラグである。 The bonding position is the bonding position of the first to m-th substituents bonded to the i-th layer mother nucleus of the compound Mr. The same flag is a flag indicating whether or not the bonding positions of the first to m-th substituents bonded to the mother nucleus are the same as the compound serving as the basic structure. The number of substituents is the number of substituents bonded to the mother nucleus in the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the number of substituents bonded to the mother nucleus is the same as that of the compound having the basic structure.

母核炭素数は、化合物Ｍｒの第ｉ階層の母核の構造式に含まれる炭素数である。同一フラグは、母核の構造式に含まれる炭素数が、基本構造となる化合物と同一か否かを示すフラグである。種類は、化合物Ｍｒの第ｉ階層の母核の構造の種類である。同一フラグは、母核の構造の種類が、基本構造となる化合物と同一か否かを示すフラグである。 The number of carbon atoms in the nucleus is the number of carbon atoms contained in the structural formula of the nucleus in the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the number of carbon atoms contained in the structural formula of the mother nucleus is the same as that of the compound having the basic structure. The type is the type of the structure of the mother nucleus in the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the type of structure of the mother nucleus is the same as the compound that is the basic structure.

図１１において、分類部７０５により、図１０の（８−６）に示した分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第１階層の母核の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第１階層の母核に結合する第１階層の各置換基の結合位置が設定されている。 In FIG. 11, the classifying unit 705 refers to the division table 800 shown in (8-6) of FIG. 10, and sets the compound names of the mother nuclei in the first hierarchy of the compounds M1 to M10. In addition, the classification unit 705 refers to the division table 800 to set the bonding position of each substituent in the first layer that binds to the parent nucleus of the first layer of the compounds M1 to M10.

図１２において、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第１階層の母核に結合する置換基の置換基数が設定されている。また、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第１階層の母核の構造式に含まれる炭素数が設定されている。また、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第１階層の母核の構造の種類が設定されている。 In FIG. 12, the classification unit 705 sets the number of substituents that are bonded to the parent nucleus of the first layer of the compounds M 1 to M 10 specified by the specifying unit 704. In addition, the classification unit 705 sets the number of carbons included in the structural formula of the mother nucleus of the first layer of the compounds M1 to M10 specified by the specifying unit 704. In addition, the classification unit 705 sets the type of the structure of the mother nucleus of the first layer of the compounds M1 to M10 specified by the specifying unit 704.

図１３において、分類部７０５により、比較部７０６によって比較された比較結果に基づいて、第１階層の母核の化合物名が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第１階層の母核に結合する各置換基の結合位置が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５の同一フラグに「１」が設定されている。 In FIG. 13, based on the comparison results of the comparison unit 706, the classification unit 705 has set “1” in the same flag of the compounds M1 to M5, whose first-layer mother nucleus compound name is the same as that of the compound M1 serving as the basic structure. Likewise, based on the comparison results, the classification unit 705 has set “1” in the same flag of the compounds M1 to M5, for which the bonding position of each substituent bonded to the first-layer mother nucleus is the same as in the compound M1 serving as the basic structure.

また、分類部７０５により、比較された比較結果に基づいて、第１階層の母核に結合する置換基数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５，Ｍ８〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第１階層の母核の構造式に含まれる炭素数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第１階層の母核の構造の種類が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ９の同一フラグに「１」が設定されている。 In addition, based on the comparison result compared by the classification unit 705, the same flag of the compounds M1 to M5 and M8 to M10 in which the number of substituents bonded to the mother nucleus of the first hierarchy is the same as the compound M1 as the basic structure Is set to “1”. Further, based on the comparison result compared by the classification unit 705, the number of carbons included in the structural formula of the mother nucleus of the first hierarchy is set to the same flag of the compounds M1 to M5 that are the same as the compound M1 that is the basic structure. “1” is set. Further, based on the comparison result compared by the classification unit 705, “1” is set in the same flag of the compounds M1 to M9 in which the type of the structure of the mother nucleus in the first layer is the same as the compound M1 as the basic structure. Is set.

図１４において、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第２階層の母核の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第２階層の母核に結合する第２階層の各置換基の結合位置が設定されている。 In FIG. 14, the classifying unit 705 refers to the division table 800 and sets the compound names of the mother nuclei in the second hierarchy of the compounds M1 to M10. In addition, the classification unit 705 refers to the division table 800 to set the bonding position of each substituent in the second hierarchy that binds to the parent nucleus of the second hierarchy of the compounds M1 to M10.

図１５において、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第２階層の母核に結合する置換基の置換基数が設定されている。また、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第２階層の母核の構造式に含まれる炭素数が設定されている。 In FIG. 15, the classification unit 705 sets the number of substituents that are bonded to the parent nucleus of the second hierarchy of the compounds M 1 to M 10 specified by the specifying unit 704. In addition, the classification unit 705 sets the number of carbons included in the structural formula of the parent nucleus of the second hierarchy of the compounds M1 to M10 specified by the specifying unit 704.

図１６において、分類部７０５により、比較部７０６によって比較された比較結果に基づいて、第２階層の母核の化合物名が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第２階層の母核に結合する各置換基の結合位置が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１，Ｍ６〜Ｍ１０の同一フラグに「１」が設定されている。 In FIG. 16, based on the comparison results of the comparison unit 706, the classification unit 705 has set “1” in the same flag of the compounds M1 to M10, whose second-layer mother nucleus compound name is the same as that of the compound M1 serving as the basic structure. Likewise, based on the comparison results, the classification unit 705 has set “1” in the same flag of the compounds M1 and M6 to M10, for which the bonding position of each substituent bonded to the second-layer mother nucleus is the same as in the compound M1 serving as the basic structure.

また、分類部７０５により、比較された比較結果に基づいて、第２階層の母核に結合する置換基数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１，Ｍ２，Ｍ６〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第２階層の母核の構造式に含まれる炭素数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ１０の同一フラグに「１」が設定されている。 In addition, based on the comparison result compared by the classification unit 705, the same flag of the compounds M1, M2, M6 to M10 in which the number of substituents bonded to the mother nucleus of the second hierarchy is the same as the compound M1 as the basic structure Is set to “1”. In addition, based on the comparison result compared by the classification unit 705, the number of carbons included in the structural formula of the parent nucleus in the second hierarchy is set to the same flag of the compounds M1 to M10 that are the same as the compound M1 that is the basic structure. “1” is set.

ここで、分類部７０５は、母核比較テーブル１１００の記憶内容に基づいて、分類対象となる化合物Ｍ１〜Ｍ１０を分類することにしてもよい。ここでは、第１階層の母核の化合物名、結合位置、置換基数、母核炭素数および種類が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５と、それ以外の化合物Ｍ６〜Ｍ１０とに分類されている。 Here, the classification unit 705 may classify the compounds M1 to M10 to be classified based on the storage contents of the mother nucleus comparison table 1100. In this example, they are classified into the compounds M1 to M5, whose first-layer mother nucleus compound name, bonding position, number of substituents, mother nucleus carbon number, and type are all the same as those of the compound M1 serving as the basic structure, and the remaining compounds M6 to M10.
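The way the same flags of the mother nucleus comparison table lead to this two-way classification can be sketched as follows. The record layout, the names `NucleusRecord`, `same_flags`, and `classify`, and the sample values are assumptions for illustration; they are not taken from FIGS. 11 to 16.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NucleusRecord:
    compound_id: str
    name: str                   # compound name of the mother nucleus
    positions: Tuple[str, ...]  # bonding positions of the substituents
    substituent_count: int
    carbon_count: int           # carbon atoms in the mother nucleus
    kind: str                   # type of structure of the mother nucleus


FIELDS = ("name", "positions", "substituent_count", "carbon_count", "kind")


def same_flags(base: NucleusRecord, other: NucleusRecord) -> Dict[str, int]:
    """Return 1 per field that equals the basic-structure compound, else 0."""
    return {f: int(getattr(base, f) == getattr(other, f)) for f in FIELDS}


def classify(
    base: NucleusRecord, records: List[NucleusRecord]
) -> Tuple[List[str], List[str]]:
    """Split compound IDs into 'all flags are 1' and 'everything else'."""
    same, rest = [], []
    for rec in records:
        target = same if all(same_flags(base, rec).values()) else rest
        target.append(rec.compound_id)
    return same, rest


# Hypothetical first-layer records.
m1 = NucleusRecord("M1", "propane", ("2",), 1, 3, "chain")
m2 = NucleusRecord("M2", "propane", ("2",), 1, 3, "chain")
m6 = NucleusRecord("M6", "butane", ("1", "2"), 2, 4, "chain")
groups = classify(m1, [m1, m2, m6])
```

In this sketch a compound joins the first group only when every flag is 1, which corresponds to the case described above where all five first-layer attributes match those of the basic structure.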

（置換基比較テーブルの記憶内容の変遷例） (Transition example of the storage contents of the substituent comparison table)
つぎに、図１７〜図２１を用いて、図１０の（８−６）に示した分割テーブル８００の記憶内容に基づく置換基比較テーブルの記憶内容の変遷例について説明する。 Next, transition examples of the storage contents of the substituent comparison table, based on the storage contents of the division table 800 shown in (8-6) of FIG. 10, will be described with reference to FIGS. 17 to 21.

図１７〜図２１は、置換基比較テーブルの記憶内容の変遷例を示す説明図である。図１７において、置換基比較テーブル１７００は、化合物ＩＤ、階層名、第ｊ置換基の化合物名、同一フラグ、結合位置、同一フラグ、置換基炭素数および同一フラグのフィールドを有する。各フィールドに情報を設定することで、化合物Ｍ１〜Ｍ１０ごとの置換基比較情報がレコードとして記憶される。 FIGS. 17 to 21 are explanatory diagrams illustrating transition examples of the storage contents of the substituent comparison table. In FIG. 17, the substituent comparison table 1700 has fields for compound ID, hierarchy name, compound name of the j-th substituent, same flag, bonding position, same flag, substituent carbon number, and same flag. By setting information in each field, the substituent comparison information for each of the compounds M1 to M10 is stored as a record.

ここで、化合物ＩＤは、化合物Ｍｒの識別子である。階層名は、第ｉ階層の名称である。第ｊ置換基の化合物名は、化合物Ｍｒの第ｉ階層の第ｊ置換基を表す化合物の名称である。同一フラグは、第ｊ置換基の化合物名が、基本構造となる化合物と同一か否かを示すフラグである。 Here, the compound ID is an identifier of the compound Mr. The hierarchy name is the name of the i-th hierarchy. The compound name of the j-th substituent is the name of the compound representing the j-th substituent in the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the compound name of the j-th substituent is the same as the compound having the basic structure.

結合位置は、化合物Ｍｒの第ｉ階層の母核に結合する第ｊ置換基の結合位置である。同一フラグは、第ｊ置換基の結合位置が、基本構造となる化合物と同一か否かを示すフラグである。置換基炭素数は、化合物Ｍｒの第ｉ階層の第ｊ置換基の構造式に含まれる炭素数である。同一フラグは、第ｊ置換基の構造式に含まれる炭素数が、基本構造となる化合物と同一か否かを示すフラグである。 The bonding position is the bonding position of the j-th substituent bonded to the mother nucleus in the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the bonding position of the jth substituent is the same as that of the compound serving as the basic structure. The number of carbon atoms in the substituent is the number of carbon atoms included in the structural formula of the j-th substituent in the i-th layer of the compound Mr. The same flag is a flag indicating whether or not the number of carbon atoms contained in the structural formula of the j-th substituent is the same as that of the compound having the basic structure.

なお、各同一フラグの判定は、基本構造となる化合物Ｍ１の第ｊ置換基と各化合物Ｍ２〜Ｍ１０の第ｊ置換基とを比較することにより行われる。 The determination of each identical flag is performed by comparing the jth substituent of the compound M1 serving as the basic structure with the jth substituent of each of the compounds M2 to M10.
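The position-by-position comparison of the j-th substituents can be sketched as below. Each substituent is represented as a (name, bonding position, carbon count) tuple, which is a layout assumed for this example, and `substituent_flags` is a hypothetical name. The text does not state how a j-th substituent that exists in only one of the two compounds is flagged; this sketch assumes all three flags stay at their initial value 0 in that case.

```python
from itertools import zip_longest
from typing import List, Tuple

Substituent = Tuple[str, str, int]  # (name, bonding position, carbon count)


def substituent_flags(
    base: List[Substituent], other: List[Substituent]
) -> List[Tuple[int, ...]]:
    """Compare the j-th substituent of `other` with the j-th one of `base`.

    Returns one (name_flag, position_flag, carbon_flag) tuple per index j.
    """
    flags = []
    for b, o in zip_longest(base, other):
        if b is None or o is None:
            # Assumption: a missing counterpart leaves every flag at 0.
            flags.append((0, 0, 0))
        else:
            flags.append(tuple(int(x == y) for x, y in zip(b, o)))
    return flags


# Second layer of M1 as described in the text: methyl at 3, hydroxy at 4.
base_m1 = [("methyl", "3", 1), ("hydroxy", "4", 0)]
# Hypothetical comparison compound with a third substituent.
other = [("ethyl", "3", 2), ("hydroxy", "4", 0), ("methyl", "5", 1)]
flags = substituent_flags(base_m1, other)
```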

図１７において、分類部７０５により、図１０の（８−６）に示した分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第１階層の第１置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第１階層の母核に結合する第１置換基の結合位置が設定されている。ただし、第ｊ置換基が複合置換基の場合、第ｊ置換基の化合物名フィールドに「複」が設定される。 In FIG. 17, the classification unit 705 has set the compound name of the first substituent of the first layer of the compounds M1 to M10 with reference to the division table 800 shown in (8-6) of FIG. 10. The classification unit 705 has also set, with reference to the division table 800, the bonding position of the first substituent bonded to the first-layer mother nucleus of the compounds M1 to M10. However, when the j-th substituent is a composite substituent, the marker 「複」 (“composite”) is set in the compound name field of the j-th substituent.

また、分類部７０５により、化合物Ｍ６，Ｍ７の第１階層の第２置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ６，Ｍ７の第１階層の母核に結合する第２置換基の結合位置が設定されている。また、分類部７０５により、化合物Ｍ７の第１階層の第３置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ７の第１階層の母核に結合する第３置換基の結合位置が設定されている。 In addition, the classification unit 705 has set the compound name of the second substituent of the first layer of the compounds M6 and M7, and has set, with reference to the division table 800, the bonding position of the second substituent bonded to the first-layer mother nucleus of the compounds M6 and M7. The classification unit 705 has also set the compound name of the third substituent of the first layer of the compound M7, and has set, with reference to the division table 800, the bonding position of the third substituent bonded to the first-layer mother nucleus of the compound M7.

図１８において、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第１階層の第１置換基の構造式に含まれる炭素数が設定されている。また、分類部７０５により、特定された化合物Ｍ６，Ｍ７の第１階層の第２置換基の構造式に含まれる炭素数が設定されている。また、分類部７０５により、特定された化合物Ｍ７の第１階層の第３置換基の構造式に含まれる炭素数が設定されている。 In FIG. 18, the classification unit 705 sets the number of carbon atoms included in the structural formula of the first substituent in the first layer of the compounds M 1 to M 10 specified by the specifying unit 704. In addition, the classification unit 705 sets the number of carbon atoms included in the structural formula of the second substituent in the first layer of the identified compounds M6 and M7. In addition, the classification unit 705 sets the number of carbon atoms included in the structural formula of the third substituent in the first layer of the specified compound M7.

図１９において、分類部７０５により、比較部７０６によって比較された比較結果に基づいて、第１階層の第１置換基の化合物名が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５，Ｍ７〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第１階層の母核に結合する第１置換基の結合位置が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第１階層の第１置換基の構造式に含まれる炭素数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１，Ｍ２，Ｍ７〜Ｍ１０の同一フラグに「１」が設定されている。 In FIG. 19, based on the comparison results of the comparison unit 706, the classification unit 705 has set “1” in the same flag of the compounds M1 to M5 and M7 to M10, whose first-layer first substituent has the same compound name as in the compound M1 serving as the basic structure. Based on the comparison results, the classification unit 705 has also set “1” in the same flag of the compounds M1 to M5, for which the bonding position of the first substituent bonded to the first-layer mother nucleus is the same as in the compound M1. Further, based on the comparison results, the classification unit 705 has set “1” in the same flag of the compounds M1, M2, and M7 to M10, for which the number of carbon atoms contained in the structural formula of the first-layer first substituent is the same as in the compound M1.

図２０において、分類部７０５により、図１０の（８−６）に示した分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第２階層の第１置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第２階層の母核に結合する第１置換基の結合位置が設定されている。また、分類部７０５により、特定部７０４によって特定された化合物Ｍ１〜Ｍ１０の第２階層の第１置換基の構造式に含まれる炭素数が設定されている。 In FIG. 20, the classification unit 705 has set the compound name of the first substituent of the second layer of the compounds M1 to M10 with reference to the division table 800 shown in (8-6) of FIG. 10. The classification unit 705 has also set, with reference to the division table 800, the bonding position of the first substituent bonded to the second-layer mother nucleus of the compounds M1 to M10. Further, the classification unit 705 has set the number of carbon atoms contained in the structural formula of the second-layer first substituent of the compounds M1 to M10, as specified by the specifying unit 704.

また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第２階層の第２置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ１〜Ｍ１０の第２階層の母核に結合する第２置換基の結合位置が設定されている。また、分類部７０５により、特定された化合物Ｍ１〜Ｍ１０の第２階層の第２置換基の構造式に含まれる炭素数が設定されている。 In addition, the classification unit 705 refers to the division table 800 to set the compound name of the second substituent of the second hierarchy of the compounds M1 to M10. In addition, the classification unit 705 refers to the division table 800 to set the bonding position of the second substituent that bonds to the parent nucleus of the second hierarchy of the compounds M1 to M10. Further, the classification unit 705 sets the number of carbon atoms included in the structural formula of the second substituent of the second hierarchy of the identified compounds M1 to M10.

また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ３〜Ｍ５の第２階層の第３置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ３〜Ｍ５の第２階層の母核に結合する第３置換基の結合位置が設定されている。また、分類部７０５により、特定された化合物Ｍ３〜Ｍ５の第２階層の第３置換基の構造式に含まれる炭素数が設定されている。 Further, the classification unit 705 refers to the division table 800 to set the compound name of the third substituent in the second hierarchy of the compounds M3 to M5. In addition, the classification unit 705 refers to the division table 800 to set the bonding position of the third substituent that bonds to the parent nucleus of the second hierarchy of the compounds M3 to M5. In addition, the classification unit 705 sets the number of carbon atoms included in the structural formula of the third substituent of the second hierarchy of the specified compounds M3 to M5.

また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ４の第２階層の第４置換基の化合物名が設定されている。また、分類部７０５により、分割テーブル８００を参照して、化合物Ｍ４の第２階層の母核に結合する第４置換基の結合位置が設定されている。また、分類部７０５により、特定された化合物Ｍ４の第２階層の第４置換基の構造式に含まれる炭素数が設定されている。 In addition, the classification unit 705 refers to the division table 800 to set the compound name of the fourth substituent in the second layer of the compound M4. In addition, the classification unit 705 refers to the division table 800 to set the bonding position of the fourth substituent that bonds to the parent nucleus of the second layer of the compound M4. In addition, the classification unit 705 sets the number of carbon atoms included in the structural formula of the fourth substituent in the second layer of the specified compound M4.

図２１において、分類部７０５により、比較部７０６によって比較された比較結果に基づいて、第２階層の第１置換基の化合物名が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ４，Ｍ６〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第２階層の母核に結合する第１置換基の結合位置が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１，Ｍ３，Ｍ５〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第２階層の第１置換基の構造式に含まれる炭素数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ１０の同一フラグに「１」が設定されている。 In FIG. 21, based on the comparison results of the comparison unit 706, the classification unit 705 has set “1” in the same flag of the compounds M1 to M4 and M6 to M10, whose second-layer first substituent has the same compound name as in the compound M1 serving as the basic structure. Based on the comparison results, the classification unit 705 has also set “1” in the same flag of the compounds M1, M3, and M5 to M10, for which the bonding position of the first substituent bonded to the second-layer mother nucleus is the same as in the compound M1. Further, based on the comparison results, the classification unit 705 has set “1” in the same flag of the compounds M1 to M10, for which the number of carbon atoms contained in the structural formula of the second-layer first substituent is the same as in the compound M1.

また、分類部７０５により、比較された比較結果に基づいて、第２階層の第２置換基の化合物名が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ３，Ｍ５〜Ｍ１０の同一フラグに「１」が設定されている。また、分類部７０５により、比較された比較結果に基づいて、第２階層の母核に結合する第２置換基の結合位置が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ３，Ｍ５〜Ｍ１０の同一フラグに「１」が設定されている。 Further, based on the comparison result, the classification unit 705 sets the same flag to “1” for the compounds M1 to M3 and M5 to M10, whose second-hierarchy second-substituent compound name is the same as that of the compound M1 serving as the basic structure. The classification unit 705 also sets the same flag to “1” for the compounds M1 to M3 and M5 to M10, whose second substituent bonds to the second-hierarchy mother nucleus at the same position as in the compound M1.

また、分類部７０５により、比較された比較結果に基づいて、第２階層の第２置換基の構造式に含まれる炭素数が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ３，Ｍ５〜Ｍ１０の同一フラグに「１」が設定されている。 Further, based on the comparison result, the classification unit 705 sets the same flag to “1” for the compounds M1 to M3 and M5 to M10, whose second-hierarchy second substituent has the same number of carbon atoms in its structural formula as that of the compound M1 serving as the basic structure.

また、母核比較テーブル１１００および置換基比較テーブル１７００内のフィールドに設定する情報が不明な場合は、該フィールドの情報が不明であることを示す情報、例えば、「不明」という文字列が該フィールドに設定されることにしてもよい。具体的には、例えば、化合物Ｍｒの第ｉ階層の母核を表す文字列が非検出であったことを示す不明フラグが記憶装置に記憶されている場合、母核比較テーブル１１００内の化合物Ｍｒの第１階層の母核の化合物名フィールドに「不明」が設定される。 When the information to be set in a field of the mother nucleus comparison table 1100 or the substituent comparison table 1700 is unknown, information indicating that fact, for example the character string “unknown”, may be set in the field. Specifically, for example, when an unknown flag indicating that no character string representing the i-th hierarchy mother nucleus of the compound Mr was detected is stored in the storage device, “unknown” is set in the compound name field for the first-hierarchy mother nucleus of the compound Mr in the mother nucleus comparison table 1100.

（比較リストの具体例） (Specific example of comparison list)
つぎに、図２２および図２３を用いて、分類対象となる化合物Ｍ１〜Ｍ１０の比較リストの具体例について説明する。比較リストは、例えば、化合物分類装置１００のディスプレイ３０９やクライアント装置２０１のディスプレイ（不図示）に表示される。 Next, a specific example of the comparison list for the compounds M1 to M10 to be classified will be described with reference to FIGS. 22 and 23. The comparison list is displayed on, for example, the display 309 of the compound classification device 100 or a display (not shown) of the client device 201.

図２２および図２３は、比較リストの具体例を示す説明図である。図２２および図２３において、比較リスト２２００は、分類対象となる化合物Ｍ１〜Ｍ１０の特徴を比較するための表データである。比較リスト２２００は、作成部７１０により、図１３および図１６に示した母核比較テーブル１１００と、図１９および図２１に示した置換基比較テーブル１７００とをマージすることにより作成されたものである。 FIGS. 22 and 23 are explanatory diagrams showing a specific example of the comparison list. In FIGS. 22 and 23, the comparison list 2200 is table data for comparing the characteristics of the compounds M1 to M10 to be classified. The creation unit 710 creates the comparison list 2200 by merging the mother nucleus comparison table 1100 shown in FIGS. 13 and 16 with the substituent comparison table 1700 shown in FIGS. 19 and 21.

図２２において、比較リスト２２００には、各化合物Ｍ１〜Ｍ１０の第１階層の母核の化合物名、母核に結合する各置換基の結合位置、母核に結合する置換基数、母核の炭素数、母核の構造の種類が示されている。ここで、分類対象となる化合物Ｍ１〜Ｍ１０は、第１階層の母核の化合物名、結合位置、置換基数、母核炭素数および種類が、基本構造となる化合物Ｍ１と同一となる化合物Ｍ１〜Ｍ５と、それ以外の化合物Ｍ６〜Ｍ１０とに分類されている。 In FIG. 22, the comparison list 2200 shows, for each of the compounds M1 to M10, the compound name of the first-hierarchy mother nucleus, the bonding position of each substituent bonded to the mother nucleus, the number of substituents bonded to the mother nucleus, the number of carbon atoms in the mother nucleus, and the structure type of the mother nucleus. Here, the compounds M1 to M10 to be classified are divided into the compounds M1 to M5, whose first-hierarchy mother nucleus has the same compound name, bonding positions, number of substituents, number of carbon atoms, and structure type as the compound M1 serving as the basic structure, and the remaining compounds M6 to M10.

図２３において、比較リスト２２００には、各化合物Ｍ１〜Ｍ１０の第２階層の母核の化合物名、母核に結合する各置換基の結合位置、母核に結合する置換基数、母核の炭素数、第１〜第４置換基の化合物名、母核に結合する第１〜第４置換基の結合位置、第１〜第４置換基の炭素数が示されている。また、比較リスト２２００には、各化合物Ｍ１〜Ｍ１０の第１階層の第２，第３置換基の化合物名、母核に結合する第２，第３置換基の結合位置、第２，第３置換基の炭素数が示されている。 In FIG. 23, the comparison list 2200 shows, for each of the compounds M1 to M10, the compound name of the second-hierarchy mother nucleus, the bonding position of each substituent bonded to that mother nucleus, the number of substituents bonded to it, the number of carbon atoms in it, the compound names of the first to fourth substituents, the positions at which the first to fourth substituents bond to the mother nucleus, and the numbers of carbon atoms in the first to fourth substituents. The comparison list 2200 also shows, for each of the compounds M1 to M10, the compound names of the second and third substituents of the first hierarchy, the positions at which they bond to the mother nucleus, and their numbers of carbon atoms.

また、比較リスト２２００において、化合物Ｍ１〜Ｍ１０は、基本構造となる化合物Ｍ１との類似度が高い順にソートされている。具体的には、第１階層の母核の化合物名、結合位置、置換基数、母核炭素数および種類のうち、化合物Ｍ１と同一となる項目数が多い順に化合物Ｍ２〜Ｍ１０がソートされている。また、各項目の項目値のうち、基本構造となる化合物Ｍ１と同一となる項目値がハイライト表示されている。 In the comparison list 2200, the compounds M1 to M10 are sorted in descending order of similarity to the compound M1 that is the basic structure. Specifically, the compounds M2 to M10 are sorted in descending order of the number of items that are the same as those of the compound M1 among the compound names, bonding positions, the number of substituents, the number of parent carbons, and the type of the first layer mother nucleus. . In addition, among the item values of each item, the item value that is the same as the compound M1 as the basic structure is highlighted.
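The grouping and sorting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the device: the layout of the `flags` dictionary, the item keys, and the function names are introduced here for illustration, and the flag values are made-up sample data rather than the contents of FIG. 22.

```python
# Hypothetical same-flag data: 1 means the item value equals that of the base compound M1.
flags = {
    "M1": {"name": 1, "positions": 1, "substituents": 1, "carbons": 1, "type": 1},
    "M2": {"name": 1, "positions": 1, "substituents": 1, "carbons": 1, "type": 1},
    "M6": {"name": 0, "positions": 0, "substituents": 0, "carbons": 0, "type": 1},
    "M8": {"name": 0, "positions": 1, "substituents": 1, "carbons": 0, "type": 1},
}

def classify(flags):
    """Split compound IDs into those matching the base on every item and the rest."""
    same = [cid for cid, f in flags.items() if all(f.values())]
    other = [cid for cid, f in flags.items() if not all(f.values())]
    return same, other

def sort_by_similarity(flags, base_id="M1"):
    """Order compounds by the number of matching items, base compound first."""
    others = [cid for cid in flags if cid != base_id]
    # sorted() is stable, so compounds with equal counts keep their input order.
    others = sorted(others, key=lambda cid: sum(flags[cid].values()), reverse=True)
    return [base_id] + others
```

Counting matching flags is only one possible similarity measure; it is used here because the text defines similarity as the number of items identical to those of the compound M1.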

比較リスト２２００によれば、分類対象となる化合物Ｍ１〜Ｍ１０の特徴を比較することができる。また、第１階層の母核の化合物名が同一の化合物群が分類されて表示されるため、化合物の基礎となる母核が同一の化合物同士の類似性や差分を判別し易い。また、基本構造となる化合物Ｍ１と同一となる項目値がハイライト表示されているため、化合物Ｍ１と各化合物Ｍ２〜Ｍ１０との項目間の類似性や差分を判別し易い。 According to the comparison list 2200, the characteristics of the compounds M1 to M10 to be classified can be compared. In addition, since compound groups having the same compound name in the first layer mother nucleus are classified and displayed, it is easy to determine the similarity or difference between compounds having the same mother nucleus as the basis of the compound. In addition, since the item value that is the same as the compound M1 that is the basic structure is highlighted, it is easy to determine the similarity and difference between the items of the compound M1 and each of the compounds M2 to M10.

また、比較リスト２２００によれば、基本構造となる化合物Ｍ１の第１階層の母核は「プロパン」であるが、第１階層の母核を「エタン」や「ペンタン」としても、化合物Ｍ１の代わりに用いることができることがわかる。また、各化合物Ｍ１〜Ｍ１０は、疎水性のフェニル基と親水性のヒドロキシ基とを共通して有していることがわかる。また、図示は省略するが、設定する情報が不明なフィールドには「不明」という文字列が設定されるため、例えば、分類対象となる化合物群に含まれる未知の化合物の存在や、化合物名の誤記等に気付き易くなる。 The comparison list 2200 also shows that, although the first-hierarchy mother nucleus of the compound M1 serving as the basic structure is “propane”, compounds whose first-hierarchy mother nucleus is “ethane” or “pentane” can be used in place of the compound M1. It also shows that the compounds M1 to M10 all have a hydrophobic phenyl group and a hydrophilic hydroxy group in common. Although not illustrated, the character string “unknown” is set in any field whose information is unknown, which makes it easier to notice, for example, an unknown compound in the compound group to be classified or a misspelled compound name.

（比較リストの加工例） (Example of comparison list processing)
つぎに、図２２および図２３に示した比較リスト２２００の加工例について説明する。図２４は、比較リストの加工例を示す説明図である。図２４において、比較リスト２４００は、図２２および図２３に示した比較リスト２２００を加工したものである。 Next, a processing example of the comparison list 2200 shown in FIGS. 22 and 23 will be described. FIG. 24 is an explanatory diagram showing a processing example of the comparison list. In FIG. 24, the comparison list 2400 is obtained by processing the comparison list 2200 shown in FIGS. 22 and 23.

比較リスト２４００において、母核とは、各化合物Ｍ１〜Ｍ１０の第１階層の母核の化合物名と、母核に結合する第１置換基の結合位置を示すものである。化合物Ｍ６，Ｍ７については、第１階層の母核に結合する第２，第３置換基の結合位置も示されている。また、種類とは、第１階層の母核の構造の種類を示すものである。 In the comparison list 2400, the mother nucleus indicates the compound name of the first layer mother nucleus of each of the compounds M1 to M10 and the bonding position of the first substituent that is bonded to the mother nucleus. For the compounds M6 and M7, the bonding positions of the second and third substituents bonded to the mother nucleus of the first layer are also shown. The type indicates the type of structure of the mother nucleus in the first hierarchy.

母核部分の基本構造との差分とは、基本構造となる化合物Ｍ１の第１階層の母核との差分を示すものである。具体的には、各化合物Ｍ１〜Ｍ１０の第１階層の母核の特徴が示されている。化合物Ｍ１との差分がない化合物Ｍ２〜Ｍ５については、化合物Ｍ１と同じ内容が示されている。 The difference from the basic structure in the mother nucleus portion indicates the difference from the first-hierarchy mother nucleus of the compound M1 serving as the basic structure. Specifically, the characteristics of the first-hierarchy mother nucleus of each of the compounds M1 to M10 are shown. For the compounds M2 to M5, which do not differ from the compound M1, the same content as for the compound M1 is shown.

第２階層の母核部分の基本構造との差分とは、基本構造となる化合物Ｍ１の第２階層の母核との差分を示すものである。具体的には、各化合物Ｍ１〜Ｍ１０の第２階層の母核の特徴が示されている。第２階層の置換基部分の基本構造との差分とは、基本構造となる化合物Ｍ１の第２階層の置換基との差分を示すものである。また、比較リスト２４００の各項目の項目値のうち、基本構造となる化合物Ｍ１と同一となる項目値がハイライト表示されている。 The difference from the basic structure of the mother nucleus part of the second hierarchy indicates a difference from the mother nucleus of the second hierarchy of the compound M1 serving as the basic structure. Specifically, the characteristics of the mother nucleus of the second hierarchy of each of the compounds M1 to M10 are shown. The difference from the basic structure of the substituent portion of the second hierarchy indicates the difference from the substituent of the second hierarchy of the compound M1 that is the basic structure. In addition, among the item values of each item in the comparison list 2400, an item value that is the same as the compound M1 as the basic structure is highlighted.

比較リスト２４００によれば、母核の化合物名と母核に結合する各置換基の結合位置などの関連する項目が一項目にまとめて表示されるため、図２２および図２３に示した比較リスト２２００に比べて、分類対象となる化合物Ｍ１〜Ｍ１０の特徴を比較し易くなる。 In the comparison list 2400, related items such as the compound name of the mother nucleus and the bonding position of each substituent bonded to it are displayed together as a single item, so the characteristics of the compounds M1 to M10 to be classified are easier to compare than in the comparison list 2200 shown in FIGS. 22 and 23.

（化合物分類装置１００の化合物分類処理手順） (Compound classification processing procedure of the compound classification apparatus 100)
つぎに、化合物分類装置１００の化合物分類処理手順について説明する。図２５は、化合物分類装置１００の化合物分類処理手順の一例を示すフローチャートである。図２５のフローチャートにおいて、まず、化合物分類装置１００は、分類対象となる化合物群Ｍ１〜ＭＲの化合物名群Ｎ１〜ＮＲを受け付けたか否かを判断する（ステップＳ２５０１）。 Next, the compound classification processing procedure of the compound classification apparatus 100 will be described. FIG. 25 is a flowchart showing an example of a compound classification processing procedure of the compound classification device 100. In the flowchart of FIG. 25, first, the compound classification apparatus 100 determines whether or not the compound name groups N1 to NR of the compound groups M1 to MR to be classified are received (step S2501).

ここで、化合物分類装置１００は、化合物群Ｍ１〜ＭＲの化合物名群Ｎ１〜ＮＲを受け付けるのを待つ（ステップＳ２５０１：Ｎｏ）。そして、化合物分類装置１００は、化合物群Ｍ１〜ＭＲの化合物名群Ｎ１〜ＮＲを受け付けた場合（ステップＳ２５０１：Ｙｅｓ）、化合物名群Ｎ１〜ＮＲを分割テーブル８００に登録する（ステップＳ２５０２）。なお、以下の説明では、化合物群Ｍ１〜ＭＲのうち、化合物Ｍ１を基本構造となる化合物とする。 Here, the compound classification device 100 waits to receive the compound name groups N1 to NR of the compound groups M1 to MR (step S2501: No). When the compound classification apparatus 100 receives the compound name groups N1 to NR of the compound groups M1 to MR (step S2501: Yes), the compound classification apparatus 100 registers the compound name groups N1 to NR in the division table 800 (step S2502). In the following description, among the compound groups M1 to MR, the compound M1 is a compound having a basic structure.

つぎに、化合物分類装置１００は、構造解析ルールＤＢ２２０を読み込む（ステップＳ２５０３）。そして、化合物分類装置１００は、化合物Ｍｒの「ｒ」を「ｒ＝１」として（ステップＳ２５０４）、分割テーブル８００の中から化合物Ｍｒの化合物名Ｎｒを選択する（ステップＳ２５０５）。 Next, the compound classification device 100 reads the structural analysis rule DB 220 (step S2503). Then, the compound classification device 100 sets “r” of the compound Mr to “r = 1” (step S2504), and selects the compound name Nr of the compound Mr from the division table 800 (step S2505).

つぎに、化合物分類装置１００は、選択した化合物名Ｎｒの化合物名分割処理を実行する（ステップＳ２５０６）。そして、化合物分類装置１００は、化合物Ｍｒの「ｒ」をインクリメントして（ステップＳ２５０７）、「ｒ」が「Ｒ」より大きくなったか否かを判断する（ステップＳ２５０８）。 Next, the compound classification device 100 executes a compound name division process for the selected compound name Nr (step S2506). The compound classification device 100 increments “r” of the compound Mr (step S2507), and determines whether “r” is greater than “R” (step S2508).

ここで、「ｒ」が「Ｒ」以下の場合（ステップＳ２５０８：Ｎｏ）、ステップＳ２５０５に戻る。一方、「ｒ」が「Ｒ」より大きくなった場合（ステップＳ２５０８：Ｙｅｓ）、化合物分類装置１００は、母核比較テーブル１１００を作成する母核比較テーブル作成処理を実行する（ステップＳ２５０９）。 If “r” is equal to or less than “R” (step S2508: NO), the process returns to step S2505. On the other hand, when “r” becomes larger than “R” (step S2508: Yes), the compound classification device 100 executes a mother nucleus comparison table creation process for creating a mother nucleus comparison table 1100 (step S2509).

つぎに、化合物分類装置１００は、置換基比較テーブル１７００を作成する置換基比較テーブル作成処理を実行する（ステップＳ２５１０）。そして、化合物分類装置１００は、母核比較テーブル１１００内の各項目の同一フラグに基づいて、化合物群Ｍ１〜ＭＲを分類する（ステップＳ２５１１）。 Next, the compound classification device 100 executes a substituent comparison table creation process for creating a substituent comparison table 1700 (step S2510). Then, the compound classification device 100 classifies the compound groups M1 to MR based on the same flag of each item in the mother nucleus comparison table 1100 (step S2511).

つぎに、化合物分類装置１００は、分類した分類結果に基づいて、作成した母核比較テーブル１１００と置換基比較テーブル１７００とをマージして比較リストを作成する（ステップＳ２５１２）。そして、化合物分類装置１００は、作成した比較リストを出力して（ステップＳ２５１３）、本フローチャートによる一連の処理を終了する。 Next, the compound classification device 100 merges the created mother nucleus comparison table 1100 and the substituent comparison table 1700 based on the classified classification result to create a comparison list (step S2512). Then, the compound classification device 100 outputs the created comparison list (step S2513), and ends a series of processes according to this flowchart.

これにより、分類対象となる化合物Ｍ１〜ＭＲの特徴を比較するための比較リストを出力することができる。なお、ステップＳ２５１３において、化合物分類装置１００は、母核比較テーブル１１００の記憶内容と置換基比較テーブル１７００の記憶内容とを出力することにしてもよい。 Thereby, the comparison list for comparing the characteristics of the compounds M1 to MR to be classified can be output. In step S2513, the compound classification device 100 may output the storage contents of the mother nucleus comparison table 1100 and the storage contents of the substituent comparison table 1700.

＜化合物名分割処理の具体的処理手順＞ <Specific processing procedure of the compound name division processing>
つぎに、図２５のステップＳ２５０６に示した化合物名分割処理の具体的な処理手順について説明する。図２６は、化合物名分割処理の具体的処理手順の一例を示すフローチャートである。 Next, a specific processing procedure of the compound name division processing shown in step S2506 of FIG. 25 will be described. FIG. 26 is a flowchart showing an example of a specific processing procedure of the compound name division processing.

図２６のフローチャートにおいて、まず、化合物分類装置１００は、化合物名Ｎｒの第ｉ階層の母核分割処理を実行する（ステップＳ２６０１）。なお、第ｉ階層の「ｉ」は初期状態では「ｉ＝１」である。 In the flowchart of FIG. 26, first, the compound classification device 100 executes the mother nucleus partitioning process of the i-th hierarchy of the compound name Nr (step S2601). Note that “i” in the i-th layer is “i = 1” in the initial state.

つぎに、化合物分類装置１００は、第ｊ置換基の「ｊ」を「ｊ＝１」として（ステップＳ２６０２）、化合物Ｍｒの第ｉ階層の第ｊ置換基を選択する（ステップＳ２６０３）。そして、化合物分類装置１００は、選択した第ｊ置換基が複合置換基か否かを判断する（ステップＳ２６０４）。 Next, the compound classification device 100 sets “j” of the j-th substituent to “j = 1” (step S2602), and selects the j-th substituent of the i-th layer of the compound Mr (step S2603). Then, the compound classification device 100 determines whether or not the selected j-th substituent is a composite substituent (step S2604).

ここで、第ｊ置換基が複合置換基ではない場合（ステップＳ２６０４：Ｎｏ）、化合物分類装置１００は、第ｊ置換基の置換基分割処理を実行する（ステップＳ２６０５）。つぎに、化合物分類装置１００は、第ｊ置換基の「ｊ」をインクリメントして（ステップＳ２６０６）、「ｊ」が「ｍ」より大きくなったか否かを判断する（ステップＳ２６０７）。 Here, when the j-th substituent is not a composite substituent (step S2604: No), the compound classification device 100 executes a substituent division process for the j-th substituent (step S2605). Next, the compound classification device 100 increments “j” of the j-th substituent (step S2606), and determines whether “j” is greater than “m” (step S2607).

ここで、「ｊ」が「ｍ」以下の場合（ステップＳ２６０７：Ｎｏ）、ステップＳ２６０３に戻る。一方、「ｊ」が「ｍ」より大きくなった場合（ステップＳ２６０７：Ｙｅｓ）、化合物名分割処理を終了し、化合物名分割処理を呼び出したステップへ戻る。第１階層の化合物名分割処理が終わった場合には、図２５に示したステップＳ２５０７に移行する。 If “j” is equal to or less than “m” (step S2607: NO), the process returns to step S2603. On the other hand, when “j” becomes larger than “m” (step S2607: Yes), the compound name division process is terminated, and the process returns to the step that called the compound name division process. When the first layer compound name division processing is completed, the process proceeds to step S2507 shown in FIG.

また、ステップＳ２６０４において、第ｊ置換基が複合置換基の場合（ステップＳ２６０４：Ｙｅｓ）、化合物分類装置１００は、第（ｉ＋１）階層の化合物名Ｎｒとして、第ｊ置換基の化合物名を設定する（ステップＳ２６０８）。そして、化合物分類装置１００は、第（ｉ＋１）階層の化合物Ｍｒの化合物名分割処理を実行して（ステップＳ２６０９）、ステップＳ２６０６に移行する。 In step S2604, when the jth substituent is a composite substituent (step S2604: Yes), the compound classification device 100 sets the compound name of the jth substituent as the compound name Nr in the (i + 1) th layer. (Step S2608). Then, the compound classification device 100 executes the compound name division process for the compound Mr in the (i + 1) -th layer (step S2609), and proceeds to step S2606.

これにより、第ｊ置換基が複合置換基の場合、第ｊ置換基の化合物名を第（ｉ＋１）階層の化合物Ｍｒの化合物名として化合物名分割処理を再帰的に実行することができる。 Thereby, when the j-th substituent is a composite substituent, the compound name splitting process can be recursively executed with the compound name of the j-th substituent as the compound name of the compound Mr in the (i + 1) -th layer.
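The recursion over hierarchies can be sketched as below. This is an illustrative sketch, not the procedure of FIG. 26 itself: it assumes English compound names, it assumes that a composite substituent is written in parentheses (the test used in step S2604 is not given in this passage), and the name lists, `split_pairs`, and `divide` are hypothetical stand-ins. Following the note on the (i + 1)-th hierarchy, the suffix match at deeper levels is made against substituent names.

```python
import re

PARENTS = ["ethane", "propane"]                          # hypothetical first-hierarchy names
SUBSTITUENTS = ["ethyl", "methyl", "phenyl", "hydroxy"]  # hypothetical substituent names

def split_pairs(text):
    """Split '1-(2-hydroxyethyl)-2-phenyl' into [('1', '(2-hydroxyethyl)'), ('2', 'phenyl')]."""
    pairs, i = [], 0
    while i < len(text):
        m = re.match(r"(\d+(?:,\d+)*)-", text[i:])
        if not m:
            break
        i += m.end()
        start, depth = i, 0
        while i < len(text):
            ch = text[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "-" and depth == 0:
                break                      # hyphen outside parentheses ends this substituent
            i += 1
        pairs.append((m.group(1), text[start:i]))
        i += 1                             # skip the hyphen between two pairs
    return pairs

def divide(name, level=1):
    """Split a name into mother nucleus and substituents, recursing into composite ones."""
    candidates = PARENTS if level == 1 else SUBSTITUENTS
    parent = next((c for c in sorted(candidates, key=len, reverse=True)
                   if name.endswith(c)), None)
    remainder = name[:-len(parent)] if parent else name
    node = {"level": level, "parent": parent, "substituents": []}
    for position, text in split_pairs(remainder):
        if text.startswith("(") and text.endswith(")"):
            child = divide(text[1:-1], level + 1)   # composite: treat as an (i+1)-level name
        else:
            child = text                            # simple substituent
        node["substituents"].append((position, child))
    return node
```

For the made-up name `1-(2-hydroxyethyl)-2-phenylpropane`, `divide` returns a first-hierarchy node for `propane` whose first substituent is itself a second-hierarchy node with `ethyl` as its mother nucleus.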

つぎに、図２６のステップＳ２６０１に示した母核分割処理の具体的な処理手順について説明する。図２７および図２８は、母核分割処理の具体的処理手順の一例を示すフローチャートである。 Next, a specific processing procedure of the nucleus dividing process shown in step S2601 of FIG. 26 will be described. FIG. 27 and FIG. 28 are flowcharts showing an example of a specific processing procedure of the nucleus dividing process.

図２７のフローチャートにおいて、まず、化合物分類装置１００は、母核Ｂｋの「ｋ」を「ｋ＝１」とする（ステップＳ２７０１）。つぎに、化合物分類装置１００は、構造式ＤＢ２３０の中から母核Ｂｋの化合物名を選択する（ステップＳ２７０２）。そして、化合物分類装置１００は、母核Ｂｋの化合物名の文字数ｔを特定する（ステップＳ２７０３）。文字数ｔの長い方を優先的に選択する。 In the flowchart of FIG. 27, the compound classification device 100 first sets “k” of the mother nucleus Bk to “k = 1” (step S2701). Next, the compound classification device 100 selects the compound name of the mother nucleus Bk from the structural formula DB 230 (step S2702). The compound classification device 100 then specifies the number of characters t in the compound name of the mother nucleus Bk (step S2703). Compound names with a larger number of characters t are selected preferentially.

つぎに、化合物分類装置１００は、化合物Ｍｒの化合物名Ｎｒの末尾からｔ文字の文字列と、母核Ｂｋの化合物名とが一致するか否かを判断する（ステップＳ２７０４）。ここで、母核Ｂｋの化合物名と一致する場合（ステップＳ２７０４：Ｙｅｓ）、化合物分類装置１００は、分割テーブル８００内の化合物名Ｎｒの末尾からｔ文字の文字列の直前に第ｉ階層の区切り記号を挿入する（ステップＳ２７０５）。 Next, the compound classification device 100 determines whether the last t characters of the compound name Nr of the compound Mr match the compound name of the mother nucleus Bk (step S2704). If they match (step S2704: Yes), the compound classification device 100 inserts an i-th hierarchy delimiter immediately before the last t characters of the compound name Nr in the division table 800 (step S2705).

つぎに、化合物分類装置１００は、化合物Ｍｒの化合物名Ｎｒのうち母核Ｂｋの化合物名を除く残余の文字列を「数字−文字列」の組に分割する（ステップＳ２７０６）。そして、化合物分類装置１００は、各組の文字列を先頭から順番に第１〜第ｍ置換基を表す文字列とする（ステップＳ２７０７）。つぎに、化合物分類装置１００は、各組の数字を先頭から順番に第１〜第ｍ置換基の結合位置を表す文字列として（ステップＳ２７０８）、図２６に示したステップＳ２６０２に移行する。 Next, the compound classification device 100 divides the remaining character string excluding the compound name of the mother nucleus Bk in the compound name Nr of the compound Mr into “number-character string” pairs (step S2706). Then, the compound classification device 100 sets each set of character strings as the character strings representing the first to m-th substituents in order from the top (step S2707). Next, the compound classification device 100 converts each set of numbers as a character string indicating the bonding position of the first to m-th substituents in order from the top (step S2708), and proceeds to step S2602 shown in FIG.

また、ステップＳ２７０４において、母核Ｂｋの化合物名と不一致の場合（ステップＳ２７０４：Ｎｏ）、化合物分類装置１００は、母核Ｂｋの「ｋ」をインクリメントして（ステップＳ２７０９）、「ｋ」が「Ｋ」より大きくなったか否かを判断する（ステップＳ２７１０）。 If the character string does not match the compound name of the mother nucleus Bk in step S2704 (step S2704: No), the compound classification device 100 increments “k” of the mother nucleus Bk (step S2709) and determines whether “k” has become larger than “K” (step S2710).

ここで、「ｋ」が「Ｋ」以下の場合（ステップＳ２７１０：Ｎｏ）、ステップＳ２７０２に戻る。一方、「ｋ」が「Ｋ」より大きくなった場合（ステップＳ２７１０：Ｙｅｓ）、図２８に示すステップＳ２８０１に移行する。 If “k” is equal to or less than “K” (step S2710: NO), the process returns to step S2702. On the other hand, when “k” becomes larger than “K” (step S2710: YES), the process proceeds to step S2801 shown in FIG.
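The suffix matching of FIG. 27 can be illustrated with a short Python sketch. It is a simplified illustration under assumptions: the names are English rather than the Japanese names handled by the device, `PARENT_NAMES` is a hypothetical stand-in for the mother nucleus names in the structural formula DB 230, and the function names and the regular expression are introduced here for illustration only.

```python
import re

# Hypothetical stand-in for the mother nucleus names held in structural formula DB 230.
PARENT_NAMES = ["ethane", "propane", "pentane", "cyclopropane"]

def split_parent(name, parent_names=PARENT_NAMES):
    """Return (remainder, parent) for the longest known parent that ends `name`, else None."""
    # Longer names are tried first so that "cyclopropane" is preferred over "propane".
    for parent in sorted(parent_names, key=len, reverse=True):
        if name.endswith(parent):          # compare the last t characters (step S2704)
            return name[: -len(parent)], parent
    return None                            # no match: the flow continues at step S2801

def split_substituents(remainder):
    """Split a remainder such as '2-hydroxy-1-phenyl' into (position, substituent) pairs."""
    return re.findall(r"(\d+(?:,\d+)*)-([A-Za-z]+)", remainder)
```

Trying longer names first reflects the statement that names with a larger number of characters t are selected preferentially; without it, a name ending in `cyclopropane` would be split at `propane`.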

図２８のフローチャートにおいて、まず、化合物分類装置１００は、化合物Ｍｒの化合物名Ｎｒを「数字−文字列」の組に分割する（ステップＳ２８０１）。そして、化合物分類装置１００は、各組の文字列を先頭から順番に第１〜第ｍ置換基を表す文字列とする（ステップＳ２８０２）。 In the flowchart of FIG. 28, first, the compound classification device 100 divides the compound name Nr of the compound Mr into “number-character string” pairs (step S2801). Then, the compound classification device 100 sets each set of character strings as the character strings representing the first to mth substituents in order from the top (step S2802).

つぎに、化合物分類装置１００は、各組の数字を先頭から順番に第１〜第ｍ置換基の結合位置を表す文字列とする（ステップＳ２８０３）。そして、化合物分類装置１００は、置換基Ｃｐの「ｐ」を「ｐ＝１」として（ステップＳ２８０４）、構造式ＤＢ２３０の中から置換基Ｃｐの化合物名を選択する（ステップＳ２８０５）。 Next, the compound classification device 100 sets each set of numbers as a character string representing the bonding position of the first to m-th substituents in order from the top (step S2803). Then, the compound classification device 100 sets “p” of the substituent Cp to “p = 1” (step S2804), and selects the compound name of the substituent Cp from the structural formula DB230 (step S2805).

つぎに、化合物分類装置１００は、置換基Ｃｐの化合物名の文字数ｓを特定する（ステップＳ２８０６）。そして、化合物分類装置１００は、第ｍ置換基を表す文字列の先頭からｓ文字の文字列と、置換基Ｃｐの化合物名とが一致するか否かを判断する（ステップＳ２８０７）。 Next, the compound classification device 100 specifies the number of characters s of the compound name of the substituent Cp (step S2806). Then, the compound classification device 100 determines whether or not the character string of the s character from the beginning of the character string representing the m-th substituent matches the compound name of the substituent Cp (step S2807).

ここで、置換基Ｃｐの化合物名と一致する場合（ステップＳ２８０７：Ｙｅｓ）、化合物分類装置１００は、第ｉ階層の母核を表す文字列を、第ｍ置換基を表す文字列のうち先頭からｓ文字を除く残余の文字列とする（ステップＳ２８０８）。つぎに、化合物分類装置１００は、第ｍ置換基を表す文字列を、第ｍ置換基を表す文字列の先頭からｓ文字の文字列とする（ステップＳ２８０９）。 If they match (step S2807: Yes), the compound classification device 100 takes, as the character string representing the i-th hierarchy mother nucleus, the part of the character string representing the m-th substituent that remains after removing its first s characters (step S2808). Next, the compound classification device 100 takes the first s characters of that character string as the character string representing the m-th substituent (step S2809).

そして、化合物分類装置１００は、分割テーブル８００内の化合物名Ｎｒの第ｍ置換基を表す文字列の先頭からｓ文字の文字列の直後に第ｉ階層の区切り記号を挿入して（ステップＳ２８１０）、図２６に示したステップＳ２６０２に移行する。もし、第ｍ置換基と母核の間に文字が残っている場合には、母核に含めるようにしてもよい。 Then, the compound classification device 100 inserts a delimiter in the i-th layer immediately after the character string of s characters from the beginning of the character string representing the m-th substituent of the compound name Nr in the division table 800 (step S2810). The process proceeds to step S2602 shown in FIG. If a character remains between the m-th substituent and the mother nucleus, it may be included in the mother nucleus.

また、ステップＳ２８０７において、置換基Ｃｐの化合物名と不一致の場合（ステップＳ２８０７：Ｎｏ）、化合物分類装置１００は、置換基Ｃｐの「ｐ」をインクリメントして（ステップＳ２８１１）、「ｐ」が「Ｐ」より大きくなったか否かを判断する（ステップＳ２８１２）。 If the character string does not match the compound name of the substituent Cp in step S2807 (step S2807: No), the compound classification device 100 increments “p” of the substituent Cp (step S2811) and determines whether “p” has become larger than “P” (step S2812).

ここで、「ｐ」が「Ｐ」以下の場合（ステップＳ２８１２：Ｎｏ）、ステップＳ２８０５に戻る。一方、「ｐ」が「Ｐ」より大きくなった場合（ステップＳ２８１２：Ｙｅｓ）、化合物分類装置１００は、化合物Ｍｒの第ｉ階層の母核の化合物名が不明であることを示す母核不明フラグを設定して（ステップＳ２８１３）、図２６に示したステップＳ２６０２に移行する。 If “p” is equal to or less than “P” (step S2812: No), the process returns to step S2805. If “p” has become larger than “P” (step S2812: Yes), the compound classification device 100 sets a mother nucleus unknown flag indicating that the compound name of the i-th hierarchy mother nucleus of the compound Mr is unknown (step S2813), and the process proceeds to step S2602 shown in FIG. 26.

これにより、化合物Ｍｒの第ｉ階層の母核の化合物名を特定して、分割テーブル８００内の化合物名Ｎｒの第ｉ階層の母核の化合物名の直前に第ｉ階層の区切り記号を挿入することができる。なお、第（ｉ＋１）階層において、化合物Ｍｒの化合物名Ｎｒの末尾からｔ文字との一致判定を行う対象となる化合物は、例えば、構造式ＤＢ２３０内の置換基Ｃｐの化合物名となる。 Thus, the compound name of the parent nucleus of the i-th hierarchy of the compound Mr is specified, and the delimiter of the i-th hierarchy is inserted immediately before the compound name of the parent nucleus of the i-th hierarchy of the compound name Nr in the partition table 800. be able to. Note that, in the (i + 1) th layer, the compound that is subjected to determination of coincidence with the t character from the end of the compound name Nr of the compound Mr is, for example, the compound name of the substituent Cp in the structural formula DB230.
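The fallback of FIG. 28, used when no registered mother nucleus name matches, can be sketched as follows. This is a minimal sketch under assumptions: English names are used, `SUBSTITUENT_NAMES` is a hypothetical stand-in for the substituent names in the structural formula DB 230, and the returned dictionary layout is introduced here for illustration. The check that some text remains after the substituent name is an added guard that the source does not state.

```python
import re

# Hypothetical stand-in for the substituent names held in structural formula DB 230.
SUBSTITUENT_NAMES = ["phenyl", "hydroxy", "methyl"]

def fallback_parent(name, substituent_names=SUBSTITUENT_NAMES):
    """Infer an unregistered mother nucleus from the text left after the last substituent."""
    # Split the whole name into (position, text) pairs (steps S2801 to S2803).
    pairs = re.findall(r"(\d+(?:,\d+)*)-([A-Za-z]+)", name)
    if not pairs:
        return {"substituents": [], "parent": None, "parent_unknown": True}
    position, last = pairs[-1]
    for sub in substituent_names:
        s = len(sub)
        # Compare the first s characters of the m-th substituent text (step S2807).
        # Requiring leftover text is an assumption added in this sketch.
        if last.startswith(sub) and len(last) > s:
            pairs[-1] = (position, sub)             # step S2809
            return {"substituents": pairs, "parent": last[s:], "parent_unknown": False}
    # No substituent name matched: set the mother nucleus unknown flag (step S2813).
    return {"substituents": pairs, "parent": None, "parent_unknown": True}
```

With `hexane` absent from the mother nucleus list, the made-up name `2-hydroxy-1-phenylhexane` yields `hexane` as the inferred mother nucleus because its last pair begins with the known substituent `phenyl`.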

つぎに、図２６のステップＳ２６０５に示した置換基分割処理の具体的な処理手順について説明する。図２９は、置換基分割処理の具体的処理手順の一例を示すフローチャートである。 Next, a specific processing procedure of the substituent dividing process shown in step S2605 in FIG. 26 will be described. FIG. 29 is a flowchart illustrating an example of a specific processing procedure of substituent group splitting processing.

図２９のフローチャートにおいて、まず、化合物分類装置１００は、第ｊ置換基を表す文字列に倍数接頭辞があるか否かを判断する（ステップＳ２９０１）。ここで、倍数接頭辞がない場合（ステップＳ２９０１：Ｎｏ）、ステップＳ２９０５に移行する。 In the flowchart of FIG. 29, first, the compound classification device 100 determines whether or not there is a multiple prefix in the character string representing the jth substituent (step S2901). If there is no multiple prefix (step S2901: NO), the process proceeds to step S2905.

一方、倍数接頭辞がある場合（ステップＳ２９０１：Ｙｅｓ）、化合物分類装置１００は、分割テーブル８００内の第ｊ置換基の結合位置を表す文字列「数字，数字，…，数字−」の「，」を「−」に変換し（ステップＳ２９０２）、２番目以降の数字の直前に「−」を挿入する（ステップＳ２９０３）。 If there is a multiple prefix (step S2901: Yes), the compound classification device 100 converts each “,” in the character string “number, number, ..., number-”, which represents the bonding positions of the j-th substituent in the division table 800, into “-” (step S2902), and inserts “-” immediately before the second and subsequent numbers (step S2903).

そして、化合物分類装置１００は、分割テーブル８００内の第ｊ置換基を表す文字列から倍数接頭辞を削除して（ステップＳ２９０４）、倍数接頭辞が削除された削除後の文字列を「−−」の間に挿入する（ステップＳ２９０５）。 The compound classification device 100 then deletes the multiple prefix from the character string representing the j-th substituent in the division table 800 (step S2904) and inserts the resulting character string between the two hyphens of “--” (step S2905).

つぎに、化合物分類装置１００は、置換基Ｃｐの「ｐ」を「ｐ＝１」として（ステップＳ２９０６）、構造式ＤＢ２３０の中から置換基Ｃｐの化合物名を選択する（ステップＳ２９０７）。 Next, the compound classification apparatus 100 sets “p” of the substituent Cp to “p = 1” (step S2906), and selects the compound name of the substituent Cp from the structural formula DB 230 (step S2907).

そして、化合物分類装置１００は、置換基を表す文字列と置換基Ｃｐの化合物名とが一致するか否かを判断する（ステップＳ２９０８）。なお、ここでの置換基を表す文字列は、第ｊ置換基を表す文字列、または、ステップＳ２９０４において第ｊ置換基を表す文字列から倍数接頭辞が削除された削除後の文字列である。 Then, the compound classification device 100 determines whether or not the character string representing the substituent matches the compound name of the substituent Cp (step S2908). Here, the character string representing the substituent is a character string representing the j-th substituent or a character string after deletion in which the multiple prefix is deleted from the character string representing the j-th substituent in step S2904. .

ここで、置換基Ｃｐの化合物名と一致する場合（ステップＳ２９０８：Ｙｅｓ）、化合物分類装置１００は、置換基を表す文字列の直後に区切り記号を挿入して（ステップＳ２９０９）、図２６に示したステップＳ２６０６に移行する。 Here, when the compound name matches the compound name of the substituent Cp (step S2908: Yes), the compound classification device 100 inserts a delimiter immediately after the character string representing the substituent (step S2909), and is shown in FIG. The process proceeds to step S2606.

また、ステップＳ２９０８において、置換基Ｃｐの化合物名と不一致の場合（ステップＳ２９０８：Ｎｏ）、化合物分類装置１００は、置換基Ｃｐの「ｐ」をインクリメントして（ステップＳ２９１０）、「ｐ」が「Ｐ」より大きくなったか否かを判断する（ステップＳ２９１１）。 If the character string does not match the compound name of the substituent Cp in step S2908 (step S2908: No), the compound classification device 100 increments “p” of the substituent Cp (step S2910) and determines whether “p” has become larger than “P” (step S2911).

ここで、「ｐ」が「Ｐ」以下の場合（ステップＳ２９１１：Ｎｏ）、ステップＳ２９０７に戻る。一方、「ｐ」が「Ｐ」より大きくなった場合（ステップＳ２９１１：Ｙｅｓ）、化合物分類装置１００は、化合物Ｍｒの第ｉ階層の第ｊ置換基の化合物名が不明であることを示す置換基不明フラグを設定して（ステップＳ２９１２）、ステップＳ２９０９に移行する。 If “p” is equal to or less than “P” (step S2911: NO), the process returns to step S2907. On the other hand, when “p” is larger than “P” (step S2911: Yes), the compound classification device 100 indicates that the compound name of the j-th substituent in the i-th layer of the compound Mr is unknown. An unknown flag is set (step S2912), and the process proceeds to step S2909.

これにより、化合物Ｍｒの第ｉ階層の第ｊ置換基の化合物名を特定して、分割テーブル８００内の化合物名Ｎｒの第ｉ階層の第ｊ置換基の化合物名の直後に区切り記号を挿入することができる。また、第ｊ置換基を表す文字列に倍数接頭辞が含まれる場合、第ｊ置換基を表す文字列および第ｊ置換基の結合位置を表す文字列を展開することができる。 Thus, the compound name of the j-th substituent of the i-th layer of the compound Mr is specified, and a delimiter is inserted immediately after the compound name of the j-th substituent of the i-th layer of the compound name Nr in the partition table 800. be able to. Further, when a multiple prefix is included in the character string representing the jth substituent, the character string representing the jth substituent and the character string representing the bonding position of the jth substituent can be expanded.
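The expansion of a multiple prefix can be sketched as below. This is an illustration under assumptions rather than the procedure of FIG. 29 itself: it works on (positions, text) pairs instead of editing the string in the division table 800, it uses English prefixes, and `MULTIPLIERS`, `SUBSTITUENT_NAMES`, and `expand_substituent` are hypothetical names. Requiring the prefix count to equal the number of positions is an added assumption intended to reduce false matches.

```python
# Hypothetical multiple prefixes and substituent names (English stand-ins).
MULTIPLIERS = {"di": 2, "tri": 3, "tetra": 4}
SUBSTITUENT_NAMES = {"methyl", "ethyl", "phenyl", "hydroxy"}

def expand_substituent(positions, text):
    """Expand e.g. ('2,2', 'dimethyl') into one (position, name) pair per position.

    Returns (pairs, unknown), where `unknown` plays the role of the
    substituent unknown flag when the name is not registered.
    """
    position_list = positions.split(",")
    base = text
    for prefix, count in MULTIPLIERS.items():
        # Assumption: the prefix must agree with the number of listed positions.
        if text.startswith(prefix) and len(position_list) == count:
            base = text[len(prefix):]      # delete the multiple prefix (step S2904)
            break
    unknown = base not in SUBSTITUENT_NAMES
    return [(pos, base) for pos in position_list], unknown
```

Under these assumptions, `expand_substituent("2,2", "dimethyl")` gives two `("2", "methyl")` pairs, which corresponds to rewriting `2,2-dimethyl` as `2-methyl-2-methyl`.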

＜母核比較テーブル作成処理の具体的処理手順＞ <Specific processing procedure of the mother nucleus comparison table creation process>
つぎに、図２５のステップＳ２５０９に示した母核比較テーブル作成処理の具体的な処理手順について説明する。図３０は、母核比較テーブル作成処理の具体的処理手順の一例を示すフローチャートである。 Next, a specific processing procedure of the mother nucleus comparison table creation process shown in step S2509 of FIG. 25 will be described. FIG. 30 is a flowchart illustrating an example of a specific processing procedure of the mother nucleus comparison table creation process.

図３０のフローチャートにおいて、まず、化合物分類装置１００は、第ｉ階層の「ｉ」を「ｉ＝１」として（ステップＳ３００１）、分割テーブル８００を参照して、化合物Ｍ１〜ＭＲの第ｉ階層の母核の化合物名を母核比較テーブル１１００に登録する（ステップＳ３００２）。 In the flowchart of FIG. 30, first, the compound classification apparatus 100 sets “i” in the i-th layer to “i = 1” (step S3001), refers to the division table 800, and in the i-th layer of the compounds M1 to MR. The compound name of the mother nucleus is registered in the mother nucleus comparison table 1100 (step S3002).

つぎに、化合物分類装置１００は、分割テーブル８００を参照して、化合物Ｍ１〜ＭＲの第ｉ階層の母核に結合する第１〜第ｍ置換基の結合位置を母核比較テーブル１１００に登録する（ステップＳ３００３）。そして、化合物分類装置１００は、化合物Ｍ１〜ＭＲの第ｉ階層の母核に結合する置換基数を母核比較テーブル１１００に登録する（ステップＳ３００４）。 Next, the compound classification apparatus 100 refers to the division table 800 and registers the bonding positions of the first to m-th substituents bonded to the i-th layer mother nucleus of the compounds M1 to MR in the mother nucleus comparison table 1100. (Step S3003). Then, the compound classification device 100 registers the number of substituents bound to the i-th layer mother nucleus of the compounds M1 to MR in the mother nucleus comparison table 1100 (step S3004).

Next, the compound classification device 100 identifies the number of carbon atoms contained in the structural formula of the i-th layer mother nucleus of each of compounds M1 to MR, together with the structure type of that mother nucleus, and registers both in the mother nucleus comparison table 1100 (step S3005).

The compound classification device 100 then compares the item values registered in the mother nucleus comparison table 1100 between compound M1, which serves as the basic structure, and each of compounds M2 to MR, and sets the same flag to "1" for every item whose values match (step S3006).

Next, the compound classification device 100 increments "i" of the i-th layer (step S3007) and determines whether "i" has become greater than "n" (step S3008). If "i" is less than or equal to "n" (step S3008: No), the process returns to step S3002. If "i" is greater than "n" (step S3008: Yes), the process proceeds to step S2510 shown in FIG. 25.

In this way, the compound name of the i-th layer mother nucleus, the bonding positions of the first to m-th substituents, the number of substituents, the number of carbon atoms, and the structure type can be registered in the mother nucleus comparison table 1100 for compounds M1 to MR. After step S3008, the compound classification device 100 may, for example, classify the compound group M1 to MR based on the same flag of each item in the mother nucleus comparison table 1100 and thereby rearrange the records of compounds M1 to MR in that table.
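The flag-setting and record-reordering steps for a single layer can be sketched as below. The field names, the sample records, and the sort order are hypothetical choices for illustration; the description does not fix a data layout for the mother nucleus comparison table 1100. For simplicity the sketch also flags M1 against itself, which trivially yields all ones.

```python
ITEMS = ("nucleus", "positions", "n_substituents", "carbons", "structure_type")

# Hypothetical sample records; SAMPLE[0] plays the role of the basic structure M1.
SAMPLE = [
    {"name": "M1", "nucleus": "benzene", "positions": ("1", "2"),
     "n_substituents": 2, "carbons": 6, "structure_type": "ring"},
    {"name": "M2", "nucleus": "hexane", "positions": ("1", "2"),
     "n_substituents": 2, "carbons": 6, "structure_type": "chain"},
    {"name": "M3", "nucleus": "benzene", "positions": ("1", "3"),
     "n_substituents": 2, "carbons": 6, "structure_type": "ring"},
]

def build_nucleus_comparison(records):
    """Attach a same flag (1 = matches M1, 0 = differs) to every item of every record."""
    base = records[0]
    table = []
    for rec in records:
        same = {item: int(rec[item] == base[item]) for item in ITEMS}
        table.append({"values": rec, "same": same})
    return table

def sort_by_same_flags(table):
    """Reorder records so that those matching M1 on earlier items come first."""
    return sorted(table, key=lambda row: [-row["same"][item] for item in ITEMS])

for row in sort_by_same_flags(build_nucleus_comparison(SAMPLE)):
    print(row["values"]["name"], row["same"])
```

Sorting on the flags in item order is only one possible way to rearrange the records; any grouping keyed on the same flags would serve the same purpose.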

<Specific Processing Procedure of the Substituent Comparison Table Creation Process>
Next, a specific processing procedure of the substituent comparison table creation process shown in step S2510 of FIG. 25 will be described. FIG. 31 is a flowchart illustrating an example of this procedure.

In the flowchart of FIG. 31, the compound classification device 100 first sets "i" of the i-th layer to "i = 1" (step S3101) and, referring to the division table 800, registers the compound names of the first to m-th substituents in the i-th layer of compounds M1 to MR in the substituent comparison table 1700 (step S3102).

Next, referring to the division table 800, the compound classification device 100 registers, in the substituent comparison table 1700, the bonding positions of the first to m-th substituents bonded to the i-th layer mother nucleus of each of compounds M1 to MR (step S3103). The device then identifies the number of carbon atoms contained in the structural formula of each of the first to m-th substituents in the i-th layer of compounds M1 to MR and registers those numbers in the substituent comparison table 1700 (step S3104).

Next, the compound classification device 100 compares the item values registered in the substituent comparison table 1700 between compound M1, which serves as the basic structure, and each of compounds M2 to MR, and sets the same flag to "1" for every item whose values match (step S3105).

The compound classification device 100 then increments "i" of the i-th layer (step S3106) and determines whether "i" has become greater than "n" (step S3107). If "i" is less than or equal to "n" (step S3107: No), the process returns to step S3102. If "i" is greater than "n" (step S3107: Yes), the process proceeds to step S2511 shown in FIG. 25.

In this way, the compound names, bonding positions, and numbers of carbon atoms of the first to m-th substituents in the i-th layer of compounds M1 to MR can be registered in the substituent comparison table 1700. After step S3107, the compound classification device 100 may, for example, classify the compound group M1 to MR based on the same flag of each item in the substituent comparison table 1700 and thereby rearrange the records of compounds M1 to MR in that table.
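The loop over layers i = 1 to n in the flowchart of FIG. 31 can be sketched as follows. The nested-list input format and the tuple layout (name, position, carbon count) are assumptions made for this example, and comparing the j-th substituent of each compound with the j-th substituent of M1 is one plausible reading of the per-item comparison.

```python
def build_substituent_comparison(compounds, n_layers):
    """compounds[r][i] lists the (name, position, carbons) substituents of layer i + 1.

    compounds[0] is the basic structure M1. Returns {(r, i): [(substituent, flags), ...]},
    where flags holds one same flag per item (name, position, carbons).
    """
    table = {}
    for i in range(n_layers):                      # mirrors i = 1 .. n in the flowchart
        base = compounds[0][i]
        for r, comp in enumerate(compounds):
            rows = []
            for j, sub in enumerate(comp[i]):
                # A substituent with no counterpart in M1 matches nothing.
                ref = base[j] if j < len(base) else (None, None, None)
                flags = tuple(int(a == b) for a, b in zip(sub, ref))
                rows.append((sub, flags))
            table[(r, i)] = rows
    return table

example = [
    [[("methyl", "1", 1), ("ethyl", "2", 2)]],     # M1, one layer
    [[("methyl", "1", 1), ("propyl", "3", 3)]],    # M2, one layer
]
print(build_substituent_comparison(example, n_layers=1)[(1, 0)])
```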

As described above, the compound classification device 100 according to the embodiment can refer to the structural formula DB 230 and detect, in the compound name Nr of each compound Mr in the compound group M1 to MR, the character string representing the mother nucleus of that compound. The device can then classify the compound group M1 to MR based on the mother nucleus of each compound Mr.

This makes it possible to identify, within the compound group M1 to MR, sets of compounds that share the same mother nucleus, that is, the partial structure on which a compound is based. As a result, for example, similarities and differences among compounds with the same mother nucleus become easier to assess.
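The abstract describes detecting the mother nucleus by testing whether the trailing t characters (t = 1, 2, 3, ...) of a compound name match a stored name. A minimal sketch of that idea, followed by grouping on the detected string, is given below. The set `KNOWN_NUCLEI` is a hypothetical stand-in for the names held in the structural formula DB 230, and keeping the longest match is an assumption of this sketch rather than something the text states.

```python
from collections import defaultdict

# Hypothetical stand-in for the mother nucleus names stored in the structural formula DB.
KNOWN_NUCLEI = {"benzene", "pyridine", "hexane"}

def detect_nucleus(name, known=KNOWN_NUCLEI):
    """Return the longest trailing substring of `name` that is a known nucleus, or None."""
    found = None
    for t in range(1, len(name) + 1):
        suffix = name[-t:]
        if suffix in known:
            found = suffix   # keep scanning so that a longer match wins
    return found

def group_by_nucleus(names):
    """Group compound names by the mother nucleus string detected in each."""
    groups = defaultdict(list)
    for name in names:
        groups[detect_nucleus(name)].append(name)
    return dict(groups)

print(group_by_nucleus(["1,2-dimethylbenzene", "chlorobenzene", "2-methylhexane"]))
```

Names in which no stored nucleus is found are collected under the key `None` here, so they remain visible instead of being silently dropped.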

In addition, the compound classification device 100 can extract the character string representing a substituent of each compound Mr from the remainder of the compound name Nr, that is, the part left after removing the character string representing the mother nucleus. The device can then further classify the compound group M1 to MR based on the substituents of each compound Mr.

This makes it possible to identify, within the compound group M1 to MR, sets of compounds that share the same substituent, that is, a partial structure used in the lineage and naming of a compound. As a result, for example, among the compounds that share a mother nucleus, similarities and differences among those that also share a substituent become easier to assess.

In addition, the compound classification device 100 can further classify the compound group M1 to MR based on the number of substituents of each compound Mr. This identifies how many substituents are bonded to the mother nucleus of each compound Mr and makes it possible to identify, within the compound group M1 to MR, sets of compounds whose overall structural makeup is similar. As a result, for example, among the compounds that share a mother nucleus, similarities and differences among those with a similar overall structural makeup become easier to assess.

In addition, the compound classification device 100 can extract, from the remainder of the compound name Nr of each compound Mr after removing the character string representing the mother nucleus, the character string representing the bonding position of each substituent bonded to the mother nucleus. The device can then further classify the compound group M1 to MR based on the bonding positions of the substituents bonded to the mother nucleus of each compound Mr.

This identifies which carbon atom in the structural formula of the mother nucleus of each compound Mr carries a substituent and makes it possible to identify, within the compound group M1 to MR, sets of compounds whose overall structural makeup is similar. As a result, for example, among the compounds that share a mother nucleus, similarities and differences among those with a similar overall structural makeup become easier to assess.
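Extracting substituent and bonding-position strings from the remainder of a name can be illustrated with a small regular-expression sketch. The pattern assumes a simplified name format in which every substituent is written as digits (optionally comma-separated), a hyphen, and lowercase letters; real nomenclature is considerably richer, so this is an illustration under that assumption only.

```python
import re

# Assumed simplified format: "<locants>-<substituent>", e.g. "2-chloro" or "1,2-dimethyl".
PAIR = re.compile(r"(\d+(?:,\d+)*)-([a-z]+)")

def split_remainder(name, nucleus):
    """Return [(position, substituent), ...] found in `name` once `nucleus` is removed."""
    if not name.endswith(nucleus):
        raise ValueError("nucleus must be the trailing part of the name")
    remainder = name[: len(name) - len(nucleus)]
    return PAIR.findall(remainder)

pairs = split_remainder("2-chloro-3-methylpyridine", "pyridine")
print(pairs, "substituent count:", len(pairs))
```

The length of the returned list gives the substituent count for names without multiplying prefixes; a name such as "1,2-dimethylbenzene" yields a single pair that still needs prefix expansion.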

In addition, the compound classification device 100 can refer to the structural formula DB 230, specify the structure type of the mother nucleus corresponding to the character string representing the mother nucleus of each compound Mr, and further classify the compound group M1 to MR based on that structure type. This identifies the structure type of the mother nucleus of each compound Mr and makes it possible to identify, within the compound group M1 to MR, sets of compounds whose mother nuclei are structurally similar.

In addition, the compound classification device 100 can refer to the structural formula DB 230, specify the number of carbon atoms contained in the structural formula corresponding to the character string representing the mother nucleus of each compound Mr, and further classify the compound group M1 to MR based on the number of carbon atoms in the mother nucleus. This allows chemical properties such as hydrophilicity and hydrophobicity to be judged from the number of carbon atoms in the mother nucleus of each compound Mr.

In addition, the compound classification device 100 can refer to the structural formula DB 230, specify the number of carbon atoms contained in the structural formula corresponding to the character string representing each substituent of each compound Mr, and further classify the compound group M1 to MR based on the number of carbon atoms in the substituents.

This allows chemical properties such as hydrophilicity and hydrophobicity to be judged from the number of carbon atoms in each substituent of each compound Mr and makes it possible to identify, within the compound group M1 to MR, sets of compounds with similar chemical properties. As a result, for example, among the compounds that share a mother nucleus, similarities and differences among those with similar chemical properties become easier to assess.
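Looking up a structural formula by name and counting its carbon atoms can be sketched as follows. The dictionary `FORMULAS` is a hypothetical stand-in for the structural formula DB 230, and representing each structural formula as a condensed formula string such as "C6H6" is an assumption of this example; the description does not specify the stored format.

```python
import re

# Hypothetical stand-in for the structural formula DB: name -> condensed formula.
FORMULAS = {"benzene": "C6H6", "methyl": "CH3", "ethyl": "C2H5", "chloro": "Cl"}

def carbon_count(formula):
    """Count carbon atoms in a condensed formula, ignoring two-letter symbols such as Cl."""
    # "C" not followed by a lowercase letter is carbon; a missing digit means one atom.
    return sum(int(n or 1) for n in re.findall(r"C(?![a-z])(\d*)", formula))

def carbon_profile(names, formulas=FORMULAS):
    """Return the carbon counts of the named partial structures, in order."""
    return tuple(carbon_count(formulas[name]) for name in names)

print(carbon_count(FORMULAS["benzene"]), carbon_profile(["methyl", "chloro"]))
```

Compounds could then be grouped on the resulting counts in the same way as on any other comparison item.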

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the character string representing the mother nucleus of the compound serving as the basic structure with the character strings representing the mother nuclei of the other compounds in the group.

This makes it possible to identify, within the compound group M1 to MR, the set of compounds whose mother nucleus, the partial structure on which a compound is based, is the same as that of the basic-structure compound. As a result, for example, similarities and differences among the compounds that share a mother nucleus with the basic-structure compound become easier to assess.

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the character strings representing the substituents of the compound serving as the basic structure with the character strings representing the substituents of the other compounds in the group.

This makes it possible to identify, within the compound group M1 to MR, the set of compounds whose substituents, the partial structures used in the lineage and naming of a compound, are the same as those of the basic-structure compound. As a result, for example, among the compounds that share a mother nucleus with the basic-structure compound, similarities and differences among those that also share its substituents become easier to assess.

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the number of substituents of the compound serving as the basic structure with the number of substituents of each of the other compounds in the group.

This makes it possible to identify, within the compound group M1 to MR, the set of compounds that have the same number of substituents as the basic-structure compound. As a result, for example, among the compounds that share a mother nucleus with the basic-structure compound, similarities and differences among those that also have the same number of substituents become easier to assess.

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the bonding positions of the substituents bonded to the mother nucleus of the compound serving as the basic structure with the bonding positions of the substituents bonded to the mother nuclei of the other compounds in the group.

This makes it possible to identify, within the compound group M1 to MR, the set of compounds whose substituents are bonded to the mother nucleus at the same positions as in the basic-structure compound. As a result, for example, among the compounds that share a mother nucleus with the basic-structure compound, similarities and differences among those that also share its substituent bonding positions become easier to assess.

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the structure type of the mother nucleus of the compound serving as the basic structure with the structure types of the mother nuclei of the other compounds in the group. This makes it possible to identify, within the compound group M1 to MR, the set of compounds whose mother nucleus has the same structure type as that of the basic-structure compound.

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the number of carbon atoms in the mother nucleus of the compound serving as the basic structure with the number of carbon atoms in the mother nuclei of the other compounds in the group. This makes it possible to identify, within the compound group M1 to MR, the set of compounds whose mother nucleus has the same number of carbon atoms as that of the basic-structure compound.

In addition, the compound classification device 100 can classify the compound group M1 to MR by comparing the number of carbon atoms in the j-th substituent of the compound serving as the basic structure with the number of carbon atoms in the j-th substituent of each of the other compounds in the group.

This makes it possible to identify, within the compound group M1 to MR, the set of compounds whose j-th substituent has the same number of carbon atoms as that of the basic-structure compound. As a result, for example, among the compounds that share a mother nucleus with the basic-structure compound, similarities and differences among those whose j-th substituent has chemical properties similar to that of the basic-structure compound become easier to assess.

In addition, the compound classification device 100 can determine whether the j-th substituent is a composite substituent, that is, a substituent that itself contains another substituent, and, if so, set the character string representing the j-th substituent as the compound name Nr of compound Mr. The device can then refer to the structural formula DB 230 and detect, in the newly set compound name Nr, the character string representing the mother nucleus of compound Mr.

In this way, the processing of the detection unit 702, the extraction unit 703, the specifying unit 704, and so on is executed with the composite substituent of compound Mr as a new classification target, so the character string representing the j-th substituent can be analyzed. As a result, for example, the group of composite substituents set as new classification targets can be classified based on the second-layer mother nucleus of each compound Mr.
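Treating a composite substituent as a new name to analyze amounts to a recursive decomposition into layers, which can be sketched as below. Several points are assumptions of this example and not taken from the description: composite substituents are assumed to be written in parentheses, the set `KNOWN` is a hypothetical list of names usable as a mother nucleus at some layer, and the regular expression handles only one level of parentheses at a time.

```python
import re

# Hypothetical names that may act as a mother nucleus at some layer.
KNOWN = {"pyridine", "benzene", "ethyl", "methyl"}

def longest_known_suffix(name):
    """Return the longest trailing substring of `name` found in KNOWN, or None."""
    for start in range(len(name)):       # earliest start = longest suffix first
        if name[start:] in KNOWN:
            return name[start:]
    return None

def decompose(name, layer=1):
    """Return [(layer, nucleus, remainder), ...], descending into parenthesised parts."""
    nucleus = longest_known_suffix(name)
    if nucleus is None:
        return []
    remainder = name[: len(name) - len(nucleus)]
    layers = [(layer, nucleus, remainder)]
    # Each parenthesised substituent becomes the name analysed at the next layer.
    for inner in re.findall(r"\(([^()]*)\)", remainder):
        layers.extend(decompose(inner, layer + 1))
    return layers

print(decompose("4-(2-chloroethyl)pyridine"))
```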

For these reasons, the compound classification device, compound classification program, and compound classification method according to the embodiment can classify the compound group M1 to MR based on the features of the mother nucleus and substituents in each layer of compound Mr. This lets the user compare compounds in the group M1 to MR that share common features, which makes similarities and differences among compounds easier to assess. The features of the mother nucleus and substituents in each layer also make it easier for the user to grasp the functional groups and the overall structure of compound Mr. Even when compound Mr has a composite substituent, the user can identify the features of the mother nucleus and substituents layer by layer, which makes the overall structure of the hierarchically organized compound Mr easier to grasp.

As a result, for example, the user can judge the properties of each compound as a whole from the features of the mother nucleus and substituents in each layer and, by comparing those properties across compounds, determine what the compound group is intended to represent. Even when some features of the mother nucleus or substituents in a layer of compound Mr are unknown, the user can still assess similarities and differences among compounds from the remaining features.

The compound classification method described in this embodiment can be implemented by executing a program prepared in advance on a computer such as a personal computer or a workstation. The compound classification program is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The compound classification program may also be distributed via a network such as the Internet.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary Note 1) A compound classification device comprising:
a detection unit that refers to a storage unit storing names of partial structures serving as mother nuclei of compounds and detects, in the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound;
a classification unit that classifies the compound group based on the character strings representing the mother nuclei of the respective compounds detected by the detection unit; and
an output unit that outputs a classification result obtained by the classification unit.

(Supplementary Note 2) The compound classification device according to Supplementary Note 1, further comprising an extraction unit that extracts, from the remaining character string of the compound name of each compound excluding the character string representing the mother nucleus of that compound, a character string representing the name of a partial structure serving as a substituent of that compound,
wherein the classification unit further classifies the compound group based on the character strings representing the substituents of the respective compounds extracted by the extraction unit.

(Supplementary Note 3) The compound classification device according to Supplementary Note 2, wherein the classification unit further classifies the compound group based on the number of substituents of each compound extracted by the extraction unit.

(Supplementary Note 4) The compound classification device according to Supplementary Note 1, further comprising an extraction unit that extracts, from the remaining character string of the compound name of each compound excluding the character string representing the mother nucleus of that compound, a character string representing the name of a partial structure serving as a substituent of that compound,
wherein the classification unit further classifies the compound group based on the number of substituents of each compound extracted by the extraction unit.

(Supplementary Note 5) The compound classification device according to any one of Supplementary Notes 2 to 4,
wherein the extraction unit extracts, from the remaining character string, a character string representing the bonding position of each substituent bonded to the mother nucleus of each compound, and
wherein the classification unit further classifies the compound group based on the character strings representing the bonding positions of the substituents of the respective compounds.

(Supplementary Note 6) The compound classification device according to any one of Supplementary Notes 1 to 5, further comprising a specifying unit that, when the names of partial structures serving as mother nuclei of compounds and the structure types of the mother nuclei are stored in the storage unit in association with each other, refers to the storage unit and specifies the structure type of the mother nucleus corresponding to the character string representing the mother nucleus of each compound,
wherein the classification unit further classifies the compound group based on the structure types of the mother nuclei of the respective compounds specified by the specifying unit.

(Supplementary Note 7) The compound classification device according to Supplementary Note 6,
wherein, when the names of partial structures serving as mother nuclei of compounds and the structural formulas of the mother nuclei are stored in the storage unit in association with each other, the specifying unit refers to the storage unit and specifies the number of atoms of a specific element contained in the structural formula corresponding to the character string representing the mother nucleus of each compound, and
wherein the classification unit further classifies the compound group based on the number of atoms of the specific element contained in the structural formula of the mother nucleus of each compound specified by the specifying unit.

(Supplementary Note 8) The compound classification device according to any one of Supplementary Notes 2 to 5, further comprising a specifying unit that, when the names of partial structures serving as substituents of compounds and the structural formulas of the substituents are stored in the storage unit in association with each other, refers to the storage unit and specifies the number of atoms of a specific element contained in the structural formula corresponding to the character string representing each substituent of each compound,
wherein the classification unit further classifies the compound group based on the number of atoms of the specific element contained in the structural formula of the substituent of each compound specified by the specifying unit.

(Supplementary Note 9) The compound classification device according to Supplementary Note 1, further comprising a comparison unit that compares the character string representing the mother nucleus of a specific compound in the compound group with the character string representing the mother nucleus of another compound in the compound group that differs from the specific compound,
wherein the classification unit classifies the compound group based on a comparison result obtained by the comparison unit.

(Supplementary Note 10) The compound classification device according to Supplementary Note 9, further comprising an extraction unit that extracts, from the remaining character string of the compound name of each compound excluding the character string representing the mother nucleus of that compound, a character string representing the name of a partial structure serving as a substituent of that compound,
wherein the comparison unit further compares the character string representing a substituent of the specific compound with the character string representing a substituent of the other compound.

(Supplementary Note 11) The compound classification device according to Supplementary Note 10, wherein the comparison unit further compares the number of substituents of the specific compound with the number of substituents of the other compound.

(Supplementary Note 12) The compound classification device according to Supplementary Note 9, further comprising an extraction unit that extracts, from the remaining character string of the compound name of each compound excluding the character string representing the mother nucleus of that compound, a character string representing the name of a partial structure serving as a substituent of that compound,
wherein the comparison unit further compares the number of substituents of the specific compound with the number of substituents of the other compound.

(Supplementary Note 13) The compound classification device according to any one of Supplementary Notes 10 to 12, wherein the extraction unit extracts, from the remaining character string, a character string representing the bonding position at which a substituent of each compound is bonded to the mother nucleus of that compound, and
the comparison unit further compares a character string representing the bonding position of a substituent of the specific compound with a character string representing the bonding position of a substituent of the other compound.
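The extraction and comparison described in Supplementary Notes 10 to 13 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment. The substituent table, the multiplying prefixes, the regular expression, and the function names `extract_substituents` and `compare_substituents` are all assumptions made for illustration, and the sketch only handles simple remainders such as `1,2-dichloro`.

```python
import re

# Hypothetical storage-unit contents: substituent names (illustrative only).
SUBSTITUENT_NAMES = ("chloro", "methyl", "ethyl", "hydroxy")
# Assumed multiplying prefixes and the number of substituents they denote.
MULTIPLIERS = {"di": 2, "tri": 3, "tetra": 4}

# One token = optional locants ("1,2-"), optional multiplier, substituent name.
TOKEN = re.compile(
    r"(?:(?P<locants>\d+(?:,\d+)*)-)?"
    r"(?P<mult>di|tri|tetra)?"
    r"(?P<name>" + "|".join(SUBSTITUENT_NAMES) + r")"
)


def extract_substituents(remainder):
    """Return (bonding position, substituent name) pairs found in the
    part of a compound name that is left after removing the mother nucleus."""
    found = []
    for m in TOKEN.finditer(remainder.lower()):
        locants = m.group("locants").split(",") if m.group("locants") else [None]
        count = MULTIPLIERS.get(m.group("mult"), 1)
        # Pad with None when fewer locants are written than the prefix implies.
        locants = locants + [None] * (count - len(locants))
        found.extend((loc, m.group("name")) for loc in locants)
    return found


def compare_substituents(remainder_a, remainder_b):
    """Compare two compounds by substituent names, count, and bonding positions."""
    a = extract_substituents(remainder_a)
    b = extract_substituents(remainder_b)
    return {
        "same_names": sorted(n for _, n in a) == sorted(n for _, n in b),
        "same_count": len(a) == len(b),
        "same_positions": sorted(p or "" for p, _ in a) == sorted(p or "" for p, _ in b),
    }
```

For example, `compare_substituents("1,2-dichloro", "1,3-dichloro")` reports the same substituent names and the same number of substituents but different bonding positions, which is the kind of distinction the comparison unit is described as drawing.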

(Supplementary Note 14) The compound classification device according to any one of Supplementary Notes 9 to 13, further comprising an identification unit that, when names of partial structures serving as mother nuclei of compounds and structure types of the mother nuclei are stored in association with each other in the storage unit, refers to the storage unit and identifies the structure type of the mother nucleus corresponding to the character string representing the mother nucleus of each compound,
wherein the comparison unit further compares the structure type of the mother nucleus of the specific compound identified by the identification unit with the structure type of the mother nucleus of the other compound.

(Supplementary Note 15) The compound classification device according to Supplementary Note 14, wherein the identification unit, when names of partial structures serving as mother nuclei of compounds and structural formulas of the mother nuclei are stored in association with each other in the storage unit, refers to the storage unit and identifies the number of atoms of a specific element contained in the structural formula corresponding to the character string representing the mother nucleus of each compound, and
the comparison unit further compares the number of atoms of the specific element contained in the structural formula of the mother nucleus of the specific compound with the number of atoms of the specific element contained in the structural formula of the mother nucleus of the other compound.

(Supplementary Note 16) The compound classification device according to any one of Supplementary Notes 10 to 13, further comprising an identification unit that, when names of partial structures serving as substituents of compounds and structural formulas of the substituents are stored in association with each other in the storage unit, refers to the storage unit and identifies the number of atoms of a specific element contained in the structural formula corresponding to the character string representing the substituent of each compound,
wherein the comparison unit further compares the number of atoms of the specific element contained in the structural formula of the substituent of the specific compound with the number of atoms of the specific element contained in the structural formula of the substituent of the other compound.

(Supplementary Note 17) The compound classification device according to any one of Supplementary Notes 1 to 16, further comprising:
a determination unit that determines, based on the character string representing a substituent of each compound, whether the substituent is a composite substituent that contains another substituent; and
a setting unit that, when the determination unit determines that the substituent is a composite substituent, sets the character string representing the substituent as the compound name of a compound to be classified,
wherein the detection unit refers to the storage unit and detects, from the compound name set by the setting unit, a character string representing the name of the partial structure serving as the mother nucleus of that compound.
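The re-processing of a composite substituent in Supplementary Note 17 can be sketched as follows. This is only an illustration under stated assumptions: the note does not say how a composite substituent is recognized, so the test used here (a stored name ends the string and some text remains in front of it) is an assumption, as are the table contents and the function names.

```python
# Hypothetical storage-unit contents: names of partial structures (illustrative only).
SUBSTITUENT_NAMES = {"methyl", "ethyl", "chloro", "phenyl", "hydroxy"}


def detect_trailing_name(text, stored_names=SUBSTITUENT_NAMES):
    """Return the longest stored name that ends `text`, or None.

    The trailing t characters are tried for t = 1, 2, 3, ...;
    preferring the longest match is an assumption of this sketch."""
    match = None
    for t in range(1, len(text) + 1):
        if text[-t:] in stored_names:
            match = text[-t:]
    return match


def is_composite(substituent, stored_names=SUBSTITUENT_NAMES):
    """Assumed test: composite when a stored name ends the string
    and further text remains in front of it."""
    base = detect_trailing_name(substituent, stored_names)
    return base is not None and len(base) < len(substituent)


def reprocess_substituent(substituent, stored_names=SUBSTITUENT_NAMES):
    """Treat a composite substituent as a compound name in its own right.

    Returns the detected base structure and the remaining string,
    or None when the substituent is not composite."""
    if not is_composite(substituent, stored_names):
        return None
    base = detect_trailing_name(substituent, stored_names)
    return {"mother_nucleus": base, "remainder": substituent[: -len(base)]}
```

With these assumed tables, `reprocess_substituent("chloromethyl")` yields `methyl` as the base structure and `chloro` as the remainder, while a plain `methyl` is left as it is.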

(Supplementary Note 18) A compound classification program that causes a computer to execute a process comprising:
referring to a storage unit that stores names of partial structures serving as mother nuclei of compounds, and detecting, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound;
classifying the compound group based on the detected character strings representing the mother nuclei of the respective compounds; and
outputting the classification result.

(Supplementary Note 19) A compound classification method in which a computer executes a process comprising:
referring to a storage unit that stores names of partial structures serving as mother nuclei of compounds, and detecting, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound;
classifying the compound group based on the detected character strings representing the mother nuclei of the respective compounds; and
outputting the classification result.
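The three steps of the method (detect, classify, output) can be sketched in Python as follows. The sketch follows the approach stated in the abstract, which checks whether the trailing t characters (t = 1, 2, 3, ...) of a compound name match a stored name. The table contents, the function names, and the choice of the longest match when several stored names fit are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the storage unit: names of partial structures
# that can serve as a mother nucleus. The entries are illustrative only.
MOTHER_NUCLEUS_NAMES = {"benzene", "pyridine", "naphthalene"}


def detect_mother_nucleus(compound_name, stored_names=MOTHER_NUCLEUS_NAMES):
    """Return the stored name that ends the compound name, or None."""
    name = compound_name.lower()
    match = None
    # Try the trailing t characters for t = 1, 2, 3, ...
    for t in range(1, len(name) + 1):
        suffix = name[-t:]
        if suffix in stored_names:
            match = suffix  # keep scanning so that a longer match wins (assumption)
    return match


def classify_by_mother_nucleus(compound_names):
    """Group compounds whose detected mother nucleus strings are identical.

    Compounds with no detected mother nucleus are grouped under the key None."""
    groups = defaultdict(list)
    for compound_name in compound_names:
        groups[detect_mother_nucleus(compound_name)].append(compound_name)
    return dict(groups)


def output_classification(groups):
    """Print one line per group: mother nucleus, then its member compounds."""
    for nucleus, members in groups.items():
        print(f"{nucleus}: {', '.join(members)}")


if __name__ == "__main__":
    names = ["1,2-dichlorobenzene", "methylbenzene", "2-chloropyridine"]
    output_classification(classify_by_mother_nucleus(names))
```

Matching from the end of the name relies on the observation in the abstract that, in substitutive nomenclature, the character string for the mother nucleus sits at the rear of the compound name.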

Description of Symbols
100 Compound classification device
701 Reception unit
702 Detection unit
703 Extraction unit
704 Identification unit
705 Classification unit
706 Comparison unit
707 Calculation unit
708 Determination unit
709 Setting unit
710 Creation unit
711 Output unit

Claims

A compound classification device comprising:
a detection unit that refers to a storage unit storing names of partial structures serving as mother nuclei of compounds and detects, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound;
a classification unit that classifies the compound group based on the character strings representing the mother nuclei of the respective compounds detected by the detection unit; and
an output unit that outputs the classification result obtained by the classification unit.

The compound classification device according to claim 1, further comprising an extraction unit that extracts, from the remaining character string of the compound name of each compound excluding the character string representing the mother nucleus of that compound, a character string representing the name of a partial structure serving as a substituent of that compound,
wherein the classification unit further classifies the compound group based on the character strings representing the substituents of the respective compounds extracted by the extraction unit.

The compound classification device according to claim 2, wherein the classification unit further classifies the compound group based on the number of substituents of each compound extracted by the extraction unit.

The compound classification device according to claim 1, further comprising an extraction unit that extracts, from the remaining character string of the compound name of each compound excluding the character string representing the mother nucleus of that compound, a character string representing the name of a partial structure serving as a substituent of that compound,
wherein the classification unit further classifies the compound group based on the number of substituents of each compound extracted by the extraction unit.

The compound classification device according to any one of claims 2 to 4, wherein the extraction unit extracts, from the remaining character string, a character string representing the bonding position at which a substituent of each compound is bonded to the mother nucleus of that compound, and
the classification unit further classifies the compound group based on the character strings representing the bonding positions of the substituents of the respective compounds.

The compound classification device according to claim 1, further comprising an identification unit that, when names of partial structures serving as mother nuclei of compounds and structure types of the mother nuclei are stored in association with each other in the storage unit, refers to the storage unit and identifies the structure type of the mother nucleus corresponding to the character string representing the mother nucleus of each compound,
wherein the classification unit further classifies the compound group based on the structure type of the mother nucleus of each compound identified by the identification unit.

The compound classification device, wherein the identification unit, when names of partial structures serving as mother nuclei of compounds and structural formulas of the mother nuclei are stored in association with each other in the storage unit, refers to the storage unit and identifies the number of atoms of a specific element contained in the structural formula corresponding to the character string representing the mother nucleus of each compound, and
the classification unit further classifies the compound group based on the number of atoms of the specific element contained in the structural formula of the mother nucleus of each compound identified by the identification unit.

The compound classification device according to any one of claims 1 to 7, further comprising:
a determination unit that determines, based on the character string representing a substituent of each compound, whether the substituent is a composite substituent that contains another substituent; and
a setting unit that, when the determination unit determines that the substituent is a composite substituent, sets the character string representing the substituent as the compound name of a compound to be classified,
wherein the detection unit refers to the storage unit and detects, from the compound name set by the setting unit, a character string representing the name of the partial structure serving as the mother nucleus of that compound.

A compound classification program that causes a computer to execute a process comprising:
referring to a storage unit that stores names of partial structures serving as mother nuclei of compounds, and detecting, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound;
classifying the compound group based on the detected character strings representing the mother nuclei of the respective compounds; and
outputting the classification result.

A compound classification method in which a computer executes a process comprising:
referring to a storage unit that stores names of partial structures serving as mother nuclei of compounds, and detecting, from the compound name of each compound in a compound group to be classified, a character string representing the name of the partial structure serving as the mother nucleus of that compound;
classifying the compound group based on the detected character strings representing the mother nuclei of the respective compounds; and
outputting the classification result.