JPH09305628A

JPH09305628A - Device for generating canonized data and its method

Info

Publication number: JPH09305628A
Application number: JP8125123A
Authority: JP
Inventors: Atsushi Tomonaga; 惇朝永
Original assignee: Kureha Corp
Current assignee: Kureha Corp
Priority date: 1996-05-20
Filing date: 1996-05-20
Publication date: 1997-11-28

Abstract

PROBLEM TO BE SOLVED: To provide a canonized data generating device that greatly reduces the amount of storage used in a compound/reaction database system. SOLUTION: Unique data on each atom and inter-atom bond pair data, received by an input means A, are passed to a canonized data generating means B, which generates canonized data from them. The canonized data generated in this way is an extremely short string of characters, numerals and symbols, so canonized data for many compounds can be kept in a small storage area.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a processing apparatus and a processing method for information processing in the fields of chemistry and biochemistry. In particular, it relates to a canonicalized data creation apparatus and a canonicalized data creation method that create, from various data on each atom constituting a compound, canonicalized data that uniquely specifies the chemical structure of the compound.

[0002]

[Prior Art] Compound database systems, which hold compound information, and reaction database systems, which hold information on the reactions of compounds, have been developed. A compound database system holds compound information such as the physical properties and actions of existing compounds, and this information is accessed using the structure of a compound as a key. Using such a compound database, the properties and actions of compounds can be looked up efficiently. A reaction database system likewise holds reaction information for existing compounds, accessed using the structure of a compound as a key. With such a system, a synthetic chemist planning a new synthesis can efficiently look up similar reactions among the reaction information for existing compounds, which makes these systems indispensable in synthetic chemistry research.

[0003] Examples of compound databases include the compound management system "MACCS" from MDL of the United States. Examples of reaction database systems include MDL's integrated chemical information management system "ISIS" and its reaction information management system "REACCS".

[0004]

[Problems to Be Solved by the Invention] Conventional compound/reaction database systems can show a structural diagram of a compound on a display together with information on the compound's properties and actions. Displaying the structural diagram makes for a visually effective system. However, the structural diagram is stored in the system as image data (bitmap data), which occupies a large amount of storage, so a large-capacity storage device is required. This has been a problem.

[0005] An object of the present invention is to provide a canonicalized data creation apparatus and a canonicalized data creation method that, when used in a compound/reaction database system, can greatly reduce the amount of storage the system uses.

[0006]

[Means for Solving the Problems] To solve the above problem, the canonicalized data creation apparatus of the present invention comprises input means for accepting input of unique data on each atom constituting a compound and bond pair data between atoms, and canonicalized data creation means for creating, based on the data accepted by the input means, canonicalized data that can uniquely specify the chemical structure of the compound. The canonicalized data creation means comprises: a first processing unit that classifies the atoms into separate classes, one per set of equivalent atoms, and gives each atom a class number that differs from class to class; a second processing unit that, based on the class numbers given by the first processing unit, gives each atom a canonicalization number that corresponds uniquely to the structure of the compound; and a third processing unit that creates the canonicalized data based on the canonicalization numbers given by the second processing unit.

[0007] In the canonicalized data creation apparatus configured in this way, the unique data on each atom and the bond pair data between atoms accepted by the input means are passed to the canonicalized data creation means, which creates canonicalized data based on these data.

[0008] Specifically, the canonicalized data creation means first executes the processing of the first processing unit, which classifies the atoms into separate classes of equivalent atoms based on the unique data on each atom and the bond pair data between atoms, and gives each atom a class number that differs from class to class. It then executes the processing of the second processing unit, which gives each atom a canonicalization number corresponding uniquely to the structure of the compound, based on the class numbers and the bond pair data. Finally, it executes the processing of the third processing unit, which creates the canonicalized data based on the canonicalization number given to each atom and the unique data on each atom.

[0009] Preferably, the first processing unit gives each atom three kinds of attributes (a_i, b_ij, d_ij) and, using the fact that atoms differing in even one of these attributes can be judged non-equivalent, assigns a different class number to each set of equivalent atoms. Here a_i is the type symbol of the atom with input number i; b_ij is the number of bonds adjacent to the atom with input number i whose type symbol is j; and d_ij is the number of routes that can be followed from the atom with input number i through j bonds along a shortest path. The second processing unit preferably gives canonicalization numbers to the atoms in ascending order from 1: it gives canonicalization number 1 to the atom whose class number has the highest priority; thereafter, when canonicalization numbers up to n have been assigned, it selects, among the atoms that already have a canonicalization number and are bonded to an atom that does not yet have one, the atom with the smallest canonicalization number, and gives canonicalization number n+1 to the atom that is bonded to the selected atom, has no canonicalization number yet, and has the highest-priority class number. The third processing unit preferably gives each atom three kinds of attributes (P_i, T_i, S_i) and creates the canonicalized data by arranging these attributes in a row. Here P_i is the canonicalization number of the atom that is bonded to the atom with canonicalization number i and has the smallest canonicalization number; T_i is the type symbol of the bond between the atom with canonicalization number i and the atom with canonicalization number P_i; and S_i is the type symbol of the atom with canonicalization number i.
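The numbering rule of the second processing unit and the (P_i, T_i, S_i) attributes of the third can be illustrated with a minimal Python sketch. It is not the patented implementation. Several points are assumptions made only for illustration: a smaller class number is treated as higher priority, ties between equal class numbers fall back to the input number, the molecule is connected and has at least two atoms, and the function names, the class numbers, and the acetic acid example (hydrogens omitted) are hypothetical. How the triples are serialized into the final string is not covered.

```python
def assign_canonical_numbers(class_numbers, neighbors):
    """Give each atom a canonicalization number 1..N.

    class_numbers: {atom: class number}; ASSUMPTION: smaller = higher priority.
    neighbors:     {atom: list of bonded atoms}; ASSUMPTION: connected molecule.
    """
    def priority(atom):
        # ASSUMPTION: equal class numbers are ordered by input number.
        return (class_numbers[atom], atom)

    canonical = {min(class_numbers, key=priority): 1}
    while len(canonical) < len(class_numbers):
        # Numbered atoms that are still bonded to an unnumbered atom.
        frontier = [a for a in canonical
                    if any(nb not in canonical for nb in neighbors[a])]
        parent = min(frontier, key=canonical.get)
        candidates = [nb for nb in neighbors[parent] if nb not in canonical]
        canonical[min(candidates, key=priority)] = len(canonical) + 1
    return canonical


def build_triples(canonical, neighbors, bond_type, element):
    """Return [(P_i, T_i, S_i)] for canonicalization numbers i = 1..N."""
    by_number = {number: atom for atom, number in canonical.items()}
    triples = []
    for i in range(1, len(canonical) + 1):
        atom = by_number[i]
        # Bonded atom with the smallest canonicalization number.
        partner = min(neighbors[atom], key=canonical.get)
        triples.append((canonical[partner],
                        bond_type[frozenset((atom, partner))],
                        element[atom]))
    return triples


# Hypothetical example: acetic acid heavy atoms, C1-C2(=O3)-O4.
neighbors = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
bond_type = {frozenset((1, 2)): 1, frozenset((2, 3)): 2, frozenset((2, 4)): 1}
element = {1: "C", 2: "C", 3: "O", 4: "O"}
class_numbers = {1: 1, 2: 2, 3: 4, 4: 3}   # made-up class numbers

canonical = assign_canonical_numbers(class_numbers, neighbors)
triples = build_triples(canonical, neighbors, bond_type, element)
```

With these inputs the atoms are numbered in the order 1, 2, 4, 3, because atom 4 has the higher-priority class among the two unnumbered neighbors of atom 2.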

[0010] The canonicalized data creation method of the present invention creates canonicalized data that can uniquely specify the chemical structure of a compound, based on unique data on each atom constituting the compound and bond pair data between atoms. The method comprises: a first step of classifying the atoms into separate classes, one per set of equivalent atoms, and giving each atom a class number that differs from class to class; a second step of giving each atom, based on the class numbers given in the first step, a canonicalization number that corresponds uniquely to the structure of the compound; and a third step of creating the canonicalized data based on the canonicalization numbers given in the second step.

[0011] Preferably, the first step gives each atom three kinds of attributes (a_i, b_ij, d_ij) and, using the fact that atoms differing in even one of these attributes can be judged non-equivalent, assigns a different class number to each set of equivalent atoms. Here a_i is the type symbol of the atom with input number i; b_ij is the number of bonds adjacent to the atom with input number i whose type symbol is j; and d_ij is the number of routes that can be followed from the atom with input number i through j bonds along a shortest path. The second step preferably gives canonicalization numbers to the atoms in ascending order from 1: it gives canonicalization number 1 to the atom whose class number has the highest priority; thereafter, when canonicalization numbers up to n have been assigned, it selects, among the atoms that already have a canonicalization number and are bonded to an atom that does not yet have one, the atom with the smallest canonicalization number, and gives canonicalization number n+1 to the atom that is bonded to the selected atom, has no canonicalization number yet, and has the highest-priority class number. The third step preferably gives each atom three kinds of attributes (P_i, T_i, S_i) and creates the canonicalized data by arranging these attributes in a row. Here P_i is the canonicalization number of the atom that is bonded to the atom with canonicalization number i and has the smallest canonicalization number; T_i is the type symbol of the bond between the atom with canonicalization number i and the atom with canonicalization number P_i; and S_i is the type symbol of the atom with canonicalization number i.

[0012]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

[0013] FIG. 1 is a block diagram showing the configuration of a canonicalized data creation apparatus 1 according to an embodiment of the present invention. As shown in FIG. 1, the canonicalized data creation apparatus 1 comprises an image memory 10 that stores image data 10a of a molecular structure diagram, a working memory 11 that temporarily stores symbol data 11a and the like, a main storage device 20 that stores an operating system (OS) 21 and a canonicalized data creation program 22, and a hard disk device 30 that stores a bond table file 31 and a compound information file 33.

[0014] The canonicalized data creation apparatus 1 further comprises a display 40 that displays the molecular structure diagram, a mouse 50 serving as a pointing device that accepts input of hand-drawn figures, a keyboard 60 that accepts input of symbol data such as chemical formulas, a printer 70 that outputs the molecular structure diagram, and a CPU 80 that controls, among other things, execution of the canonicalized data creation program 22. Pointing devices other than the mouse 50 include tablets, digitizers, and light pens, and any of these may be provided in place of the mouse 50.

[0015] The canonicalized data creation program 22 creates canonicalized data based on the unique data on each atom constituting a compound and the bond pair data between atoms. It comprises a main routine 100 that supervises the processing and a constituent atom classification routine (first processing unit) 101 that gives a class number to each atom constituting the compound. It further comprises a canonicalization number assignment routine (second processing unit) 102 that gives each atom a canonicalization number based on the class numbers, and a canonicalized data creation routine (third processing unit) 103 that creates canonicalized data based on the canonicalization number of each atom.

[0016] The hard disk device 30 holds a bond table file 31 that can store a plurality of bond tables 32. Each bond table 32 records the unique data on each atom constituting a compound and the bond pair data between atoms, and the canonicalized data creation program 22 accesses these data through the bond table 32.

[0017] As shown in FIGS. 2(a) and 2(b), the bond table 32 comprises an atom table 32a, which records the unique data on each atom, and an atom pair table 32b, which records the bond pair data between atoms. Specifically, the atom table 32a has columns for the input number (also called the atom number), the two-dimensional coordinates of the atom (X and Y coordinates), the element name (generally an element symbol, although a number such as the atomic number may be used), attributes, the number of atoms, and the number of bonds (see FIG. 2(a)). The atom pair table 32b has columns for the bonded atom pair data, the bond type (for example, 1 for a single bond and 2 for a double bond), and the structure (a column that distinguishes whether each molecule belongs to a ring portion or a chain portion of the molecular structure diagram) (see FIG. 2(b)).

[0018] Here, the input number is a number by which the computer identifies each atom constituting the compound. It is a numeral in the example of FIG. 2(a) but may be a symbol. The bonded atom pair data is preferably expressed as a combination of input numbers.

[0019] Creating canonicalized data does not require all of the data in the atom table 32a and atom pair table 32b described above. It is sufficient to have the atom number and element name as the unique data, and the bonded atom pair data and bond type as the bond pair data.

[0020] The hard disk device 30 also stores a compound information file 33, which records a list relating compound numbers to the canonicalized data (also called canonical data) for the corresponding compounds. As shown in FIG. 3, the compound information file 33 records, as a list indexed by compound numbers C₁ to C₇, the canonicalized data for each of the compounds C₁ to C₇ together with reference data (name, literature, physical properties, and so on) for each compound. Accessing the compound information file 33 with one of the compound numbers C₁ to C₇ as a key therefore retrieves the canonicalized data and reference data for that compound. Here, the canonicalized data consists of a plurality of symbols that uniquely specify the chemical structure of each compound.

[0021] The constituent atom classification routine 101 corresponds to the first step, the canonicalization number assignment routine 102 to the second step, and the canonicalized data creation routine 103 to the third step.

[0022] Next, an outline of the operation of the canonicalized data creation apparatus 1 is given. As shown in FIG. 4, the operator can use the mouse 50 or the keyboard 60 to create, in the bond table file 31, the bond table 32 of the compound for which a molecular structure diagram is to be created.

[0023] Input with the mouse 50 consists of hand-drawing the molecular structure diagram of the compound on the display 40. The input number of each atom, determined by the order of input, is written to the input number column of the bond table 32 created in the main storage device 20. In addition, bonded atom pair data describing how the atoms of this molecular structure diagram E₁ are bonded is written to the bonded atom pair column of the bond table 32. In this way, with mouse input, the bond table 32 that specifies the compound is created from the hand-drawn molecular structure diagram E₁.

[0024] Input of the hand-drawn molecular structure diagram E₁ with the mouse 50 is now described in more detail. First, when the operator clicks the mouse 50, data on one atom of a bonded atom pair is entered. Next, when the operator moves the mouse 50 and clicks again, data on the other atom of that bonded atom pair is entered. If a further click follows, the click that entered the other atom is also treated as the click designating the first atom of the next bonded atom pair. Two consecutive clicks thus designate one bonded atom pair. By continuing to designate bonded atom pairs while moving from atom to atom, the operator can enter all the bonds that make up the compound.
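The chaining of clicks into bonded atom pairs can be sketched as follows. This is only an illustration under stated assumptions: a click is reduced to an exact (x, y) position, clicking a previously used position re-selects that atom, two clicks on the same atom in a row create no bond, and only one uninterrupted run of clicks is handled (how the operator starts a new run, for example for a branch, is not described in this passage). The function name is hypothetical.

```python
def clicks_to_bond_pairs(clicks):
    """Convert one uninterrupted run of clicks into input numbers and atom pairs.

    clicks: list of (x, y) positions in the order they were clicked.
    Returns ({position: input number}, [(input number, input number), ...]).
    """
    atom_numbers = {}
    pairs = []
    previous = None
    for position in clicks:
        # A position seen for the first time becomes a new atom, numbered
        # 1, 2, 3, ... in input order; a repeated position reuses its number.
        number = atom_numbers.setdefault(position, len(atom_numbers) + 1)
        # Each click closes a pair with the previous click and, if another
        # click follows, opens the next pair.
        if previous is not None and previous != number:
            pairs.append((previous, number))
        previous = number
    return atom_numbers, pairs


# Three atoms drawn as a closed triangle: the fourth click returns to atom 1.
numbers, pairs = clicks_to_bond_pairs([(0, 0), (1, 0), (2, 0), (0, 0)])
# pairs == [(1, 2), (2, 3), (3, 1)]
```

A real front end would snap clicks to nearby atoms rather than require identical coordinates; exact matching is used here only to keep the sketch short.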

[0025] In correspondence with the entered atom data (for example 1, 2, 3, ... in the order the atoms are entered), the atom number is written to the input number column of the atom table 32a. Data on the bonding relationships of the entered atoms is written to the bonded atom pair column of the atom pair table 32b. If the operator enters an element name for an atom, it is written to the element name column of the atom table 32a. Similarly, if the operator enters the multiplicity of the bond joining a bonded atom pair, it is written to the bond type column of the atom pair table 32b. When no element name or multiplicity is written, they are preferably taken to be carbon and a single bond, respectively. Molecular structure diagrams, and the bond tables based on them, are normally created with hydrogen atoms omitted.
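A minimal sketch of a bond table reduced to the fields stated above as sufficient (input number, element name, bonded atom pair, bond type), with the carbon and single-bond defaults applied when the operator enters nothing. The class and function names are hypothetical, and the real tables 32a and 32b hold further columns (coordinates, structure, and so on) that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomRow:
    input_number: int
    element: str = "C"       # default when no element name was entered


@dataclass(frozen=True)
class AtomPairRow:
    atom_pair: tuple         # pair of input numbers
    bond_type: int = 1       # default when no multiplicity was entered (single bond)


def make_bond_table(n_atoms, pairs, elements=None, bond_types=None):
    """Build (atom rows, atom pair rows) from operator input.

    elements:   optional {input number: element symbol} for atoms the operator labelled.
    bond_types: optional {atom pair: bond type} for bonds the operator labelled.
    """
    elements = elements or {}
    bond_types = bond_types or {}
    atoms = [AtomRow(i, elements.get(i, "C")) for i in range(1, n_atoms + 1)]
    atom_pairs = [AtomPairRow(p, bond_types.get(p, 1)) for p in pairs]
    return atoms, atom_pairs


# Hypothetical input: three atoms in a chain, with only the oxygen and the
# double bond labelled by the operator (hydrogens omitted).
atoms, atom_pairs = make_bond_table(
    3, [(1, 2), (2, 3)], elements={3: "O"}, bond_types={(2, 3): 2}
)
```

Unlabelled atoms 1 and 2 come out as carbon and the unlabelled bond (1, 2) as a single bond.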

[0026] Input with the keyboard 60 consists of typing a symbol string that specifies the name of the bond table corresponding to a given compound. Based on the entered symbol data 11a, the bond table 32 identified by that bond table name is read from the bond table file 31.

[0027] Thus the mouse 50 and the keyboard 60 constitute the input means A, and the bond table 32 is obtained whichever of them is used. The canonicalized data creation program 22, which is the canonicalized data creation means B, is then executed, and canonicalized data 34 is created based on the data in the bond table 32. The canonicalized data 34 created in this way is written to the compound information file 33 together with the compound's reference data and saved. The canonicalized data 34 is created from the bond table 32 and saved, rather than saving the bond table 32 as it is, because it takes less storage. For example, the canonicalized data 34 created from the bond table 32 shown in FIG. 2 is "1%1%1-2%3%5%N/6%7/", so the structure of the compound can be expressed as a very short string of characters, numerals, and symbols. Saving such short symbol strings uses storage resources efficiently and helps make the apparatus smaller and lighter.

[0028] Two-dimensional coordinate calculation based on the bonded atom pair data in the bond table 32 yields two-dimensional coordinate data for each atom. From these coordinates, an aesthetically pleasing molecular structure diagram E₂ is created. The molecular structure diagram E₂ can be shown on the display 40 or output from the printer 70.

[0029] Keyboard input may also consist of writing the unique data and bond pair data directly into a bond table created in the main storage device 20. A device that reads figures and characters optically, such as an image scanner or an optical card reader (OCR), may also be used as the input device of the present invention to accept input of bond table data.

[0030] Next, a canonicalized data creation method according to an embodiment of the present invention is described. The method uses the canonicalized data creation apparatus 1. First, the main routine 100 of the canonicalized data creation program 22 is started under the control of the OS 21.

[0031] As shown in the flowchart of FIG. 5, the main routine 100 first calls the constituent atom classification routine 101 to assign a class number to each atom constituting the compound (S10). It next calls the canonicalization number assignment routine 102 to assign a canonicalization number to each atom based on the assigned class numbers (S20). It then calls the canonicalized data creation routine 103 to create canonicalized data based on the assigned canonicalization numbers (S30). The canonicalized data created in this way is written to the compound information file 33 and saved.

[0032] Next, the processing of the constituent atom classification routine 101 called in S10 is described. This processing classifies the atoms constituting the compound into separate classes of equivalent atoms and gives each atom the class number of the class it belongs to. For example, the six carbon atoms of benzene are all equivalent, so all are given the same class number. The seven carbon atoms of toluene are represented by five class numbers: the two carbon atoms at the ortho positions are equivalent to each other, as are the two at the meta positions, and each such pair is given the same class number.

[0033] As shown in the flowchart of FIG. 6, three kinds of attributes (a_i, b_ij, d_ij) are first given to each atom constituting the compound, based on the bond table 32 (S101). Attribute a_i is the type symbol of the atom with input number i (the atomic number in this example). Attribute b_ij is the number (a vector quantity) of bonds adjacent to the atom with input number i (that is, bonds having that atom as one of their two atoms) whose type symbol is j (in this example, the bond type is 1 for a single bond, 2 for a double bond, 3 for a triple bond, and 4 for an aromatic bond). Attribute d_ij is the number (a vector quantity) of routes that can be followed from the atom with input number i through j bonds along a shortest path.
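The three attributes can be computed from a bond list roughly as in the Python sketch below. Two points are assumptions rather than statements from the text. First, d_ij is read here as the number of atoms whose shortest-path distance from atom i is exactly j bonds; the wording could also be read as counting distinct shortest routes. Second, the example bond list for 3,5-dimethyl-2,3,4,5-tetrahydropyridine is inferred from the bond counts quoted for that compound in this description, not copied from FIG. 7, and the placement of methyl atoms 7 and 8 is a guess. The function names and the small atomic-number lookup are hypothetical.

```python
from collections import Counter, deque

ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8}   # illustrative subset only


def shortest_distances(start, neighbors):
    """Breadth-first search: number of bonds from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in neighbors[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    return dist


def atom_attributes(elements, bonds, bond_types=(1, 2, 3, 4)):
    """Return {input number: (a_i, b_i, d_i)} with b_i and d_i as tuples."""
    neighbors = {i: [] for i in elements}
    bond_counts = {i: Counter() for i in elements}
    for i, j, kind in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
        bond_counts[i][kind] += 1
        bond_counts[j][kind] += 1
    distances = {i: shortest_distances(i, neighbors) for i in elements}
    longest = max(max(d.values()) for d in distances.values())
    attributes = {}
    for i in elements:
        a = ATOMIC_NUMBER[elements[i]]
        b = tuple(bond_counts[i][kind] for kind in bond_types)
        at_distance = Counter(distances[i].values())
        # ASSUMED reading of d_ij: atoms exactly j bonds away, j = 1..longest.
        d = tuple(at_distance[j] for j in range(1, longest + 1))
        attributes[i] = (a, b, d)
    return attributes


# Inferred (not figure-verified) connectivity: ring 1-2=3-4-5-6-1, methyls 7 and 8.
elements = {1: "C", 2: "C", 3: "N", 4: "C", 5: "C", 6: "C", 7: "C", 8: "C"}
bonds = [(1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 1),
         (5, 6, 1), (6, 1, 1), (1, 7, 1), (5, 8, 1)]
attributes = atom_attributes(elements, bonds)
```

Under this reading each atom of the example yields one digit for a_i, four for b_ij, and four for d_ij, for instance (7, (1, 1, 0, 0), (2, 2, 3, 0)) for the nitrogen atom.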

[0034] Next, for each atom the attributes (a_i, b_ij, d_ij) are concatenated into a digit string (nine digits in this example), class numbers C_i^0 are assigned in ascending order of this digit string, and the atoms are thereby classified into a plurality of classes (S102). The class number C_i^0 assigned here is the zeroth-order class number; the loop from S103 onward successively determines the first-order class number C_i^1, the second-order class number C_i^2, and so on. If the number of classes already equals the total number of atoms at this zeroth-order stage, the processing may end.

[0035] Next, the order n is set to 1 (S103), and attribute V_ij^n is given to each atom (S104). Attribute V_ij^n is the number of atoms bonded to the atom with input number i whose class number at order n-1 is j. For each atom the attributes (a_i, b_ij, d_ij, V_ij^n) are then concatenated, class numbers C_i^n are assigned in ascending order of the resulting digit string, and the atoms are classified into a plurality of classes (S105). The number of classes N_n is then compared with the number of classes N_(n-1) from the previous pass, and the processing ends if they are equal. The processing also ends if the number of classes N_n equals the total number of atoms (S106). If neither condition holds, n is incremented by 1 and the processing returns to S104 (S107). In S102 and S105 class numbers are assigned in ascending order of the digit string, but they may instead be assigned in descending order.
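The loop S102 to S107 can be sketched in Python as follows. This is an illustration, not the patented routine. The attributes are compared as tuples instead of concatenated digit strings, the zeroth-order key of each atom is passed in ready-made, and a cap on the number of passes is added as a safety guard; all of these are assumptions, as are the function names. The example uses a deliberately simplified key (atomic number, number of bonds) on the carbon chain of n-pentane so that the effect of one refinement pass is visible.

```python
from collections import Counter


def rank_classes(keys):
    """Assign class numbers 1, 2, ... in ascending key order; equal keys share a class."""
    order = {key: rank for rank, key in enumerate(sorted(set(keys.values())), start=1)}
    return {atom: order[key] for atom, key in keys.items()}


def refine_classes(base_keys, neighbors):
    """Iteratively split classes using the class numbers of bonded atoms.

    base_keys: {atom: tuple standing in for (a_i, b_ij, d_ij)}
    neighbors: {atom: list of bonded atoms}
    """
    classes = rank_classes(base_keys)                  # S102: zeroth-order classes
    n_atoms = len(base_keys)
    n_classes = len(set(classes.values()))
    for _ in range(n_atoms):                           # ASSUMED guard on the pass count
        if n_classes == n_atoms:                       # every atom already distinct
            break
        keys = {}
        for atom, base in base_keys.items():
            seen = Counter(classes[nb] for nb in neighbors[atom])
            # S104: V_ij^n, bonded atoms per class number j of the previous order.
            v = tuple(seen[c] for c in range(1, n_classes + 1))
            keys[atom] = (base, v)
        classes = rank_classes(keys)                   # S105
        new_count = len(set(classes.values()))
        if new_count == n_classes:                     # S106: no further split
            break
        n_classes = new_count                          # S107: next order
    return classes


# n-pentane carbon chain 1-2-3-4-5 with the simplified key (atomic number, bonds).
base_keys = {1: (6, 1), 2: (6, 2), 3: (6, 2), 4: (6, 2), 5: (6, 1)}
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
classes = refine_classes(base_keys, neighbors)
# classes == {1: 1, 2: 3, 3: 2, 4: 3, 5: 1}
```

At order 0 the simplified key separates only end carbons from inner carbons. The first pass then splits the central carbon from atoms 2 and 4, and the second pass produces no further split, so the loop stops with three classes.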

[0036] Next, the step-by-step processing of the constituent atom classification routine 101 is described in detail, taking 3,5-dimethyl-2,3,4,5-tetrahydropyridine as an example.

[0037] First, the processing of S101 is executed. When this processing is executed, data such as that shown in FIGS. 7(a) and 7(b) has already been written to the bond table 32, and the three kinds of attributes (a_i, b_ij, d_ij) are given to each atom on the basis of that data. The input numbers recorded in the bond table 32 are arbitrary numbers assigned in the order in which the atoms were entered by handwriting, as shown in FIG. 8.

[0038] The attribute a_i is obtained as follows. As described above, a_i is the type symbol of the atom with input number i. The bond table 32 records the element symbol of each atom, and the type symbol (atomic number) of an atom can be determined from its element symbol. Reading the element symbols from the bond table 32 therefore yields the corresponding attributes a_i. As a result, a_1, a_2, a_4 to a_8 = 6 (the atomic number of carbon) and a_3 = 7 (the atomic number of nitrogen) are obtained.

[0039] The attribute b_ij is obtained as follows. As described above, b_ij is the number of bonds of type symbol j among the bonds adjacent to the atom with input number i. The bond table 32 records the bond types for each atom, and b_ij is obtained by reading these bond types from the bond table 32. As a result, b_1j = (3,0,0,0), b_2j = (1,1,0,0), b_3j = (1,1,0,0), b_4j = (2,0,0,0), b_5j = (3,0,0,0), b_6j = (2,0,0,0), b_7j = (1,0,0,0), and b_8j = (1,0,0,0) are obtained.

[0040] Specifically, the attribute b_ij is obtained using the reference table 12 shown in FIGS. 9(a) and 9(b). The reference table 12 is organized as an array D(x, y) that expresses the bonding relationship between two atoms, and it is created from the bonded atom pair data and the bond type data in the bond table 32. Here, x is the number of the first atom in the bonded atom pair data and y is the number of the second atom; for example, the bond type 2 is shown circled at the coordinate position x = 2, y = 3. In other words, the reference table 12 is created by writing the bond type j into the array element designated by each bonded atom pair.

[0041] The attribute b_ij is extracted from the reference table 12 as follows. First, the array elements of the reference table 12 that satisfy x = 1 or y = 1 (the hatched elements in FIG. 9(a)) are searched, and the data (bond type) j written in them is extracted. This yields D(1,2) = 1, D(1,6) = 1, and D(1,8) = 1. Since the data j of all three array elements is 1, b_11 = 3. Since no array element holds data j of 2 or more, b_12 to b_14 = 0.

[0042] Next, the array elements of the reference table 12 that satisfy x = 2 or y = 2 (the hatched elements in FIG. 9(b)) are searched, and the data written in them is extracted. This yields D(1,2) = 1 and D(2,3) = 2. The extracted data j is 1 and 2, one element each, so b_21 = b_22 = 1. Since no array element holds data j of 3 or more, b_23 = b_24 = 0.

[0043] Processing i = 3 to 8 in the same way yields the attributes b_ij (i = 1 to 8, j = 1 to 4) shown in FIG. 10.
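The extraction of b_ij from the reference table can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names `BOND_TABLE`, `build_reference_table`, and `bond_counts` are hypothetical, and the bond list is reconstructed from the D(x, y) entries and b_ij values quoted in the text (bond type 2 is presumably the C=N double bond).

```python
# Bond list (x, y, bond type) reconstructed from the D(x, y) entries and the
# b_ij values quoted in the text. This reconstruction is an assumption.
BOND_TABLE = [
    (1, 2, 1), (1, 6, 1), (1, 8, 1), (2, 3, 2),
    (3, 4, 1), (4, 5, 1), (5, 6, 1), (5, 7, 1),
]


def build_reference_table(bond_table):
    """Model the array D(x, y) as a dict keyed by the bonded atom pair."""
    return {(x, y): kind for x, y, kind in bond_table}


def bond_counts(table, atom, kinds=4):
    """Return (b_i1, ..., b_i4): bonds of each type touching `atom`."""
    counts = [0] * kinds
    for (x, y), kind in table.items():
        if x == atom or y == atom:  # elements with x = i or y = i
            counts[kind - 1] += 1
    return tuple(counts)


D = build_reference_table(BOND_TABLE)
b = {atom: bond_counts(D, atom) for atom in range(1, 9)}
```

With this bond list, `b[1]` is `(3, 0, 0, 0)` and `b[2]` is `(1, 1, 0, 0)`, matching the values given in the text.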

[0044] Further, the attribute d_ij is obtained as follows. As described above, d_ij is the number of routes that can be traced from the atom with input number i through j bonds along a shortest path. Referring to the molecular structure diagram of FIG. 8, there are three routes that pass through one bond from the atom with input number 1: (input number 1 → 2), (input number 1 → 6), and (input number 1 → 8). There are two routes that pass through two bonds from the atom with input number 1: (input number 1 → 2 → 3) and (input number 1 → 6 → 5).

[0045] Further, there are three routes that pass through three bonds along a shortest path from the atom with input number 1: (input number 1 → 2 → 3 → 4), (input number 1 → 6 → 5 → 4), and (input number 1 → 6 → 5 → 7). There are no routes that pass through four bonds along a shortest path from the atom with input number 1. As FIG. 8 shows, routes through four bonds from the atom with input number 1 do exist, for example (input number 1 → 2 → 3 → 4 → 5). However, input number 5 (the arrival point) can be reached from input number 1 (the starting point) through only two bonds via input number 6, so the example route is not a shortest path from the starting point to the arrival point. As a result of the above processing, d_1j = (3,2,3,0) is obtained.

[0046] By the same processing, d_2j = (2,3,2,2), d_3j = (2,2,4,0), d_4j = (2,3,2,2), d_5j = (3,2,3,0), d_6j = (2,4,2,0), d_7j = (1,2,2,3), and d_8j = (1,2,2,3) are obtained.

[0047] Specifically, the attribute d_ij, like the attribute b_ij, is obtained by referring to the reference table 12. The extraction of d_ij from the reference table 12 proceeds in the order i = 1, i = 2, and so on. First, the attribute d_1j (i = 1) is extracted.

[0048] To extract the attribute d_1j (i = 1), the array elements of the reference table 12 that satisfy x = 1 or y = 1 (the hatched elements in FIG. 11(a)) are searched, and the elements in which data is written are extracted. A bond path number of 1 is then written to each extracted element. As a result, the bond path number 1 is written to D(1,2), D(1,6), and D(1,8) (in FIG. 11(a) the bond path numbers are shown enclosed in triangles).

[0049] Next, the subscripts S = (1,2), (1,6), (1,8) of the array elements to which the bond path number 1 was written are extracted. Removing the value 1 used in the preceding extraction step from these subscripts gives S = 2, 6, 8. Based on S = 2, 6, 8, the array elements that satisfy x = 2, 6, 8 or y = 2, 6, 8 (the hatched elements in FIG. 11(b)) are searched, and the elements in which data is written but no bond path number has yet been written are extracted. A bond path number of 2 is then written to each extracted element. As a result, the bond path number 2 is written to D(2,3) and D(5,6).

[0050] Further, the subscripts S = (2,3), (5,6) of the array elements to which the bond path number 2 was written are extracted. Removing the values 2 and 6 used in the preceding extraction step from these subscripts gives S = 3, 5. Based on S = 3, 5, the array elements that satisfy x = 3, 5 or y = 3, 5 (the hatched elements in FIG. 12(c)) are searched, and the elements in which data is written but no bond path number has yet been written are extracted. A bond path number of 3 is then written to each extracted element. As a result, the bond path number 3 is written to D(3,4), D(4,5), and D(5,7).

[0051] Through the above processing, bond path numbers are written to all the array elements. As a result, there are three array elements with bond path number 1, two with bond path number 2, three with bond path number 3, and none with bond path number 4, giving d_1j = (3,2,3,0).

[0052] Next, the attribute d_2j (i = 2) is extracted. To do so, the array elements of the reference table 12 that satisfy x = 2 or y = 2 (the hatched elements in FIG. 13(a)) are searched, and the elements in which data is written are extracted. A bond path number of 1 is then written to each extracted element. As a result, the bond path number 1 is written to D(1,2) and D(2,3) (in FIG. 13(a) the bond path numbers are shown enclosed in triangles).

[0053] Next, the subscripts S = (1,2), (2,3) of the array elements to which the bond path number 1 was written are extracted. Removing the value 2 used in the preceding extraction step from these subscripts gives S = 1, 3. Based on S = 1, 3, the array elements that satisfy x = 1, 3 or y = 1, 3 (the hatched elements in FIG. 13(b)) are searched, and the elements in which data is written but no bond path number has yet been written are extracted. A bond path number of 2 is then written to each extracted element. As a result, the bond path number 2 is written to D(1,6), D(1,8), and D(3,4).

[0054] Further, the subscripts S = (1,6), (1,8), (3,4) of the array elements to which the bond path number 2 was written are extracted. Removing the values 1 and 3 used in the preceding extraction step from these subscripts gives S = 4, 6, 8. Based on S = 4, 6, 8, the array elements that satisfy x = 4, 6, 8 or y = 4, 6, 8 (the hatched elements in FIG. 14(c)) are searched, and the elements in which data is written but no bond path number has yet been written are extracted. A bond path number of 3 is then written to each extracted element. As a result, the bond path number 3 is written to D(4,5) and D(5,6).

[0055] Furthermore, the subscripts S = (4,5), (5,6) of the array elements to which the bond path number 3 was written are extracted. Removing the values 4 and 6 used in the preceding extraction step from these subscripts gives S = 5, 5 (that is, S = 5 is applied twice). Based on S = 5, 5, the array elements that satisfy x = 5 or y = 5 (the hatched elements in FIG. 14(d)) are searched, and the elements in which data is written but no bond path number has yet been written are extracted. A bond path number of 4 is then written to each extracted element. As a result, the bond path number 4 is written twice to D(5,7).

[0056] Through the above processing, bond path numbers are written to all the array elements. As a result, there are two array elements with bond path number 1, three with bond path number 2, two with bond path number 3, and two with bond path number 4, giving d_2j = (2,3,2,2).

[0057] For each of the remaining attributes d_ij (i = 3 to 8), the same processing is continued until bond path numbers have been written to all the array elements, which yields the d_ij (i = 1 to 8, j = 1 to 4) shown in FIG. 10. Through the processing of S101 described above, the three kinds of attributes (a_i, b_ij, d_ij) have been given to each atom constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine.
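The layer-by-layer labelling used to obtain d_ij can be sketched in Python as follows. This is one reading of the procedure described above, not a definitive implementation. The names `BONDS` and `path_counts` are hypothetical, the bond list is reconstructed from the D(x, y) entries quoted in the text, and the sketch assumes that each write of a bond path number counts once, so that a bond reached twice in the same step (such as D(5,7) for i = 2) contributes 2.

```python
# Bonded atom pairs reconstructed from the D(x, y) entries quoted in the text.
BONDS = [(1, 2), (1, 6), (1, 8), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7)]


def path_counts(bonds, start, depth=4):
    """Return (d_i1, ..., d_i4) for the atom `start`."""
    incident = {}
    for bond in bonds:
        for atom in bond:
            incident.setdefault(atom, []).append(bond)

    path_number = {}    # bond -> bond path number already written
    frontier = [start]  # the subscripts S; duplicates are kept on purpose
    counts = []
    for level in range(1, depth + 1):
        reached = []
        for atom in frontier:
            for bond in incident.get(atom, []):
                # Skip bonds labelled in an earlier step. A bond labelled in
                # this same step may be written again (assumed reading).
                if path_number.get(bond, level) != level:
                    continue
                path_number[bond] = level
                x, y = bond
                reached.append(y if atom == x else x)
        counts.append(len(reached))
        frontier = reached
    return tuple(counts)


d = {atom: path_counts(BONDS, atom) for atom in range(1, 9)}
```

Under these assumptions the sketch reproduces the eight d_ij tuples listed in the text, for example `d[1] == (3, 2, 3, 0)` and `d[2] == (2, 3, 2, 2)`.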

[0058] Next, the processing of S102 is executed. As described above, in S102 the attributes (a_i, b_ij, d_ij) are arranged for each atom into a 9-digit number string, class numbers C_i^0 are assigned in ascending order of the number string, and each atom is classified into one of a plurality of classes. The class number C_i^0 assigned here is the 0th-order class number of the atom with input number i.

[0059] To describe the processing of S102 concretely, the number string of the atom with input number 1 is "630003230" and that of the atom with input number 2 is "611002322". The remaining atoms, in order, have the number strings "711002240", "620002322", "630003230", "620002420", "610001223", and "610001223".

[0060] As a result, the number strings of the atoms with input numbers 7 and 8 are the smallest, and these atoms are given the class number C_7^0 = C_8^0 = 1. Thereafter, in ascending order of number string, the atom with input number 2 is given C_2^0 = 2 and the atom with input number 4 is given C_4^0 = 3. The atom with input number 6 is given C_6^0 = 4, and the atoms with input numbers 1 and 5 are given C_1^0 = C_5^0 = 5. Finally, the atom with input number 3 is given C_3^0 = 6 (see FIG. 15(a)). The atoms are thus classified into six classes, and the number of classes N_0 is 6.
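The 0th-order classification of S102 can be sketched in Python as follows. The attribute values are the ones quoted in the text; the names `ATTRIBUTES`, `digit_string`, and `assign_classes` are hypothetical. The sketch assumes every attribute value is a single digit, so that comparing the fixed-length strings as text gives the same order as comparing them as numbers.

```python
# atom -> (a_i, b_ij, d_ij), taken from the worked example in the text.
ATTRIBUTES = {
    1: (6, (3, 0, 0, 0), (3, 2, 3, 0)),
    2: (6, (1, 1, 0, 0), (2, 3, 2, 2)),
    3: (7, (1, 1, 0, 0), (2, 2, 4, 0)),
    4: (6, (2, 0, 0, 0), (2, 3, 2, 2)),
    5: (6, (3, 0, 0, 0), (3, 2, 3, 0)),
    6: (6, (2, 0, 0, 0), (2, 4, 2, 0)),
    7: (6, (1, 0, 0, 0), (1, 2, 2, 3)),
    8: (6, (1, 0, 0, 0), (1, 2, 2, 3)),
}


def digit_string(a, b, d):
    """Concatenate a_i, b_ij and d_ij into the 9-digit number string."""
    return str(a) + "".join(map(str, b)) + "".join(map(str, d))


def assign_classes(keys):
    """Rank distinct keys in ascending order; equal keys share a class."""
    ordered = sorted(set(keys.values()))
    return {atom: ordered.index(key) + 1 for atom, key in keys.items()}


strings = {atom: digit_string(*attrs) for atom, attrs in ATTRIBUTES.items()}
class0 = assign_classes(strings)
n0 = len(set(class0.values()))
```

Here `strings[1]` is `"630003230"` and `n0` is 6, in agreement with the class numbers C_i^0 given in the text.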

[0061] Next, the processing of S103 is executed and the order n is set to 1.

[0062] Further, the processing of S104 is executed. As described above, in S104 the attribute V_ij^1 (n = 1) is given to each atom. Here, V_ij^n is the number of atoms that are bonded to the atom with input number i and whose class number is j. Referring to the molecular structure diagram of FIG. 15(a), the atoms bonded to the atom with input number 1 have input numbers 2, 6, and 8, and their class numbers are C_2^0 = 2, C_6^0 = 4, and C_8^0 = 1. Consequently, 1 is written to the attribute V_1j^1 for j = 1, 2, 4, giving V_1j^1 = (1,1,0,1,0,0).

[0063] The atoms bonded to the atom with input number 2 have input numbers 1 and 3, and their class numbers are C_1^0 = 5 and C_3^0 = 6. Consequently, 1 is written to the attribute V_2j^1 for j = 5, 6, giving V_2j^1 = (0,0,0,0,1,1). Processing the atoms with input numbers 3 to 8 in the same way gives V_3j^1 = (0,1,1,0,0,0), V_4j^1 = (0,0,0,0,1,1), V_5j^1 = (1,0,1,1,0,0), V_6j^1 = (0,0,0,0,2,0), V_7j^1 = (0,0,0,0,1,0), and V_8j^1 = (0,0,0,0,1,0).

[0064] Specifically, the attribute V_ij^n is obtained using the reference table 12 shown in FIGS. 9(a) and 9(b). The extraction of V_ij^1 from the reference table 12 proceeds in the order i = 1, i = 2, and so on. First, the attribute V_1j^1 (i = 1) is extracted. To do so, the array elements of the reference table 12 that satisfy x = 1 or y = 1 (the hatched elements in FIG. 9(a)) are searched, and the subscripts S = (1,2), (1,6), (1,8) of the elements in which data is written are extracted. Removing i = 1 from these subscripts gives S = 2, 6, 8. Substituting these values of S into the 0th-order class numbers C_i^0 gives C_2^0 = 2, C_6^0 = 4, and C_8^0 = 1. Writing 1 to the attribute V_1j^1 for j = 1, 2, 4 then gives V_1j^1 = (1,1,0,1,0,0).

[0065] Next, the attribute V_2j^1 (i = 2) is extracted. To do so, the array elements of the reference table 12 that satisfy x = 2 or y = 2 (the hatched elements in FIG. 9(b)) are searched, and the subscripts S = (1,2), (2,3) of the elements in which data is written are extracted. Removing i = 2 from these subscripts gives S = 1, 3. Substituting these values of S into the 0th-order class numbers C_i^0 gives C_1^0 = 5 and C_3^0 = 6. Writing 1 to the attribute V_2j^1 for j = 5, 6 then gives V_2j^1 = (0,0,0,0,1,1).

[0066] Processing i = 3 to 8 in the same way yields the attributes V_ij^1 (i = 1 to 8, j = 1 to 6) shown in FIG. 16.

[0067] Next, the processing of S105 is executed. As described above, in S105 the attributes (C_i^(n-1), V_ij^n) are arranged for each atom, class numbers C_i^n are assigned in ascending order of the resulting number string, and each atom is classified into one of a plurality of classes.

[0068] Specifically, the number string of the atom with input number 1 is "5110100" and that of the atom with input number 2 is "2000011". The remaining atoms, in order, have the number strings "6011000", "3000011", "5101100", "4000020", "1000010", and "1000010".

[0069] As a result, the number strings of the atoms with input numbers 7 and 8 are the smallest, and these atoms are given the class number C_7^1 = C_8^1 = 1. Thereafter, in ascending order of number string, the atom with input number 2 is given C_2^1 = 2 and the atom with input number 4 is given C_4^1 = 3. The atom with input number 6 is given C_6^1 = 4 and the atom with input number 5 is given C_5^1 = 5. Finally, the atom with input number 1 is given C_1^1 = 6 and the atom with input number 3 is given C_3^1 = 7. The atoms are thus classified into seven classes, and the number of classes N_1 is 7.

[0070] Next, the processing of S106 is executed: it is checked whether the number of classes N_n equals N_(n-1), and if so the processing ends. It is also checked whether N_n equals the total number of atoms, and if so the processing ends. Here N_1 is 7 and N_0 is 6, so N_1 and N_0 are not equal. The total number of atoms is 8, so N_1 does not equal the total number of atoms either. Since neither equality holds, the processing of S107 is executed and n is set to 2.

[0071] Returning to S104, the attribute V_ij^2 is given to each atom. As a result, as shown in FIG. 17, V_1j^2 = (1,1,0,1,0,0,0), V_2j^2 = (0,0,0,0,0,1,1), V_3j^2 = (0,1,1,0,0,0,0), V_4j^2 = (0,0,0,0,1,0,1), V_5j^2 = (1,0,1,1,0,0,0), V_6j^2 = (0,0,0,0,1,1,0), V_7j^2 = (0,0,0,0,1,0,0), and V_8j^2 = (0,0,0,0,0,1,0) are obtained.

[0072] The processing of S105 is then executed to give each atom a class number C_i^2. As a result, as shown in FIG. 15(c), C_1^2 = 7, C_2^2 = 3, C_3^2 = 8, C_4^2 = 4, C_5^2 = 6, C_6^2 = 5, C_7^2 = 2, and C_8^2 = 1 are obtained. The atoms are thus classified into eight classes, and the number of classes N_2 is 8. Since N_2 = 8 equals the total number of atoms, the determination in S106 ends the processing.
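The refinement loop of S103 to S107 can be sketched in Python as follows, starting from the 0th-order class numbers of the worked example. The names `BONDS`, `CLASS0`, `rank`, and `refine` are hypothetical, and the bond list is reconstructed from the D(x, y) entries quoted in the text. As a simplification that is not in the original, the sketch compares tuples (C_i^(n-1), V_ij^n) rather than concatenated digit strings; the resulting order is the same as long as every value is a single digit.

```python
BONDS = [(1, 2), (1, 6), (1, 8), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7)]
CLASS0 = {1: 5, 2: 2, 3: 6, 4: 3, 5: 5, 6: 4, 7: 1, 8: 1}


def rank(keys):
    """Assign class numbers 1, 2, ... in ascending order of the keys."""
    ordered = sorted(set(keys.values()))
    return {atom: ordered.index(key) + 1 for atom, key in keys.items()}


def refine(bonds, class0):
    """Return the list [C^0, C^1, ...] of class assignments per order."""
    neighbors = {atom: [] for atom in class0}
    for x, y in bonds:
        neighbors[x].append(y)
        neighbors[y].append(x)

    history = [dict(class0)]
    while True:
        prev = history[-1]
        n_prev = len(set(prev.values()))
        if n_prev == len(prev):  # every atom already has its own class
            break
        keys = {}
        for atom, bonded in neighbors.items():
            v = [0] * n_prev     # V_ij^n: neighbours counted per class j
            for other in bonded:
                v[prev[other] - 1] += 1
            keys[atom] = (prev[atom], *v)
        current = rank(keys)     # S105
        history.append(current)
        if len(set(current.values())) == n_prev:  # S106: N_n == N_(n-1)
            break
    return history


history = refine(BONDS, CLASS0)
final_class = history[-1]
```

For the example molecule the loop stops after two refinements, and `final_class` matches the C_i^2 values shown in the text.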

[0073] Next, the processing of the canonicalization number assigning routine 102, which is called in S20 of FIG. 5, is described with reference to the flowchart of FIG. 18. A canonicalization number is a number for each atom that is uniquely determined by the structure of the compound. The input numbers obtained when a molecular structure diagram is entered by handwriting are arbitrary numbers that change with the order of entry. The canonicalized data 34, by contrast, must be unique data that depends only on the structure of the compound. It is therefore difficult to create the unique canonicalized data 34 directly from the arbitrary input numbers. The canonicalized data creation program 22 accordingly first converts the input numbers into canonicalization numbers and then creates the canonicalized data 34 on the basis of these unique canonicalization numbers, which allows the canonicalized data 34 to be created smoothly.

[0074] In the canonicalization number assigning routine 102, the variable k is first set to 1 (S201). Next, the final class numbers C_i^f obtained by the constituent atom classification routine 101 are examined, and the atom with the largest class number is given the canonicalization number k (here k = 1) (S202). If several atoms share the largest class number, one of them is chosen arbitrarily and given the canonicalization number k. The processing ends once canonicalization numbers have been given to all atoms (S203).

[0075] Next, 1 is added to the variable k (S204), and from among the atoms whose canonicalization numbers have been decided (hereinafter, decided atoms), those to which an atom whose canonicalization number has not yet been decided (hereinafter, undecided atom) is bonded are extracted (S205). It is then determined whether more than one decided atom was extracted (S206); if so, the decided atom with the smallest canonicalization number among them is selected (S207). From the undecided atoms bonded to the selected decided atom, the one with the largest class number C_i^f is extracted and given the canonicalization number k (S208). If several decided atoms share the largest class number C_i^f, one of them is selected arbitrarily.

[0076] If it is determined in S206 that only one decided atom was extracted, the undecided atom with the largest class number C_i^f is selected from the undecided atoms bonded to that decided atom and is given the canonicalization number k (S209). After S208 or S209 is completed, the processing returns to S203, and the loop of S203 to S209 is repeated until canonicalization numbers have been given to all atoms.

[0077] Next, a specific example of the processing of the canonicalization number assigning routine 102 is described using 3,5-dimethyl-2,3,4,5-tetrahydropyridine. First, the variable k is set to 1 in S201, and then S202 is performed. In S202, the atom with input number 3 has the largest class number, C_3^f = 8, so it is given the canonicalization number k = 1. Next, the variable k is set to 2 in S204, and the atom with input number 3 is extracted as a decided atom in S205.

[0078] Since only one decided atom was extracted, S209 is performed next. The undecided atoms bonded to the atom with input number 3 are the atoms with input numbers 2 and 4, and the one with the largest class number C_i^f is selected from them. The class number of the atom with input number 2 is C_2^f = 3 and that of the atom with input number 4 is C_4^f = 4. The atom with input number 4 is therefore selected and given the canonicalization number k = 2.

[0079] Next, the processing returns to S204 and the variable k is set to 3, and the atoms with input numbers 3 and 4 are extracted as decided atoms in S205. Since more than one decided atom was extracted, S207 is performed next and the atom with the smallest canonicalization number is selected from them. The canonicalization number of the atom with input number 3 is 1 and that of the atom with input number 4 is 2, so the atom with input number 3 is selected. S208 is then performed, and the atom with input number 2, which is bonded to the atom with input number 3, is given the canonicalization number k = 3.

[0080] Processing then returns to S204 again, the variable k is set to 4, and the atoms with input numbers 2 and 4 are extracted as determined atoms in S205. Since more than one determined atom has been extracted, S207 is performed next, and the atom with the smallest canonicalization number is selected from the extracted determined atoms. The atom with input number 2 has canonicalization number 3 and the atom with input number 4 has canonicalization number 2, so the atom with input number 4 is selected. S208 is then performed, and the atom with input number 5, which is bonded to the atom with input number 4, is given the canonicalization number k = 4.

[0081] By repeating the same processing, the atom with input number 1 is given canonicalization number 5 and the atom with input number 6 is given canonicalization number 6. Likewise, the atom with input number 7 is given canonicalization number 7 and the atom with input number 8 is given canonicalization number 8.

[0082] S203 is then performed. At this stage canonicalization numbers have been obtained for all atoms, so processing ends. The result is the set of canonicalization numbers shown in FIG. 19.
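The numbering steps S201 to S209 walked through above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The bond list is reconstructed from the worked example (hydrogens omitted), and only the class numbers of input atoms 2, 3 and 4 are quoted in the text. The remaining class numbers and the names `BONDS`, `CLASS_NUMBER` and `assign_canonical_numbers` are hypothetical placeholders. A larger class number is assumed to mean higher priority.

```python
# Bonds of 3,5-dimethyl-2,3,4,5-tetrahydropyridine by input number
# (reconstructed from the worked example; hydrogens omitted).
BONDS = [(3, 4), (3, 2), (4, 5), (2, 1), (5, 6), (5, 7), (1, 6), (1, 8)]

# Class numbers C_i^f. Only atoms 2, 3 and 4 are quoted in the text;
# the values for atoms 1, 5, 6, 7 and 8 are placeholders for illustration.
CLASS_NUMBER = {3: 8, 4: 4, 2: 3, 1: 7, 5: 6, 6: 5, 7: 2, 8: 1}


def assign_canonical_numbers(bonds, class_number):
    """Return {input number: canonicalization number} for a connected molecule."""
    neighbors = {atom: set() for atom in class_number}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # S202: the atom with the highest-priority class number gets number 1.
    first = max(class_number, key=class_number.get)
    canon = {first: 1}

    for k in range(2, len(class_number) + 1):
        done = set(canon)
        # S205: determined atoms that still have undetermined neighbours.
        frontier = [a for a in canon if neighbors[a] - done]
        # S207: among them, take the one with the smallest canonical number.
        parent = min(frontier, key=canon.get)
        # S208/S209: its undetermined neighbour with the largest class number.
        child = max(neighbors[parent] - done, key=class_number.get)
        canon[child] = k
    return canon


numbers = assign_canonical_numbers(BONDS, CLASS_NUMBER)
print(numbers)
```

With these inputs the sketch reproduces the assignments described in the text: input atoms 3, 4, 2, 5, 1, 6, 7 and 8 receive canonicalization numbers 1 to 8 in that order.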

[0083] The constituent atom classification routine 101 and the canonicalization number assignment routine 102 of the present invention include processing that assigns numbers to atoms (S102, S105, S202, S208, S209) and decisions that select atoms (S202, S207, S208, S209). Whether numbers are assigned in ascending or descending order, and whether the larger or the smaller value is given priority when selecting atoms, is left to the free choice of the programmer (including the choice of using negative numbers), provided that the objective of creating canonicalized data uniquely corresponding to the structure of the compound is achieved.

[0084] Accordingly, the phrase "the priority of the class number is the highest" in the present invention has the meaning described above and does not necessarily mean that the larger number is selected. However, in the process of assigning canonicalization numbers, it is preferable to adopt an algorithm in which, as in this example, numbers are assigned alternately (right, left, right, left, and so on) outward from the determined atoms.

[0085] Furthermore, in the second processing unit or the second step of the present invention, assigning canonicalization numbers to the atoms in ascending order from 1 only requires that the numbers form an arithmetic progression, {p + q(n-1) | n = 1, 2, 3, ..., n_max}. Here "1" means a reference number (the first term p of the arithmetic progression) and need not be 1 itself. The common difference q need not be 1 either.
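As a small illustration of the arithmetic progression p + q(n-1), the following Python sketch maps atoms, listed in the order in which they were numbered, to progression values. The function name `progression_numbers` and the rejection of q = 0 are assumptions made here; a zero common difference would give every atom the same number, so it is excluded.

```python
def progression_numbers(order, p=1, q=1):
    """Map atoms, listed in assignment order, to p + q*(n-1) for n = 1, 2, ..."""
    if q == 0:
        # Assumption: q = 0 would make all numbers equal, so it is not allowed.
        raise ValueError("common difference q must be non-zero")
    return {atom: p + q * (n - 1) for n, atom in enumerate(order, start=1)}


default_numbers = progression_numbers([3, 4, 2])            # p = 1, q = 1
shifted_numbers = progression_numbers([3, 4, 2], p=10, q=5)
print(default_numbers, shifted_numbers)
```

Because the mapping is strictly monotonic in n for q > 0, the relative order of the atoms, and hence the choice of the "smallest" numbered atom, is unchanged.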

[0086] Furthermore, in the second processing unit or the second step of the present invention, canonicalization numbers can also be given to the atoms in descending order (negative common difference q) starting from the first term p (usually the total number of atoms). In this case, after canonicalization number n has been assigned, the atom with the largest canonicalization number is selected from among the atoms that have already been given a canonicalization number and are bonded to an atom that has not yet been given one. Then, among the atoms that are bonded to the selected atom and have not yet been given a canonicalization number, the atom whose class number has the highest priority is given the next canonicalization number n + q (since q is negative, this is in practice a value such as n - 1). For descending order, the definition of the attribute P_i should also be changed to "the canonicalization number of the atom that is bonded to the atom with canonicalization number i and has the largest canonicalization number".

[0087] Next, the processing of the canonicalized data creation routine 103 called in S30 is described with reference to the flowchart of FIG. 20. First, as shown in FIG. 21, the input numbers are replaced with canonicalization numbers and the connection table 32 is rewritten (S301). Then, based on this connection table 32, three kinds of data (P_i, T_i, S_i) are obtained for each atom (S302). Here, P_i is the smallest canonicalization number among the atoms bonded to the atom with canonicalization number i (i > 1). T_i is the type symbol of the bond between the atom with canonicalization number i (i > 1) and the atom with canonicalization number P_i (in this example, "-" for a single bond, "=" for a double bond, "#" for a triple bond, "%" for an aromatic bond, and so on). S_i is the type symbol (in this example, the element symbol) of the atom with canonicalization number i (i > 0).

[0088] Specifically, the element symbol of the atom with canonicalization number 1 is first looked up in the atom table 32a, which gives S_1 = "N". Next, the atoms bonded to the atom with canonicalization number 2 are looked up in the atom pair table 32b, which gives the atoms with canonicalization numbers 1 and 4. The smallest canonicalization number among these is 1, so P_2 = 1. The bond between the atom with canonicalization number 2 and the atom with canonicalization number 1 is a single bond, so T_2 = "-". Finally, referring to the atom table 32a gives S_2 = "C".

[0089] Next, the atoms bonded to the atom with canonicalization number 3 are looked up in the atom pair table 32b, which gives the atoms with canonicalization numbers 1 and 5. The smallest canonicalization number among these is 1, so P_3 = 1. The bond between the atom with canonicalization number 3 and the atom with canonicalization number 1 is a double bond, so T_3 = "=". Referring to the atom table 32a gives S_3 = "C". Processing the remaining atoms in the same way gives P_4 = 2, P_5 = 3, P_6 = 4, P_7 = 4, P_8 = 5, T_4 to T_8 = "-", and S_4 to S_8 = "C".
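The derivation of (P_i, T_i, S_i) in S302 can be sketched as below. This is an illustrative sketch only: the canonically numbered bond list is inferred from the values quoted in the text (FIG. 21 is not reproduced here), and the names `ATOMS`, `BONDS` and `tree_attributes` are hypothetical.

```python
# Canonically numbered connection table for the worked example.
# Element symbols and bonds are inferred from the values quoted in the text.
ATOMS = {1: "N", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "C", 8: "C"}
BONDS = {(1, 2): "-", (1, 3): "=", (2, 4): "-", (3, 5): "-",
         (4, 6): "-", (4, 7): "-", (5, 8): "-", (5, 6): "-"}


def tree_attributes(atoms, bonds):
    """Return {i: (P_i, T_i, S_i)}; P_1 and T_1 are None because i > 1 is required."""
    partners = {i: {} for i in atoms}
    for (a, b), symbol in bonds.items():
        partners[a][b] = symbol
        partners[b][a] = symbol

    rows = {}
    for i in sorted(atoms):
        if i == 1:
            rows[i] = (None, None, atoms[i])
        else:
            p = min(partners[i])          # bonded atom with the smallest number
            rows[i] = (p, partners[i][p], atoms[i])
    return rows


rows = tree_attributes(ATOMS, BONDS)
for i, row in rows.items():
    print(i, row)
```

Each atom other than atom 1 thus records exactly one "parent" bond, which is why the result can be read as a tree.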

[0090] Next, the bonded atom pairs that were not referenced when T_i was obtained in S302 are extracted (S303). This is done with reference to the atom pair table 32b. As a result, the bonded atom pair consisting of the atoms with canonicalization numbers 5 and 6 is extracted. Three kinds of data (R^1_j, R^2_j, H_j) are then obtained for each extracted bonded atom pair (S304). Here, R^1_j and R^2_j are the canonicalization numbers of the two atoms forming the bond, and H_j is the type symbol of the bond (in this example the same symbols as for T_i are used). R^1_j and R^2_j satisfy R^1_j > R^2_j. With respect to another bonded atom pair (R^1_k, R^2_k), the pair satisfies either R^1_j ≤ R^1_k, or R^1_j = R^1_k and R^2_j < R^2_k.
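The extraction in S303 amounts to finding the bonds that are not one of the (i, P_i) tree bonds. A minimal sketch follows, with the bond list and parent numbers taken from the worked example. The names `BONDS`, `PARENT` and `unreferenced_bonds` are hypothetical, and the ordering conditions on R^1_j and R^2_j are deliberately not applied: pairs are returned as stored in the table.

```python
# Bonds by canonicalization number, with bond type symbols (worked example).
BONDS = {(1, 2): "-", (1, 3): "=", (2, 4): "-", (3, 5): "-",
         (4, 6): "-", (4, 7): "-", (5, 8): "-", (5, 6): "-"}

# P_i for i = 2..8 as obtained in S302.
PARENT = {2: 1, 3: 1, 4: 2, 5: 3, 6: 4, 7: 4, 8: 5}


def unreferenced_bonds(bonds, parent):
    """Return [(a, b, symbol)] for bonds not used as an (i, P_i) tree bond."""
    used = {frozenset((i, p)) for i, p in parent.items()}
    return sorted((a, b, symbol)
                  for (a, b), symbol in bonds.items()
                  if frozenset((a, b)) not in used)


closures = unreferenced_bonds(BONDS, PARENT)
print(closures)
```

For an acyclic molecule the list is empty; each ring in the structure contributes one such bond.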

[0091] The above processing yields the canonicalized tree structure data shown in FIG. 22.

[0092] Next, the data obtained in S302 and S304 are arranged in a single line to create the canonicalized data (S305). That is, a delimiter F that differs from the atom type symbols and the bond type symbols is defined, and the data obtained in S302 and S304 are arranged as follows:

S_1, P_2, T_2, S_2, P_3, T_3, S_3, P_4, T_4, S_4, ..., P_N, T_N, S_N, F, R^1_1, H_1, R^2_1, F, R^1_2, H_2, R^2_2, ..., F, R^1_M, H_M, R^2_M, F

Here, N is the total number of atoms and M is the total number of bonded atom pairs extracted in S304.

[0093] The data string obtained in this way is canonicalized data that corresponds uniquely to the structure of the compound. Specifically, taking the delimiter F to be "/" and arranging the obtained data in the prescribed order gives "N1-C1=C2-C3-C4-C4-C5-C/5-6/". This canonicalized data is written to and stored in the compound information file 33 (S306). Processing then ends.
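The concatenation in S305 can be sketched as follows. The rows use the values derived in the walkthrough (T_2 = "-", T_3 = "="), and the ring-closure record is copied in the order in which it appears in the example string. The names `ROWS`, `CLOSURES` and `canonical_string` are hypothetical, and the sketch is an illustration rather than the routine 103 itself.

```python
# (P_i, T_i, S_i) for i = 1..8 from the worked example; atom 1 has only S_1.
ROWS = {1: (None, None, "N"), 2: (1, "-", "C"), 3: (1, "=", "C"),
        4: (2, "-", "C"), 5: (3, "-", "C"), 6: (4, "-", "C"),
        7: (4, "-", "C"), 8: (5, "-", "C")}

# Ring-closure record, written as it appears in the example string.
CLOSURES = [(5, "-", 6)]


def canonical_string(rows, closures, sep="/"):
    """Join S_1, then P_i T_i S_i for i = 2..N, then sep-delimited closures."""
    text = rows[1][2]
    for i in sorted(rows):
        if i > 1:
            p, t, s = rows[i]
            text += f"{p}{t}{s}"
    text += sep
    for r1, h, r2 in closures:
        text += f"{r1}{h}{r2}{sep}"
    return text


result = canonical_string(ROWS, CLOSURES)
print(result)  # N1-C1=C2-C3-C4-C4-C5-C/5-6/
```

The delimiter must not collide with any atom or bond type symbol, which is why it is passed in as a parameter rather than fixed.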

[0094] The present invention is not limited to the above embodiment and may be modified, for example as follows, without departing from the spirit of the invention.

[0095] (1) In the above embodiment, a data string including the atom type symbols S_i is used as the canonicalized data, but the most frequently occurring atom type symbol (usually C for carbon) may be omitted from the data string. That is, omitting the carbon symbol C from the canonicalized data described above gives "N1-1=2-3-4-4-5-/5-6/". Shortening the data string in this way reduces the amount of data written to the compound information file 33.
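Modification (1) can be sketched by suppressing the most frequent atom symbol while the string is built, rather than deleting the letter from the finished string (which would also damage symbols such as Cl). The names below are hypothetical, and the row data repeat the worked example.

```python
from collections import Counter

ROWS = {1: (None, None, "N"), 2: (1, "-", "C"), 3: (1, "=", "C"),
        4: (2, "-", "C"), 5: (3, "-", "C"), 6: (4, "-", "C"),
        7: (4, "-", "C"), 8: (5, "-", "C")}
CLOSURES = [(5, "-", 6)]


def compact_string(rows, closures, sep="/"):
    """Build the data string with the most frequent atom symbol left out."""
    most_common = Counter(s for _, _, s in rows.values()).most_common(1)[0][0]

    def shown(symbol):
        return "" if symbol == most_common else symbol

    text = shown(rows[1][2])
    for i in sorted(rows):
        if i > 1:
            p, t, s = rows[i]
            text += f"{p}{t}{shown(s)}"
    text += sep
    for r1, h, r2 in closures:
        text += f"{r1}{h}{r2}{sep}"
    return text


compact = compact_string(ROWS, CLOSURES)
print(compact)  # N1-1=2-3-4-4-5-/5-6/
```

A reader of the compact string needs to know which symbol was omitted; here that is assumed to be fixed by convention.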

[0096] (2) In the canonicalization number assignment routine 102, the following processing may be added for the case where several undetermined atoms with the largest class number C_i^f are selected in S209.

[0097] (a) If the undetermined atoms with the largest class number C_i^f do not belong to a cyclic structure portion, an arbitrary undetermined atom is selected from among them and its canonicalization number is set to k. Processing then returns to S203.

[0098] (b) If the undetermined atoms with the largest class number C_i^f belong to a cyclic structure portion, the following vector quantity is defined for each of the undetermined atoms selected in S209 (hereinafter, candidate atoms), using the structure in which the bonds between the candidate atoms and the determined atom bonded to them have been cut.

[0099] m_ik: the minimum number of bonds between candidate atom i and the atom with canonicalization number k.

A priority order is defined in advance for this attribute, the atom i with the highest priority is selected, and its canonicalization number is set to k. Processing then returns to S203.
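A minimum bond count of the kind used for m_ik can be obtained with a breadth-first search over the bond graph after the relevant bonds have been removed. The sketch below is a generic illustration on a hypothetical six-membered ring, not the example molecule; the function name `bond_distances` and its `cut` parameter are assumptions.

```python
from collections import deque


def bond_distances(bonds, start, cut=()):
    """Minimum number of bonds from `start` to every reachable atom.

    Bonds listed in `cut` are ignored, mimicking the structure in which
    the bond between a candidate atom and its determined atom is severed.
    """
    removed = {frozenset(b) for b in cut}
    neighbors = {}
    for a, b in bonds:
        if frozenset((a, b)) in removed:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in neighbors.get(atom, ()):
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)
    return dist


# Hypothetical six-membered ring 1-2-3-4-5-6-1.
RING = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
intact = bond_distances(RING, 2)
severed = bond_distances(RING, 2, cut=[(1, 2)])
print(intact[1], severed[1])
```

With the bond between atoms 1 and 2 cut, the only route from atom 2 to atom 1 runs the long way round the ring.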

[0100] Criteria for selecting atoms based on their attribute values are illustrated here. A scalar quantity is judged by its magnitude. For a vector quantity, when the elements of two vectors i and k are V_ij and V_kj, the priority is judged by the magnitudes at the smallest j among the elements for which V_ij ≠ V_kj. Using such criteria, priorities can be determined for the attributes b_ij, d_ij, V_ij^n and m_ij. When the priority is determined by several attributes, it is preferable to define a priority order among the attributes in advance and to give precedence to the judgment based on the higher-priority attribute.
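The vector criterion described above is a lexicographic comparison: the first index at which two vectors differ decides the order. A minimal Python sketch follows, assuming that the larger element wins (the text leaves the direction to the implementer) and that the vectors have equal length. The function names are hypothetical.

```python
def first_differing_index(u, v):
    """Smallest j with u[j] != v[j], or None if the vectors are equal."""
    for j, (x, y) in enumerate(zip(u, v)):
        if x != y:
            return j
    return None


def pick_by_priority(candidates):
    """Return the atom whose attribute vector has the highest priority.

    Assumption: a larger element at the first differing index wins.
    Python compares tuples lexicographically, which is the same rule
    for vectors of equal length.
    """
    return max(candidates, key=lambda atom: tuple(candidates[atom]))


vectors = {"x": (2, 1, 0), "y": (2, 0, 3)}
decisive = first_differing_index(vectors["x"], vectors["y"])
winner = pick_by_priority(vectors)
print(decisive, winner)
```

Several attributes can be ranked in the same way by placing their vectors in one tuple in order of attribute priority, so that a lower-priority attribute is consulted only when all higher-priority ones tie.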

[0101] When the canonicalized data of the C60 molecule shown in FIG. 23(a) was obtained by the canonicalized data creating method according to the present invention, canonicalized data uniquely specifying the structure of the C60 molecule (FIG. 23(b)) was obtained in only 1.5 seconds. By contrast, when the canonicalized data of the C60 molecule was obtained on an information processing device of the same performance using the Morgan algorithm, which does not include a step of classifying the atoms into sets of equivalent atoms, 550 seconds were required. Adopting the canonicalized data creating means and method according to the present invention therefore greatly increases the speed of this information processing.

【０１０２】[0102]

[Effects of the Invention] As described in detail above, in the canonicalized data creating device of the present invention, the unique data on each atom and the bond pair data between atoms accepted by the input means are passed to the canonicalized data creating means, which creates canonicalized data based on these data. Likewise, in the canonicalized data creating method of the present invention, canonicalized data is created based on the unique data on each atom constituting the compound and the bond pair data between atoms.

[0103] The canonicalized data created by the canonicalized data creating device and the canonicalized data creating method of the present invention is thus a very short string of characters, digits and symbols, and can be stored in a small storage area. Consequently, using the canonicalized data creating device and method of the present invention in a compound/reaction database system can greatly reduce the amount of storage area used by that system.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the canonicalized data creating device.

FIG. 2(a) is a diagram showing the contents of the atom table of the connection table. FIG. 2(b) is a diagram showing the contents of the atom pair table of the connection table.

FIG. 3 is a diagram showing the data contents of the compound information file.

FIG. 4 is a schematic diagram showing an outline of the operation of the canonicalized data creating device.

FIG. 5 is a flowchart showing an outline of the processing of the main routine.

FIG. 6 is a flowchart showing an outline of the processing of the constituent atom classification routine.

FIG. 7(a) is a diagram showing the contents of the atom table of the connection table. FIG. 7(b) is a diagram showing the contents of the atom pair table of the connection table.

FIG. 8 is a diagram showing the relationship between the atoms constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine and the input numbers.

FIGS. 9(a) and 9(b) are diagrams showing the data contents of the reference table.

FIG. 10 is a diagram showing the three kinds of attributes (a_i, b_ij, d_ij) given to the atoms constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine.

FIGS. 11(a) and 11(b) are diagrams showing the data contents of the reference table.

FIG. 12(c) is a diagram showing the data contents of the reference table.

FIGS. 13(a) and 13(b) are diagrams showing the data contents of the reference table.

FIGS. 14(c) and 14(d) are diagrams showing the data contents of the reference table.

FIGS. 15(a), 15(b) and 15(c) are diagrams showing the relationship between the atoms constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine and the class numbers.

FIG. 16 is a diagram showing the attribute V_ij^1 given to the atoms constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine.

FIG. 17 is a diagram showing the attribute V_ij^2 given to the atoms constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine.

FIG. 18 is a flowchart showing an outline of the processing of the canonicalization number assignment routine.

FIG. 19 is a diagram showing the relationship between the atoms constituting 3,5-dimethyl-2,3,4,5-tetrahydropyridine and the canonicalization numbers.

FIG. 20 is a flowchart showing an outline of the processing of the canonicalized data creation routine.

FIG. 21(a) is a diagram showing the contents of the atom table of the connection table. FIG. 21(b) is a diagram showing the contents of the atom pair table of the connection table.

FIG. 22 is a diagram showing the data contents of the canonicalized tree structure data.

FIG. 23(a) is a diagram showing the molecular structure of the C60 molecule. FIG. 23(b) is a diagram showing the canonicalized data of the C60 molecule.

[Explanation of symbols]

1 ... canonicalized data creating device, 10 ... image memory, 11 ... working memory, 20 ... main storage device, 21 ... operating system, 22 ... canonicalized data creation program, 30 ... hard disk device, 31 ... connection table file, 32 ... connection table, 33 ... compound information file, 34 ... canonicalized data, 40 ... display, 50 ... mouse, 60 ... keyboard, 70 ... printer, 80 ... CPU, 100 ... main routine, 101 ... constituent atom classification routine (first processing unit), 102 ... canonicalization number assignment routine (second processing unit), 103 ... canonicalized data creation routine (third processing unit), A ... input means, B ... canonicalized data creating means.

Claims

[Claims]

1. A canonicalized data creating device comprising: input means for accepting input of unique data on each atom constituting a compound and bond pair data between atoms; and canonicalized data creating means for creating, based on the data accepted by the input means, canonicalized data capable of uniquely specifying the chemical structure of the compound, wherein the canonicalized data creating means comprises: a first processing unit that classifies the atoms into classes, a different class for each set of equivalent atoms, and gives each atom a class number that differs from class to class; a second processing unit that gives each atom a canonicalization number uniquely corresponding to the structure of the compound, based on the class number given to each atom in the first processing unit; and a third processing unit that creates the canonicalized data based on the canonicalization number given to each atom by the second processing unit.

2. The canonicalized data creating device according to claim 1, wherein the first processing unit gives each atom three kinds of attributes (a_i, b_ij, d_ij) and, using the fact that atoms differing in even one of these attributes can be judged not to be equivalent, gives each atom a class number that differs for each set of equivalent atoms, where a_i is the symbol of the atom with input number i, b_ij is the number of bonds whose type symbol is j among the bonds adjacent to the atom with input number i, and d_ij is the number of routes from the atom with input number i that pass through j bonds along a shortest path; the second processing unit, in the process of giving canonicalization numbers to the atoms in ascending order from 1, gives canonicalization number 1 to the atom whose class number has the highest priority and thereafter, when canonicalization numbers up to n have been given, selects the atom with the smallest canonicalization number from among the atoms that have already been given a canonicalization number and are bonded to an atom that has not yet been given one, and gives canonicalization number n + 1 to the atom whose class number has the highest priority among the atoms that are bonded to the selected atom and have not yet been given a canonicalization number; and the third processing unit gives each atom three kinds of attributes (P_i, T_i, S_i) and creates the canonicalized data by arranging these attributes in a line, where P_i is the canonicalization number of the atom that is bonded to the atom with canonicalization number i and has the smallest canonicalization number, T_i is the type symbol of the bond between the atom with canonicalization number i and the atom with canonicalization number P_i, and S_i is the type symbol of the atom with canonicalization number i.

3. A canonicalized data creating method for creating canonicalized data capable of uniquely specifying the chemical structure of a compound based on unique data on each atom constituting the compound and bond pair data between atoms, the method comprising: a first step of classifying the atoms into classes, a different class for each set of equivalent atoms, and giving each atom a class number that differs from class to class; a second step of giving each atom a canonicalization number uniquely corresponding to the structure of the compound, based on the class number given to each atom in the first step; and a third step of creating the canonicalized data based on the canonicalization number given to each atom in the second step.

4. The canonicalized data creating method according to claim 3, wherein the first step gives each atom three kinds of attributes (a_i, b_ij, d_ij) and, using the fact that atoms differing in even one of these attributes can be judged not to be equivalent, gives each atom a class number that differs for each set of equivalent atoms, where a_i is the symbol of the atom with input number i, b_ij is the number of bonds whose type symbol is j among the bonds adjacent to the atom with input number i, and d_ij is the number of routes from the atom with input number i that pass through j bonds along a shortest path; the second step, in the process of giving canonicalization numbers to the atoms in ascending order from 1, gives canonicalization number 1 to the atom whose class number has the highest priority and thereafter, when canonicalization numbers up to n have been given, selects the atom with the smallest canonicalization number from among the atoms that have already been given a canonicalization number and are bonded to an atom that has not yet been given one, and gives canonicalization number n + 1 to the atom whose class number has the highest priority among the atoms that are bonded to the selected atom and have not yet been given a canonicalization number; and the third step gives each atom three kinds of attributes (P_i, T_i, S_i) and creates the canonicalized data by arranging these attributes in a line, where P_i is the canonicalization number of the atom that is bonded to the atom with canonicalization number i and has the smallest canonicalization number, T_i is the type symbol of the bond between the atom with canonicalization number i and the atom with canonicalization number P_i, and S_i is the type symbol of the atom with canonicalization number i.