JPH0863350A - Program analysis method and device therefor

Info

Publication number: JPH0863350A
Application number: JP19808694A
Authority: JP
Inventors: Toshihiko Oda; 利彦小田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1994-08-23
Filing date: 1994-08-23
Publication date: 1996-03-08

Abstract

PURPOSE: To support an analyst's understanding of a program by acquiring concept terms from the names of program identifiers. CONSTITUTION: Concept terms are acquired from the names of program identifiers to build a concept term dictionary; the similarity among the concept terms is measured, and a concept network is generated as an image showing the relationships among them. Next, cluster analysis is applied to the routines based on the degree of distribution of concept terms so that the analyst can identify the module structure of the program. Finally, the concept terms are mapped to objects, attributes, and links so that the analyst can construct an information model of the program. Because identifier names reflect the program's content, the concept terms acquired from them serve as abstract-level information describing that content.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a program analysis method and apparatus for extracting abstract-level information from a program.

[0002]

[Prior Art] In general, safe program maintenance, domain analysis that searches a program for reusable software components, and program modification all require an understanding of the program's purpose, specifications, design, and behavior.

[0003] To gain this understanding, it is common to consult the specification and design documents written when the program was developed. However, if those documents are incomplete, or have not been updated to track changes to the program, it is difficult to understand the program well from them.

[0004] In such cases the analyst must analyze the program directly in order to understand it, which is laborious and inefficient.

[0005] One program analysis apparatus aimed at this problem is the DESIRE system, which analyzes source code to support program understanding. The DESIRE system holds information on domain models and naming. Using informal information such as program comments and the names used in the program, it maps names onto its own domain models by approximate search and fuzzy search. In this way it identifies the domain of the program and finds abstract concepts in the program.

[0006]

[Problems to Be Solved by the Invention] The DESIRE system described above identifies the domain of a program by mapping between the domain models it holds and naming, but its effectiveness is limited, and other approaches are desired.

[0007]

[Means for Solving the Problems] The invention of claim 1 is a program analysis method for acquiring information for understanding the content of a program that is formed as a hierarchy of modules and in which identifiers such as variables, routines, types, and macros are defined, wherein concept terms are acquired from the names of the program's identifiers.

[0008] The invention of claim 2 is the invention of claim 1, wherein a concept term dictionary is provided in which abbreviated expressions frequently used in identifier names are registered in advance together with their original expressions; abbreviated expressions are detected in identifier names by referring to this dictionary, and each detected abbreviation is restored to its original expression to form a concept term.

[0009] The invention of claim 3 is the invention of claim 1, wherein the sequence of tokens extracted from a variable name is treated as a noun concept term, and the sequence of tokens extracted from a routine name is treated as a pair consisting of a noun concept term and a verb concept term.

[0010] The invention of claim 4 is the invention of claim 3, wherein conjunctions and prepositions in the token sequence of a noun concept term are identified to recognize its dependency relations, and the tokens are reordered into the ordinary form of a noun phrase according to those relations.

[0011] The invention of claim 5 is the invention of claim 3, wherein a concept term dictionary is provided in which verbs frequently used in identifier names are registered in advance; when a verb registered in the dictionary is detected in the token sequence obtained from a routine name, that token is treated as the verb concept term and the remaining tokens as the noun concept term.

[0012] The invention of claim 6 is the invention of claim 1, wherein the similarity between concept terms is measured on the basis of their co-occurrence in expressions and assignment statements of the program, and an image showing the relationships among a plurality of concept terms is generated from this similarity by multidimensional scaling.

[0013] The invention of claim 7 is the invention of claim 1 or 6, wherein the similarity between concept terms is measured on the basis of the relationship between a type and the items declared with that type in the program, and an image showing the relationships among a plurality of concept terms is generated from this similarity by multidimensional scaling.

[0014] The invention of claim 8 is the invention of claim 1, 6, or 7, wherein the similarity between concept terms is measured on the basis of their co-occurrence within the file structure of the program, and an image showing the relationships among a plurality of concept terms is generated from this similarity by multidimensional scaling.

[0015] The invention of claim 9 is the invention of claim 1, wherein a measure of the effect of hiding the concept terms acquired from the names of a routine's external references is defined as the similarity between routines, and cluster analysis using this similarity builds hierarchical clusters of routines based on the hiding of concept terms.

[0016] The invention of claim 10 is the invention of claim 9, wherein the degree to which concept terms are distributed across routines is quantified as entropy in the routine space, and the similarity between two routines is measured as the reduction in entropy obtained when the two routines are clustered together.

[0017] The invention of claim 11 is the invention of claim 1, wherein concept terms are mapped to the objects, attributes, and links that make up an information model of the program.

[0018] The invention of claim 12 is the invention of claim 11, wherein, as a rule for identifying concept terms as objects, concept terms acquired from higher-level modules are treated as object candidates.

[0019] The invention of claim 13 is the invention of claim 11 or 12, wherein, as a rule for identifying concept terms as objects, concept terms acquired from the names of composite types are treated as object candidates.

[0020] The invention of claim 14 is the invention of claim 11, 12, or 13, wherein, as a rule for identifying concept terms as objects, concept terms acquired from the names of global variables used across multiple files are treated as object candidates.

[0021] The invention of claim 15 is the invention of claim 11, 12, 13, or 14, wherein, as a rule for identifying concept terms as attributes, when a concept term acquired from the name of a composite type is taken as an object, the concept terms acquired from the names of that type's subfields are treated as attribute candidates.

[0022] The invention of claim 16 is the invention of claim 11, 12, 13, 14, or 15, wherein, as a rule for identifying concept terms as links, verb concept terms acquired from routine names are treated as link candidates, and the direction of each link is determined from information on how the arguments are manipulated inside the routine and from the case-government information of verbs registered in advance in the concept term dictionary.

[0023] The invention of claim 17 is a program analysis apparatus for acquiring information for understanding the content of a program that is formed as a hierarchy of modules and in which identifiers such as variables, routines, types, and macros are defined, the apparatus comprising term generation means for acquiring concept terms from the names of the program's identifiers.

[0024]

[Operation] In the invention of claim 1, concept terms are acquired from the names of program identifiers, so that concept terms are obtained as abstract-level information that supports understanding of the program's content.

[0025] In the invention of claim 2, when an identifier name contains an abbreviated expression, the abbreviation is restored to its original expression to form the concept term, which puts the concept term in an easily understood form.

[0026] In the invention of claim 3, the token sequence extracted from a variable name is treated as a noun concept term, and the token sequence extracted from a routine name is treated as a pair consisting of a noun concept term and a verb concept term, so that the form of a concept term can be readily anticipated from the kind of identifier.

[0027] In the invention of claim 4, the token sequence of a noun concept term is reordered into the ordinary form of a noun phrase according to its dependency relations, which puts the concept term in an easily understood form.

[0028] In the invention of claim 5, a verb found in the token sequence obtained from a routine name is treated as the verb concept term and the remaining tokens as the noun concept term, so that verb concept terms and noun concept terms are easily distinguished.

[0029] In the invention of claim 6, an image showing the relationships among a plurality of concept terms is generated by multidimensional scaling from a similarity measured on the basis of the co-occurrence of concept terms in expressions and assignment statements of the program, which presents the interrelationships of the concept terms in an easily understood form.

[0030] In the invention of claim 7, an image showing the relationships among a plurality of concept terms is generated by multidimensional scaling from a similarity measured on the basis of the relationship between a type and the items declared with that type in the program, which presents the interrelationships of the concept terms in an easily understood form.

[0031] In the invention of claim 8, an image showing the relationships among a plurality of concept terms is generated by multidimensional scaling from a similarity measured on the basis of the co-occurrence of concept terms within the file structure of the program, which presents the interrelationships of the concept terms in an easily understood form.

[0032] In the invention of claim 9, hierarchical clusters of routines based on the hiding of concept terms are built by cluster analysis, using a routine similarity defined as a measure of the effect of hiding the concept terms acquired from the names of a routine's external references. Hierarchical clusters that reflect the module structure of the program are thereby obtained as abstract-level information that supports understanding of the program's content.

[0033] In the invention of claim 10, the degree to which concept terms are distributed across routines is quantified as entropy in the routine space, and the similarity between two routines is measured as the reduction in entropy obtained when the two routines are clustered together. The hierarchical clusters obtained as abstract-level information supporting program understanding thereby reflect the module structure of the program well.

[0034] In the invention of claim 11, concept terms are mapped to the objects, attributes, and links that make up an information model of the program, so that the information needed to construct the information model is obtained from the concept terms.

[0035] In the invention of claim 12, concept terms acquired from higher-level modules are treated as object candidates, so that the objects making up the information model are easily obtained.

[0036] In the invention of claim 13, concept terms acquired from the names of composite types are treated as object candidates, so that the objects making up the information model are easily obtained.

[0037] In the invention of claim 14, concept terms acquired from the names of global variables used across multiple files are treated as object candidates, so that the objects making up the information model are easily obtained.

[0038] In the invention of claim 15, when a concept term acquired from the name of a composite type is taken as an object, the concept terms acquired from the names of that type's subfields are treated as attribute candidates, so that the attributes of the objects making up the information model are easily obtained.

[0039] In the invention of claim 16, verb concept terms acquired from routine names are treated as link candidates, and the direction of each link is determined from information on how the arguments are manipulated inside the routine and from the case-government information of verbs registered in advance in the concept term dictionary, so that the links between the objects making up the information model are easily obtained.

[0040] In the invention of claim 17, the term generation means acquires concept terms from the names of program identifiers, so that concept terms are obtained as abstract-level information that supports understanding of the program's content.

[0041]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. The program analysis apparatus of this embodiment (not shown) uses the program analysis method of this embodiment to acquire concept terms from the names of program identifiers, in order to obtain abstract-level information for understanding the content of a program. A concept term here is a term that expresses a concept useful for understanding the program, such as a tangible thing, a role, an event, an interaction, or a specification.

[0042] The programs from which such concept terms are acquired are described next. A typical program defines identifiers such as variables, routines, types, and macros throughout its code and is organized as a hierarchy of modules. Identifiers, also called variable names or routine names, are subject to restrictions on their length and on the characters they may contain. The C language, for example, has the following seven kinds of identifiers:

1. Variable names (e.g. NumOfPartTimeEmployee)
2. Function names (e.g. CalcPaymentOfEmployee)
3. Structure or union type names (e.g. struct person)
4. Structure or union member names (e.g. p->age, p->name, p->sex)
5. Type names defined with typedef (e.g. person_t)
6. Macros and their arguments (e.g. MAX_EMPLOYEES_NUM)
7. Destination labels of goto statements (e.g. goto error:)

[0043] The name of an identifier generally reflects the part of the problem domain to which the identifier is attached. The program analysis method and apparatus of this embodiment therefore acquire concept terms from the names of routine identifiers, the identifiers of their argument variables, the identifiers of global variables, and the identifiers of types defined in the program.

[0044] In real programs it is not unusual for identifier names to be poorly chosen or abbreviated. It is therefore difficult to fully automate the acquisition of concept terms from identifier names, and the correctness of a fully automated result cannot be guaranteed. When the program analysis apparatus is actually built, it is therefore desirable for it to proceed interactively with the analyst.

[0045] The program analysis apparatus of this embodiment has a concept term dictionary held in a device such as RAM (Random Access Memory), which stores various kinds of information in an updatable manner, and also has a device such as a CPU (Central Processing Unit) that executes various processes by referring to this dictionary.

[0046] Basic concept terms covering the concepts of programming and the computing environment are registered in the concept term dictionary in advance. These include the verbs, prepositions, and conjunctions frequently used in identifier names, and each term is registered together with its synonyms, abbreviations, and grammatical information such as case government and part of speech. Accordingly, abbreviated expressions frequently used in identifier names are registered in advance together with their original expressions, and verbs frequently used in identifier names are registered in advance together with their case-government information. The content of the dictionary is updated as the analysis of the program proceeds.

[0047] The program analysis method performed by the apparatus of this embodiment is outlined with reference to FIG. 1. First, concept terms are acquired from the names of program identifiers to build the concept term dictionary, the similarity between concept terms is measured, and a concept network is generated as an image showing the relationships among them. Next, so that the analyst can identify the module structure of the program, the routines are cluster-analyzed according to the degree of distribution of concept terms. Finally, so that the analyst can construct an information model of the program, the concept terms are mapped to objects, attributes, and links.

[0048] The program analysis method of this apparatus is described below, one stage at a time.

[0049] First, when concept terms are acquired from identifier names, the token detection means extracts from each identifier name the tokens that express concepts such as tangible things, roles, events, interactions, and specifications. When the abbreviation detection means detects an abbreviated expression among the extracted tokens, the expression restoring means restores it to its original expression by referring to the concept term dictionary.

[0050] When the identifier from which tokens were extracted is a variable, type, or member, the term identification means treats the token sequence extracted from its name as a noun concept term. When the identifier is a routine, the token sequence extracted from its name is treated as a pair consisting of a noun concept term and a verb concept term.
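The rule in the preceding paragraph can be sketched as follows. This is a minimal illustration in Python, not the apparatus described here: the function name to_concept_terms, the kind labels, and the small VERBS set are assumptions made for the example, with the verb lookup standing in for the verbs registered in the concept term dictionary.

```python
# Hypothetical verb entries; a real concept term dictionary would hold many more.
VERBS = {"calculate", "get", "set", "delete"}

def to_concept_terms(kind, tokens, verbs=VERBS):
    """Return (verb_concept, noun_concept) for the tokens of one identifier.

    kind is an assumed label: "routine" for routine names, anything else
    (variable, type, member) for names that yield only a noun concept term.
    """
    if kind == "routine":
        # The first token found in the verb list becomes the verb concept term.
        verb = next((t for t in tokens if t in verbs), None)
        nouns = [t for t in tokens if t != verb]
        return verb, " ".join(nouns)
    # Variables, types, and members: the whole token sequence is a noun term.
    return None, " ".join(tokens)
```

For a routine name already split and restored to calculate, monthly, total, this returns the pair (calculate, monthly total); for a variable it returns only the noun term.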

[0051] Further, when a noun concept term has been obtained in this way, the syntax analysis means identifies conjunctions and prepositions in the token sequence to recognize its dependency relations, and the expression restoring means reorders the tokens into the ordinary form of a noun phrase according to those relations.
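One way to picture this reordering is the sketch below. It is a deliberately simplified assumption, not the syntax analysis means itself: it handles only the preposition "of", treats the token just before "of" as the head noun, and moves the phrase after "of" in front of that head. The function name normalize_noun_phrase is hypothetical.

```python
def normalize_noun_phrase(tokens):
    """Rewrite 'A B of C' as 'A C B': modifiers, complement, then head noun."""
    if "of" not in tokens:
        return list(tokens)
    i = tokens.index("of")
    before, after = tokens[:i], tokens[i + 1:]
    if not before or not after:
        # A dangling "of" carries no dependency; simply drop it.
        return [t for t in tokens if t != "of"]
    *modifiers, head = before
    # The complement may itself contain "of", so normalize it recursively.
    return modifiers + normalize_noun_phrase(after) + [head]
```

Applied to maximum, points, of, team, the sketch yields maximum, team, points. It does not singularize words or handle conjunctions, both of which a fuller treatment would need.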

[0052] In this way, concept terms are obtained in an easily understood form from the names of identifiers such as variables and routines. The concept terms obtained are registered in the concept term dictionary.

[0053] A concrete example of acquiring a concept term from an identifier name follows. When the name of a variable identifier is "max points of team", the concept term is acquired in these steps:

1. Extract the tokens from the identifier name → (max points of team)
2. Restore abbreviated tokens to their original expressions → (maximum points of team)
3. Recognize the concept term from the tokens → concept: (maximum team point)

[0054] The following example shows concept terms acquired from a program by these steps, where VC marks a verb concept term and NC a noun concept term. Each program line is followed by the concept terms acquired from it:

    int CalcMonthlyTotal(int NewPurchases)    → VC: calculate, NC: (monthly total)
    {
        int MonthlyTotal;    → NC: (monthly total)
        MonthlyTotal = NewPurchases + SalesTax(NewPurchases);    → NC: (monthly total), NC: (new purchases), NC: (sales tax)
        return(MonthlyTotal);
    }

[0055] The steps described above for acquiring concept terms from identifier names are explained in detail below, in order.

[0056] First, step 1 of the process of acquiring concept terms from identifier names, the extraction of tokens from an identifier name, is described.

[0057] In general, the name of a program identifier consists of a sequence of several tokens. When the program is written in C, for example, the boundaries between tokens are easy to recognize from separator characters (the underscore in Example 1 below) and from transitions from lowercase to uppercase. The name of an identifier is therefore decomposed into individual tokens by exploiting such characteristics of the programming language. Concretely:

Example 1: win_hght_siz → win hght siz
Example 2: CalcMonthlyPymntOfEmple → calc monthly pymnt of emple
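The token extraction described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the token detection means itself: the function name split_identifier is hypothetical, and the regular expression encodes only the two cues mentioned in the text (separator characters and case transitions), plus runs of capitals and digits.

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split an identifier into lowercase tokens at separators and case changes."""
    tokens = []
    # First cut at underscores and any other non-word separators.
    for part in re.split(r"[_\W]+", name):
        # Then cut each part into capital runs, capitalized or lowercase words,
        # and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]
```

Called on win_hght_siz and CalcMonthlyPymntOfEmple, it produces the token sequences given in Examples 1 and 2. An all-capitals macro name such as MAX_EMPLOYEES_NUM is split at the underscores only.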

[0058] Next, step 2 of the process of acquiring concept terms from identifier names, the restoration of abbreviated tokens to their original expressions, is described.

[0059] In general, abbreviated expressions are used heavily in identifier names. An abbreviation is formed, for example, by removing the vowels from the original word or by keeping only its first few characters. Abbreviations that are common in identifier names but unsuitable as concept terms include win (= window), scr (= screen), del (= delete), and prev (= previous).

【００６０】Therefore, when a token extracted from an identifier name is an abbreviated expression such as an abbreviation, it is restored to its original form by referring to the concept term dictionary. First, the tokens extracted from the identifier name are spell-checked, and the concept term dictionary is searched for an abbreviated expression that matches each token flagged by the check. When a matching abbreviated expression is found, it is restored to the original form registered in the concept term dictionary, as in the following examples:
Example 1: win hght siz → window height size
Example 2: calc monthly pymnt of emple → calculate monthly payment of employee
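The dictionary lookup described above can be sketched as follows. The dictionary contents are taken from the examples in the text; the names `ABBREVIATIONS`, `KNOWN_WORDS`, and `restore_tokens` are hypothetical, and the small word set merely stands in for a real spell checker.

```python
# Hypothetical dictionary contents, taken from the examples in the text.
ABBREVIATIONS = {
    "win": "window", "hght": "height", "siz": "size",
    "calc": "calculate", "pymnt": "payment", "emple": "employee",
}
# Stand-in for a spell checker's word list (assumption for this sketch).
KNOWN_WORDS = {"monthly", "of"}

def restore_tokens(tokens):
    """Return (restored tokens, tokens left for the analyst)."""
    restored, unresolved = [], []
    for tok in tokens:
        if tok in KNOWN_WORDS:        # passes the spell check: keep as is
            restored.append(tok)
        elif tok in ABBREVIATIONS:    # registered abbreviation: restore it
            restored.append(ABBREVIATIONS[tok])
        else:                         # not in the dictionary: defer to the analyst
            restored.append(tok)
            unresolved.append(tok)
    return restored, unresolved
```

Tokens that are neither ordinary words nor registered abbreviations are returned unchanged in the second list, so that an interactive step can resolve them.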

【００６１】When no abbreviated expression matching a token is found in the concept term dictionary, the token is, for example, shown on the display together with a message, and it is restored from the analyst's keyboard input. Because abbreviations vary widely (for example, calculate → cal, calc, calcu, clc, ...), several matching abbreviated expressions may compete; in that case too, it is preferable to leave the restoration to the analyst. In addition, temporary variables and loop control variables are often given meaningless names such as i, x, and p; these are discarded without being restored.
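The three outcomes described above (automatic restoration, deferral to the analyst, and discarding) can be sketched as a small triage function. The one-letter rule for meaningless names, the `CANDIDATES` table, and the competing entry for `cal` are assumptions made for illustration; the source does not specify them.

```python
# Hypothetical dictionary: abbreviation -> possible original words.
CANDIDATES = {
    "calc": ["calculate"],
    "cal": ["calculate", "calendar"],   # assumed ambiguity, for illustration only
}

def triage(token):
    """Decide how a token is handled: 'discard', 'restore', or 'ask_analyst'."""
    if len(token) == 1:                 # assumption: one-letter names carry no concept
        return "discard"
    options = CANDIDATES.get(token, [])
    if len(options) == 1:               # exactly one match: restore automatically
        return "restore"
    return "ask_analyst"                # no match, or competing matches
```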

【００６２】Next, as step 3 of the process of acquiring concept terms from identifier names, the operation of recognizing concept terms by, for example, reordering the tokens is described below.

【００６３】In general, the names that programmers give to identifiers reflect concepts belonging to the problem domain as well as concepts of design and implementation, and these are expressed as noun-like concept terms and verb-like concept terms. A noun-like concept term denotes a thing, state, time, or the like, possibly with adnominal modifiers (for example, top level window). A verb-like concept term denotes a physical position, shape, quantity, or the like, or the creation, change, or disappearance of a state.

【００６４】Whether the concept term obtained from a name is noun-like or verb-like can be predicted from the kind of identifier. Variables and types are generally given noun-like names, so the concept terms obtained from them can be expected to be noun-like. Routines are generally given names that express the action or effect of their processing, so their names can be expected to yield a pair consisting of a noun-like concept term and a verb-like concept term.

【００６５】Accordingly, when the identifier discrimination means of the program analysis apparatus of this embodiment determines that an identifier is a variable, type, or member, the term recognition means recognizes the token sequence extracted from its name as a noun-like concept term. When the identifier discrimination means determines that the identifier is a routine, the term recognition means recognizes the token sequence extracted from its name as a pair of a noun-like concept term and a verb-like concept term.

【００６６】Programs are generally written in English, but because identifier names are limited in length, the syntax rules of natural language do not hold for them. Conjunctions and prepositions are nevertheless relatively easy to identify, and by using them to recognize the dependency relations among the tokens, the tokens of a noun-like concept term can be rearranged into their normal order.

【００６７】Thus, when a noun-like concept term has been acquired from an identifier name as described above, the syntax analysis means identifies the conjunctions and prepositions in its token sequence and recognizes the dependency relations, and the expression restoration means identifies the adnominal modifiers and the noun from those relations and rearranges the token sequence into the normal form of a noun phrase.

【００６８】In the program analysis apparatus of this embodiment, verbs that are frequently used in program identifier names are registered in the concept term dictionary in advance. When the verb detection means finds such a registered verb in the token sequence acquired from a routine name, the term discrimination means treats the token detected as a verb as a verb-like concept term and recognizes the remaining tokens as a noun-like concept term.

【００６９】The formats of noun-like and verb-like concept terms are, for example:
Noun-like concept term: NC:(adnominal modifier 1, adnominal modifier 2, ..., noun)
Verb-like concept term: VC:verb

【００７０】Examples of recognizing the token sequence acquired from an identifier name as concept terms are shown below. These examples take the token sequences whose abbreviations were restored in step 2 and recognize them as noun-like and verb-like concept terms:
Example 1: (window height size) → NC:(window height size)
Example 2: (calculate monthly payment of employee) → VC:calculate, NC:(employee payment)
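The recognition of noun-like (NC) and verb-like (VC) concept terms can be sketched as follows. The verb and preposition sets, the function names, and the simple "X of Y → Y X" reordering rule are assumptions made for this illustration; the text does not give the actual rules of the syntax analysis means.

```python
VERBS = {"calculate", "delete", "get", "set"}   # hypothetical registered verbs
PREPOSITIONS = {"of"}                            # hypothetical preposition list

def to_noun_phrase(tokens):
    """Reorder 'X of Y' into 'Y X'; other sequences are kept as given."""
    for i, tok in enumerate(tokens):
        if tok in PREPOSITIONS:
            head, modifier = tokens[:i], tokens[i + 1:]
            return to_noun_phrase(modifier) + head
    return list(tokens)

def recognize(tokens, kind):
    """kind is 'routine', or one of 'variable', 'type', 'member'."""
    if kind != "routine":
        # Variables, types, and members yield a noun-like term only.
        return {"NC": to_noun_phrase(tokens)}
    # Routines yield a verb-like term plus a noun-like term.
    verbs = [t for t in tokens if t in VERBS]
    nouns = [t for t in tokens if t not in VERBS]
    return {"VC": verbs, "NC": to_noun_phrase(nouns)}
```

Note that this sketch keeps every non-verb token, so for Example 2 it produces NC:(employee monthly payment), whereas the text lists NC:(employee payment) without "monthly".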

【００７１】Since concept terms can be obtained from program identifiers as described above, the concept terms obtained in this way are registered in the concept term dictionary. Next, as illustrated in FIG. 1, the similarity between concept terms is measured and a concept network is generated as an image showing the relations among them.

【００７２】The concept network is a network in which each of the concept terms obtained from the program is a node and links are formed according to the semantic relations among the terms. To construct it, the program analysis apparatus of this embodiment measures the similarity between concept terms with the similarity measurement means, and the image generation means uses multidimensional scaling to generate an image in which those similarities are projected onto a plane. Multidimensional scaling is a statistical technique that finds a configuration of points in which the similarity between objects is expressed as the closeness of the points representing them.

【００７３】The method of generating the concept network from the concept terms is described step by step below. Because the similarity between concept terms must be measured first, the program analysis apparatus of this embodiment measures it by the three methods described below.

【００７４】The first method measures the similarity of concept terms from their co-occurrence in the expressions and assignment statements of the program.

【００７５】In general, variables that are written together in one expression or assignment statement depend on one another, so the concept terms obtained from those variables can be expected to be semantically related as well. The same holds when functions appear alongside the variables.

【００７６】For example, from the variables and the function in an assignment statement (an assignment expression in C) such as "MonthlyTotal = NewParchases + SalesTax(NewParchases)", three concept terms are acquired: NC:(monthly total), NC:(new parchases), and NC:(sales tax). These concept terms can be expected to be semantically related.

【００７７】Accordingly, the program analysis apparatus of this embodiment uses the occurrence detection means to survey the whole program for such co-occurrences of concept terms in expressions and assignment statements, and it totals and processes the results to measure the semantic similarity of the concept terms. That is, the similarity of concept terms T1 and T2 is measured as the number of times T1 and T2 appear in the same expression or assignment statement, taken over all expressions (arithmetic and logical) and assignment statements in the program. This gives an n-by-n similarity matrix:
|p_ij| = Σ_{k=1..m} Occurrence1(s_k, c_i, c_j)
where
n: total number of concept terms obtained from the program
m: total number of expressions and assignment statements in the program
p_ij: element in row i, column j of the similarity matrix
s_k: k-th expression or assignment statement of the program
c_i: i-th concept term
c_j: j-th concept term
Occurrence1(s_k, c_i, c_j): a function that is 1 when concept terms c_i and c_j both appear in the expression or assignment statement s_k, and 0 otherwise
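The co-occurrence count defined above can be sketched as follows. Each statement is represented simply as the set of concept terms found in it; this representation, the function name `cooccurrence_matrix`, and the choice to leave the diagonal at zero are assumptions for this example.

```python
from itertools import combinations

def cooccurrence_matrix(terms, statements):
    """Build the n-by-n matrix p where p[i][j] counts statements containing both terms.

    terms: list of concept terms (defines the row/column order).
    statements: list of sets, one per expression or assignment statement,
                each holding the concept terms that occur in it.
    """
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    p = [[0] * n for _ in range(n)]
    for stmt in statements:
        present = sorted(index[t] for t in stmt if t in index)
        # Occurrence1 is 1 for every pair of distinct terms in this statement.
        for i, j in combinations(present, 2):
            p[i][j] += 1
            p[j][i] += 1
    return p
```

For the assignment statement in the example, all three terms (monthly total, new parchases, sales tax) would form one set, and each pair among them would receive one count.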

【００７８】The second method measures the similarity of concept terms from the relation between a type and the items declared with that type. A program variable has an attribute called its type, which determines the meaning of the value held in the memory the variable denotes, so the meaning of a variable and the meaning of its type can be expected to be related. Likewise, a type is specified for the return value of a function, so the meanings of the function and of that type can also be expected to be related. In the case of compound types in particular, a type is often tied to a specific kind of object, so in terms of naming, a correlation is expected between the name of a type and the names of the items declared with it.

【００７９】For example, from the following type and typed variable:
Example of a variable declaration: struct xy_coodes PreviousMousePoint;
the concept term NC:(xy coordinate) of the type and the concept term NC:(previous mouse point) of the variable are acquired, and these concept terms can be expected to be semantically related.

【００８０】Accordingly, the program analysis apparatus of this embodiment uses the occurrence detection means to survey the whole program for such relations between types and typed items, and it totals and processes the results to measure the semantic similarity of the concept terms. That is, the similarity of concept terms T1 and T2 under this relation is measured as the number of variable and function declarations in which T1 and T2 appear together, taken over all such declarations in the program. This gives an n-by-n similarity matrix:
|p_ij| = Σ_{k=1..m} Occurrence2(d_k, c_i, c_j)
where
n: total number of concept terms obtained from the program
m: total number of variable and function declarations in the program
p_ij: element in row i, column j of the similarity matrix
d_k: k-th variable or function declaration of the program
c_i: i-th concept term
c_j: j-th concept term
Occurrence2(d_k, c_i, c_j): a function that is 1 when concept terms c_i and c_j both appear in the variable or function declaration d_k, and 0 otherwise
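The declaration-based count can be sketched in the same way. Here each declaration is reduced to a pair (concept term of the type, concept term of the declared name); that representation and the function name `declaration_matrix` are assumptions made for this example.

```python
def declaration_matrix(terms, declarations):
    """Build the n-by-n matrix p where p[i][j] counts declarations pairing the two terms.

    terms: list of concept terms (defines the row/column order).
    declarations: list of (type_term, name_term) pairs, one per
                  variable or function declaration.
    """
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    p = [[0] * n for _ in range(n)]
    for type_term, name_term in declarations:
        # Skip declarations whose terms are not in the dictionary.
        if type_term not in index or name_term not in index:
            continue
        i, j = index[type_term], index[name_term]
        if i != j:
            p[i][j] += 1
            p[j][i] += 1
    return p
```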

【００８１】When the program is written in an object-oriented style, a class object corresponds to the type and an instance corresponds to the typed item.

【００８２】The third method measures the similarity of concept terms from their co-occurrence within the file structure of the program. Concept terms that occur together in the same function, in the same file, or in files in the same directory can also be expected to be related. The routines in one file are in many cases organized as a module that implements a single function, and the same is true of the files in a directory.

【００８３】Accordingly, the program analysis apparatus of this embodiment uses the occurrence detection means to survey the whole program for such co-occurrences of concept terms in the file structure, and it totals and processes the results to measure the semantic similarity of the concept terms. That is, the similarity of concept terms T1 and T2 based on co-occurrence in the file structure is given the highest value for co-occurrence in the same function, a lower value for co-occurrence in the same file, and a lower value still for co-occurrence in the same directory, and all such co-occurrences are summed to give an n-by-n similarity matrix:
|p_ij| = Σ_{k=1..m} Σ_{l=1..m} Occurrence3(f_k, f_l, c_i, c_j)
where
n: total number of concept terms obtained from the program
m: total number of functions defined in the program
p_ij: element in row i, column j of the similarity matrix
f_k: k-th function defined in the program
f_l: l-th function defined in the program
c_i: i-th concept term
c_j: j-th concept term
Occurrence3(f_k, f_l, c_i, c_j): a function that, when concept terms c_i and c_j appear in functions f_k and f_l, is 3 if f_k and f_l are the same function, 2 if f_k and f_l are contained in the same source file, and 1 if f_k and f_l are contained in source files in the same directory; in all other cases it is 0
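The weighting by file structure can be sketched as follows. The reading that c_i must occur in f_k and c_j in f_l, the dictionary layout of a function record, and the function names are assumptions made for this example; only the weights 3, 2, 1, and 0 come from the text.

```python
import os

def occurrence3(fk, fl, ci, cj):
    """Weight for one pair of functions; each function is a dict with
    'name', 'file' (path of its source file), and 'terms' (set of concept terms)."""
    if ci not in fk["terms"] or cj not in fl["terms"]:
        return 0
    if fk["file"] == fl["file"] and fk["name"] == fl["name"]:
        return 3                                  # same function
    if fk["file"] == fl["file"]:
        return 2                                  # same source file
    if os.path.dirname(fk["file"]) == os.path.dirname(fl["file"]):
        return 1                                  # same directory
    return 0

def file_structure_similarity(functions, ci, cj):
    """Sum Occurrence3 over every ordered pair of functions."""
    return sum(occurrence3(fk, fl, ci, cj) for fk in functions for fl in functions)
```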

【００８４】Since the similarity of the concept terms is thus measured as three kinds of matrices, the program analysis apparatus of this embodiment runs multidimensional scaling in the image generation means on each of the three similarity matrices in turn. As illustrated in FIG. 2, this produces an image in which the concept terms are plotted on a plane, and the image is output by showing it on a display (not shown) or printing it on a printer (not shown). A similar image can also be generated from a single matrix formed as a linear combination of the three similarity matrices.
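The linear combination of the three similarity matrices can be sketched as follows. The text does not give the coefficients, so the weights here are purely hypothetical, as is the function name `combine`; the multidimensional scaling step itself is not shown.

```python
def combine(matrices, weights):
    """Return the weighted sum of equally sized square matrices.

    matrices: list of n-by-n matrices (lists of lists).
    weights: one coefficient per matrix (hypothetical values, chosen by the user).
    """
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
        for i in range(n)
    ]
```

The combined matrix would then be passed to a multidimensional scaling routine in place of any single similarity matrix.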

【００８５】The image generated in this way corresponds to the concept network described above. Because it represents the relations among the program's concept terms in two dimensions, it helps the analyst understand the content of the program.

【００８６】Furthermore, in the program analysis apparatus of this embodiment, the structure analysis means performs a cluster analysis of the routines according to the degree of distribution of the concept terms, and the result is provided to the analyst as abstract-level information that supports understanding of the program. The analyst then identifies the module structure of the program from this information.

【００８７】As illustrated in FIG. 3, the programs targeted by the program analysis apparatus of this embodiment are built as a hierarchy of modules. A module is a set of routines that implements a given function. To understand the content of a program, the analyst needs to distinguish its modules and recognize their hierarchy, so the module structure of the program must be identified.

【００８８】In the design of program modules, information hiding is generally emphasized: detailed information should be hidden inside the module. A module built in this way can be expected to contain concept terms peculiar to it. For example, the external references of a routine include global variables, externally defined types, and calls to other routines. The set of routines that forms a module can be assumed to be a set of routines that effectively hides the concept terms acquired from the external references of those routines.

【００８９】Accordingly, in the program analysis apparatus of this embodiment, the similarity measurement means defines the similarity between routines as a measure of how effectively they hide the concept terms acquired from the names of the routines' external references. The analysis execution means then performs a cluster analysis based on this similarity, building hierarchical clusters of routines based on the hiding of concept terms. This cluster hierarchy is information that suggests, for example, the module structure present in the program.

【００９０】Cluster analysis is widely used for problems in which target data are organized into a meaningful structure and a classification is developed. In cluster analysis, clusters are formed by joining target data according to similarity or distance, and a hierarchical dendrogram of the clusters is finally produced. In the course of this, the similarity is quantified for every pair of target data items. Because the definition of similarity determines what the result of the cluster analysis means, a definition that reflects the purpose of the analysis must be supplied.

【００９１】Accordingly, in the program analysis apparatus of this embodiment, the degree quantification means quantifies the degree of distribution of the concept terms over the routines as an entropy in the routine space, and the similarity measurement means measures the similarity of two routines as the reduction in entropy obtained when the two routines are clustered together. The cluster analysis process thus corresponds to a process that optimally reduces the disorder of the distribution of concept terms in the routine space. As a result, the concept terms acquired from the names of the routines' external references are well localized within clusters, and hierarchical clusters of routines rooted in the information hiding of the program's design can be built.

【００９２】The method by which the degree quantification means measures entropy and the method by which the similarity measurement means measures the similarity of routines in the program analysis apparatus of this embodiment are described in detail below. First, for the event that a concept term appears in a routine forming a module, the occurrence probability P(atr_ins_i, e_j) is given by

【００９３】[0093]

【数１】 [Equation 1]

【００９４】as shown in Equation 1 above. In this equation:
atr_ins_i: i-th concept term
e_j: j-th routine
NumEty: total number of routines in the program
NumAtrIns: total number of concept terms obtained from external references in the program
NumOfOccur(atr_ins_i, e_j): number of occurrences of atr_ins_i in e_j

【００９５】The entropy H(atr_ins_i, [e_1, e_2, ..., e_NumEty]) of a concept term over all the routines is then given by

【００９６】[0096]

【数２】 [Equation 2]

【００９７】as shown in Equation 2 above.
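Equations 1 and 2 are images that are not reproduced in this text, so their exact form is not known here. As a rough sketch only, assuming that the occurrence probability is the term's relative frequency across routines and that the entropy is the standard Shannon entropy of that distribution (both are assumptions, not confirmed by the source), the entropy of one concept term could be computed as follows. The function name `term_entropy` is hypothetical.

```python
import math

def term_entropy(counts):
    """Shannon entropy (in bits) of one concept term's distribution over routines.

    counts: number of occurrences of the term in each routine.
    Assumption: probability = count / total count (not taken from the source).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h
```

Under these assumptions, a term confined to a single routine has entropy 0, and a term spread evenly over many routines has the largest entropy, which matches the text's use of entropy as a measure of how scattered a concept term is.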

【００９８】Next, the distance Distance(e_p, e_q) between two routines e_p and e_q is defined as the difference between the entropy of the concept terms before e_p and e_q are clustered together and the entropy of the concept terms after they are clustered, and is given by

【００９９】[0099]

【数３】 [Equation 3]

【０１００】as shown in Equation 3 above.

【０１０１】The similarity of the two routines e_p and e_q is then calculated as the reciprocal of the distance between them, and this similarity is calculated for every pair of routines.

【０１０２】Since the similarity of the routines that form the program's modules is measured in this way, the cluster analysis uses it to generate, as a hierarchical dendrogram, hierarchical clusters of routines based on the hiding of concept terms. The cluster analysis consists of three steps: generating the initial cluster set, selecting clustering candidates, and updating the cluster set.

【０１０３】In the first step of the cluster analysis, an initial cluster set is formed from all the routines to be clustered. In the second step, the elements of the cluster set (routines, at first) are taken two at a time as pairs, and these are registered as the candidate set for clustering. In the third step, clusters that contain the same element are selected from the registered candidate set, the current cluster set is saved as data, and the cluster set is then updated to reflect the contents of the candidate set. The second and third steps are repeated until the candidate set is empty, and the saved cluster sets are then combined to generate the hierarchical clusters of routines as a hierarchical dendrogram, as illustrated in FIG. 4.
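The three steps above resemble ordinary agglomerative clustering, which can be sketched as follows. This is a generic textbook procedure substituted for illustration, not the candidate-set bookkeeping of the embodiment; the function names and the single-linkage distance built from a pairwise table are assumptions.

```python
def agglomerative(items, distance):
    """Merge the closest clusters repeatedly; return the list of merges.

    items: list of distinct routine names.
    distance(a, b): distance between two clusters, each a tuple of names.
    The returned merges (a, b, d) encode the dendrogram from the leaves up.
    """
    clusters = [(item,) for item in items]          # step 1: initial cluster set
    merges = []
    while len(clusters) > 1:
        # Step 2: every pair of current clusters is a clustering candidate.
        candidates = [(distance(a, b), a, b)
                      for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        d, a, b = min(candidates, key=lambda c: c[0])
        # Step 3: merge the closest pair and update the cluster set.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges

def single_linkage(table):
    """Build a cluster distance from a table keyed by frozenset({x, y})."""
    def distance(a, b):
        return min(table[frozenset((x, y))] for x in a for y in b)
    return distance
```

In the setting of the text, the pairwise table would hold the entropy-based distances between routines, and the sequence of merges would be drawn as the hierarchical dendrogram.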

【０１０４】Because the program analysis apparatus of this embodiment generates a hierarchical dendrogram of the routines in this way, the dendrogram is output, for example, on the display or on the printer. The output image reflects the hierarchical structure of the program's modules and therefore helps the analyst understand the content of the program.

【０１０５】Next, in the program analysis apparatus of this embodiment, the model generation means maps the concept terms to objects, attributes, and links and provides them to the analyst as abstract-level information that indicates the content of the program. The analyst then builds an information model of the program from these objects, attributes, and links.

【０１０６】The information model of a program is now explained. The information model was proposed by S. Shlaer and S. J. Mellor as information that indicates the content of a program. It is a data-centered representation of the problem domain that the program addresses; in object-oriented programs, for example, it is called an object model.

【０１０７】The purpose of the information model is to extract from the program the entities that make it up, that is, objects such as tangible things, roles, incidents, interactions, and specifications. An object is an abstraction of a set of real-world things that satisfies two conditions: all the things in the set have the same characteristics, and all instances obey the same rules and laws.

【０１０８】Besides objects, the information model contains attributes and links. An attribute is a characteristic of an object: it abstracts a single property possessed by all the entities abstracted as that object. A link is an abstraction of a set of relationships that hold systematically between different kinds of real-world things. A link is typically expressed by a verb in the problem description (for example, work) and is implemented in the program as a pointer from one object to another (for example, company → employee).
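The three elements of the information model can be pictured with a small data structure. The class names `ModelObject` and `Link` and the attribute "work time" are chosen only for illustration; the company → employee link named "work" follows the example in the text.

```python
from dataclasses import dataclass, field

@dataclass
class ModelObject:
    """An object of the information model with its attributes."""
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Link:
    """A named relationship pointing from one object to another."""
    name: str
    source: ModelObject
    target: ModelObject

# Illustrative instance: a company object linked to an employee object.
company = ModelObject("company")
employee = ModelObject("employee", attributes=["work time"])
works = Link("work", source=company, target=employee)
```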

【０１０９】The method by which the program analysis apparatus of this embodiment builds the information model with the model generation means is described step by step below.

【０１１０】At this stage, the concept term dictionary holds every concept term extracted from the names of the program's identifiers. For a concept term to be identified as an object in the information model, however, it must relate to the problem domain of the program, and it must also satisfy the conditions for an object.

【０１１１】However, because the concept terms in the concept term dictionary were all taken directly from the program as described above, they include terms that relate to the implementation of the program's design and terms that are too detailed. Such implementation-related and overly detailed terms are therefore excluded, and only the concept terms that relate to the problem domain addressed by the program are retained.

【０１１２】Examples of implementation-related concept terms, which must be excluded, and of problem-domain concept terms, which must be retained, are as follows.
Implementation-related concept terms: (binary search) (tree structure) (quick sort)
Problem-domain concept terms: (employee work time) (student credit) (airplane altitude)

【０１１３】In other words, identifying a concept term as an object in the information model requires two judgments: whether the term relates to the problem domain of the program, and whether it satisfies the conditions for an object.

【０１１４】For this purpose, the program analysis apparatus of this embodiment has first to third heuristic rules (HRs), described in turn below, stored in rule storage means such as a RAM as the rules for identifying concept terms as objects.

【０１１５】These HRs are not rules for generating a final conclusion automatically; they are rules for supporting the analyst's judgment, so the program analysis apparatus is designed to proceed interactively. The apparatus uses the HRs to select concept terms that are object candidates and shows them on the display. The analyst then decides whether each term is to be identified as an object and specifies the decision by keyboard input.

【０１１６】The first HR makes concept terms acquired from upper-level modules object candidates.

【０１１７】A program made up of hierarchically structured modules is generally designed top-down, so upper-level modules implement higher-level design concepts and lower-level modules implement lower-level design concepts. Lower-level modules therefore often implement hardware-dependent parts or more detailed design, such as the implementation of abstract-level data structures, whereas upper-level modules often relate to the problem domain of the program.

【０１１８】Concept terms acquired from upper-level modules are therefore likely to relate to the problem domain of the program, so they are presented to the analyst as object candidates.
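The first HR can be sketched as a simple filter over the module hierarchy. This is only an illustrative sketch: the `Module` record, the numeric `depth` field, and the `max_depth` threshold are assumptions introduced here, since the text does not say how "upper-level" is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    depth: int            # assumed convention: 0 is the top of the hierarchy
    concept_terms: tuple  # concept terms acquired from this module's identifiers


def candidates_from_upper_modules(modules, max_depth=1):
    """Return concept terms from modules at depth <= max_depth, in first-seen order."""
    candidates = []
    for module in sorted(modules, key=lambda m: m.depth):
        if module.depth > max_depth:
            continue
        for term in module.concept_terms:
            if term not in candidates:
                candidates.append(term)
    return candidates


# Hypothetical module hierarchy used only for illustration.
modules = [
    Module("payroll_main", 0, ("employee work time", "employee")),
    Module("report", 1, ("employee", "monthly report")),
    Module("sort_util", 2, ("quick sort", "binary search")),
]
object_candidates = candidates_from_upper_modules(modules)
```

The result is only a candidate list; as the text states, the analyst makes the final decision for each term.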

【０１１９】The second HR makes concept terms acquired from the names of composite types object candidates.

【０１２０】Composite types defined in a program generally represent information about real-world things. A concept term acquired from the name of a composite type can therefore be an object, and it is presented to the analyst as an object candidate. For example, if the program defines the composite type
struct person {int age; int name; int sex;};
the concept term acquired from this composite type is
concept:(person)
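As a rough sketch of the second HR, the names of C struct definitions can be collected with a regular expression and turned into candidate terms. The regular expression and the underscore-splitting convention are assumptions made for illustration; a real implementation would rely on a proper parser, as the apparatus described here does.

```python
import re

# Matches "struct <name> {" so that only definitions are found, not mere uses.
STRUCT_DEF_RE = re.compile(r"\bstruct\s+([A-Za-z_]\w*)\s*\{")


def object_candidates_from_structs(c_source):
    """Return concept terms acquired from struct names, in order of definition."""
    candidates = []
    for name in STRUCT_DEF_RE.findall(c_source):
        term = " ".join(part for part in name.lower().split("_") if part)
        if term not in candidates:
            candidates.append(term)
    return candidates


source = """
struct person {int age; int name; int sex;};
struct bank_account {int id;};
struct person p;
"""
struct_candidates = object_candidates_from_structs(source)
```

The declaration `struct person p;` is not counted because it has no opening brace and so does not define a type.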

【０１２１】The third HR makes concept terms acquired from the names of global variables that are used in multiple files object candidates.

【０１２２】Using global variables to pass data between modules is generally considered undesirable because it strengthens the coupling between the modules. Even so, global variables are sometimes used for data whose existence is evident throughout a wide range of the program, or for data hidden inside a module. A global variable used in multiple source files can therefore be regarded as an object of fairly large granularity, and it is presented to the analyst as an object candidate.
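The third HR amounts to counting, for each global variable, the number of source files that use it. The sketch below assumes that a cross-reference table mapping each file to the global variables it uses is already available; the table format, the file names, and the `min_files` parameter are hypothetical.

```python
def object_candidates_from_globals(global_uses, min_files=2):
    """Return concept terms for globals used in at least min_files source files.

    global_uses maps a source file name to the collection of global variable
    names used in that file (an assumed input format).
    """
    file_counts = {}
    for names in global_uses.values():
        for name in set(names):
            file_counts[name] = file_counts.get(name, 0) + 1
    return sorted(
        " ".join(part for part in name.lower().split("_") if part)
        for name, count in file_counts.items()
        if count >= min_files
    )


# Hypothetical cross-reference data used only for illustration.
global_uses = {
    "main.c": {"customer_table", "debug_flag"},
    "billing.c": {"customer_table"},
    "report.c": {"customer_table", "page_count"},
}
global_candidates = object_candidates_from_globals(global_uses)
```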

【０１２３】Through the first to third HRs described above, the program analysis apparatus of this embodiment presents the analyst with concept terms that are object candidates, and particular concept terms are identified as objects at the analyst's discretion.

【０１２４】Once the attributes of the objects obtained in this way and the links between those objects have also been obtained, the information model of the program can be built. The program analysis apparatus of this embodiment therefore also has a rule for associating concept terms with attributes and a rule for associating concept terms with links stored in the rule storage means such as a RAM. The processing that associates particular concept terms with attributes and links under these rules is described in turn below.

【０１２５】The rule for associating concept terms with attributes is as follows: when a concept term acquired from the name of a composite type has been made an object, the concept terms acquired from the names of that composite type's subfields are attribute candidates.

【０１２６】When a composite type contains subfields, each subfield is "a-part-of" the composite type. So if the concept term acquired from the name of the composite type is made an object, the concept terms acquired from the names of its subfields can be attributes of that object.

【０１２７】For example, if the composite type is defined as
struct person {int age; int name; int sex;};
the concept terms acquired from the names of its subfields are
age, name, sex
These concept terms (age, name, sex) become attribute candidates for the object (person).
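The attribute rule can be illustrated by pairing each struct name with the names of its subfields. This is a minimal sketch under stated assumptions: it uses regular expressions in place of a real C parser, handles only flat struct bodies without nested braces, and the helper names are hypothetical.

```python
import re

# Captures the struct name and its body; nested braces are not supported.
STRUCT_BODY_RE = re.compile(r"\bstruct\s+([A-Za-z_]\w*)\s*\{([^{}]*)\}")
# The declared name is the last identifier of a declaration, optionally an array.
FIELD_NAME_RE = re.compile(r"([A-Za-z_]\w*)\s*(?:\[[^\]]*\])?\s*$")


def attribute_candidates(c_source):
    """Map each struct name (object candidate) to its subfield names (attribute candidates)."""
    result = {}
    for struct_name, body in STRUCT_BODY_RE.findall(c_source):
        fields = []
        for declaration in body.split(";"):
            match = FIELD_NAME_RE.search(declaration.strip())
            if match:
                fields.append(match.group(1))
        result[struct_name] = fields
    return result


person_attributes = attribute_candidates("struct person {int age; int name; int sex;};")
```

For the definition in the text, the sketch pairs the object candidate person with the attribute candidates age, name, and sex.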

【０１２８】Concept terms can thus be presented to the analyst as attribute candidates and identified as attributes at the analyst's discretion. Once objects and attributes have been obtained in this way, the information model can be built as soon as the links are obtained.

【０１２９】The rule for associating concept terms with links makes verb-like concept terms acquired from routine names link candidates. The direction of such a link is determined by the link generation means on the basis of information about the operations performed on the arguments inside the routine and the case-governing information for the verb that is set in advance in the concept term dictionary.

【０１３０】That is, a verb-like expression used in the name of a routine is assumed to indicate a link between objects. This is explained step by step below. For example, if the routine name and arguments are
add_new_customer(table,customer_name,customer_id)
the concept terms acquired from the routine name are
(VERB:add CONCEPT:(new customer))
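Splitting a routine name into a verb-like and a noun-like concept term can be sketched as a dictionary lookup over the tokens of the name. The verb set below stands in for the concept term dictionary and is hypothetical, as is the assumption that tokens are separated by underscores.

```python
# Hypothetical stand-in for the verbs registered in the concept term dictionary.
KNOWN_VERBS = frozenset({"add", "delete", "get", "set", "update", "find"})


def concept_terms_from_routine(name, verbs=KNOWN_VERBS):
    """Return (verb, noun_term); verb is None when no registered verb is found."""
    tokens = [token for token in name.lower().split("_") if token]
    for index, token in enumerate(tokens):
        if token in verbs:
            # The first registered verb becomes the verb-like concept term;
            # the remaining tokens form the noun-like concept term.
            return token, " ".join(tokens[:index] + tokens[index + 1:])
    return None, " ".join(tokens)


verb, noun_term = concept_terms_from_routine("add_new_customer")
```

For the routine name in the text, this yields the verb add and the noun-like term new customer.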

【０１３１】The concept term dictionary used to identify such verb-like concept terms is described here. As noted earlier, the verbs that are commonly used in program names are not numerous, so they can be anticipated and set in the concept term dictionary in advance. For each verb, the dictionary also holds grammatical information on its meaning and syntax as case-governing information. This information consists of the kinds of deep case the verb can take, the range of values each case can take and whether it is obligatory, and, from the viewpoint of data operations in the program, whether the item is merely referenced or is changed, for example by assignment. The entry for "add", set in the concept term dictionary as a verb-like concept term, is shown below together with its case-governing information.
(add (abbreviation ad) (similar-word increment) (syntax part-of-speech verb)
 (case-governing-information
  ((deep-case agent) (noun-semantic-marker thing ...) (obligatory no) (data-operation reference) (link source))
  ((deep-case object) (noun-semantic-marker thing ...) (obligatory yes) (data-operation change) (link destination))
  ((deep-case recipient) (noun-semantic-marker thing ...) (obligatory yes) (data-operation reference) (link source))))
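One way to picture such a dictionary entry is as a small record type. The sketch below is not the apparatus's actual data format: the class names, field names, and English role labels (agent, object, recipient) are assumptions chosen to mirror the "add" entry above, and the noun semantic markers are omitted because the source elides them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseSlot:
    deep_case: str       # "agent", "object", or "recipient"
    obligatory: bool     # whether the case must be filled
    data_operation: str  # "reference" or "change"
    link_end: str        # "source" or "destination"


@dataclass(frozen=True)
class VerbEntry:
    verb: str
    abbreviations: tuple
    similar_words: tuple
    case_slots: tuple


ADD = VerbEntry(
    verb="add",
    abbreviations=("ad",),
    similar_words=("increment",),
    case_slots=(
        CaseSlot("agent", False, "reference", "source"),
        CaseSlot("object", True, "change", "destination"),
        CaseSlot("recipient", True, "reference", "source"),
    ),
)


def lookup_verb(token, entries):
    """Find the entry whose verb or abbreviation equals the token, else None."""
    for entry in entries:
        if token == entry.verb or token in entry.abbreviations:
            return entry
    return None


entry = lookup_verb("ad", [ADD])
```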

【０１３２】The way in which the case-governing information stored with each verb in the concept term dictionary is used to find the objects related to a link is described below.

【０１３３】First, the variable acquisition means parses the routine to identify its input and output variables, which yields information about the operations performed on the arguments inside the routine. An input variable is a formal parameter of the routine or a global variable that is only referenced within the routine. An output variable is a formal parameter of the routine or a global variable whose content is changed within the routine.
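The input/output distinction can be sketched as a classification over variable accesses. The sketch assumes a parser has already reduced the routine body to a list of (variable, operation) pairs; that list format and the "read"/"write" labels are hypothetical.

```python
def classify_variables(accesses, formal_params, global_vars):
    """Split a routine's formal parameters and globals into inputs and outputs.

    accesses is an iterable of (variable, op) pairs with op "read" or "write",
    assumed to come from parsing the routine body. Local variables are ignored.
    """
    visible = set(formal_params) | set(global_vars)
    written = {var for var, op in accesses if op == "write" and var in visible}
    read = {var for var, op in accesses if op == "read" and var in visible}
    # A variable whose content is changed is an output, even if it is also read.
    return sorted(read - written), sorted(written)


# Hypothetical access list for add_new_customer(table, customer_name, customer_id).
accesses = [
    ("customer_name", "read"),
    ("customer_id", "read"),
    ("table", "read"),
    ("table", "write"),
    ("i", "write"),  # local loop variable, ignored
]
inputs, outputs = classify_variables(
    accesses, ("table", "customer_name", "customer_id"), ()
)
```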

【０１３４】Next, the case-governing information for the verb-like concept term acquired from the routine name is read from the concept term dictionary, and the information about the operations on the arguments is interpreted by reference to the obligatoriness and data operation recorded there. An item that is not obligatory and whose data operation is reference is judged to be the agent of the verb; an item that is obligatory and whose data operation is change is judged to be the object of the verb; and an item that is obligatory and whose data operation is reference is judged to be the recipient of the verb.

【０１３５】For example, if the routine and its arguments are
add_new_customer(table,customer_name,customer_id)
the three concept terms acquired from the arguments of the routine are
CONCEPT:(table) CONCEPT:(customer name) CONCEPT:(customer id)

【０１３６】Parsing the routine "add_new_customer" shows that the data operation on the argument "table" is change and that the data operations on "customer name" and "customer id" are reference. From this information and the case-governing information for the verb "add", the direction of the link is obtained as
link source object: (table)
link destination objects: (customer name), (customer id)

【０１３７】In this way, the verb-like concept term "add" acquired from the routine name can be presented to the analyst as a candidate for a link from the object "table" to "customer name" and "customer id", and the analyst can decide whether to identify it as a link.

【０１３８】As described above, the program analysis apparatus of this embodiment can associate concept terms with objects, attributes, and links and present them to the analyst as abstract-level information that supports understanding of the program's content. The analyst can then build an information model of the program from the information obtained.

【０１３９】[0139]

[Effects of the Invention] The invention of claim 1 is a program analysis method for acquiring information for understanding the content of a program, in which concept terms are acquired from the names of the program's identifiers. Because identifier names reflect the content of the program, the concept terms acquired from them serve as abstract-level information indicating that content, so the analyst's understanding of the program can be supported easily.

【０１４０】The invention of claim 2 provides a concept term dictionary in which abbreviations commonly used in program identifier names are set in advance together with their full forms, detects abbreviations in the identifier names by referring to this dictionary, and restores each detected abbreviation to its full form to obtain the concept term. The concept terms are thus put in an easily understood form, so the analyst's understanding of the program can be supported well.

【０１４１】The invention of claim 3 treats the sequence of tokens extracted from a variable name as a noun-like concept term, and the sequence of tokens extracted from a routine name as a pair consisting of a noun-like concept term and a verb-like concept term. The form of a concept term can thus be determined easily from the kind of identifier, which simplifies subsequent processing of the concept terms and improves the efficiency of generating abstract-level information indicating the content of the program.

【０１４２】The invention of claim 4 recognizes dependency relations by identifying conjunctions and prepositions in the token sequence of a noun-like concept term, and reorders the token sequence into the normal form of a noun phrase according to those relations. The concept terms are thus put in an easily understood form, so the analyst's understanding of the program can be supported well.

【０１４３】The invention of claim 5 provides a concept term dictionary in which verbs commonly used in program identifier names are set in advance. When a verb set in the dictionary is detected in the token sequence acquired from a routine name, that token is made the verb-like concept term and the remaining tokens are made the noun-like concept term. Verb-like and noun-like concept terms can thus be distinguished easily, which simplifies subsequent processing of the concept terms and improves the efficiency of generating abstract-level information indicating the content of the program.

【０１４４】The invention of claim 6 measures the similarity between concept terms on the basis of their co-occurrence in the expressions and assignment statements of the program, and uses this similarity to generate, by multidimensional scaling, an image showing the relationships among a plurality of concept terms. The interrelationships of the concept terms are thus presented in an easily understood form, so the analyst's understanding of the program can be supported well.

【０１４５】The invention of claim 7 measures the similarity between concept terms on the basis of the relationship in the program between a type and the items declared with that type, and uses this similarity to generate, by multidimensional scaling, an image showing the relationships among a plurality of concept terms. The interrelationships of the concept terms are thus presented in an easily understood form, so the analyst's understanding of the program can be supported well.

【０１４６】The invention of claim 8 measures the similarity between concept terms on the basis of their co-occurrence within the file structure of the program, and uses this similarity to generate, by multidimensional scaling, an image showing the relationships among a plurality of concept terms. The interrelationships of the concept terms are thus presented in an easily understood form, so the analyst's understanding of the program can be supported well.

【０１４７】The invention of claim 9 defines the similarity between routines as a measure of the effect of hiding the concept terms acquired from the names of the routines' external references, and uses this similarity to build, by cluster analysis, a hierarchical cluster of routines based on the hiding of concept terms. A hierarchical cluster that reflects the module structure of the program is thus obtained as abstract-level information that supports understanding of the program's content, so the analyst's understanding of the program can be supported well.

【０１４８】The invention of claim 10 quantifies the degree to which concept terms are distributed over the routines as an entropy in routine space, and measures the similarity between two routines as the reduction in entropy obtained when the two routines are clustered. The module structure of the program is thus reflected well in the hierarchical cluster obtained as abstract-level information that supports understanding of the program's content, so the analyst's understanding of the program can be supported well.

【０１４９】The invention of claim 11 associates concept terms with the objects, attributes, and links that make up the information model of the program. Objects, attributes, and links are thus obtained as abstract-level information that supports understanding of the program's content, the analyst can build the information model of the program on this basis, and the analyst's understanding of the program can be supported well.

【０１５０】The invention of claim 12 uses, as a rule for identifying concept terms as objects, the rule that concept terms acquired from upper-level modules are object candidates. The objects that make up the information model of the program can thus be obtained easily, so the analyst's understanding of the program can be supported well.

【０１５１】The invention of claim 13 uses, as a rule for identifying concept terms as objects, the rule that concept terms acquired from the names of composite types are object candidates. The objects that make up the information model of the program can thus be obtained easily, so the analyst's understanding of the program can be supported well.

【０１５２】The invention of claim 14 uses, as a rule for identifying concept terms as objects, the rule that concept terms acquired from the names of global variables used in multiple files are object candidates. The objects that make up the information model of the program can thus be obtained easily, so the analyst's understanding of the program can be supported well.

【０１５３】The invention of claim 15 uses, as a rule for identifying concept terms as attributes, the rule that when a concept term acquired from the name of a composite type has been made an object, the concept terms acquired from the names of that composite type's subfields are attribute candidates. The attributes of the objects that make up the information model of the program can thus be obtained easily, so the analyst's understanding of the program can be supported well.

【０１５４】The invention of claim 16 uses, as a rule for identifying concept terms as links, the rule that verb-like concept terms acquired from routine names are link candidates, and determines the direction of each link on the basis of information about the operations performed on the arguments inside the routine and the case-governing information for the verb that is set in advance in the concept term dictionary. The links between the objects that make up the information model of the program can thus be obtained easily, so the analyst's understanding of the program can be supported well.

【０１５５】The invention of claim 17 is a program analysis apparatus for acquiring information for understanding the content of a program, provided with term generation means that acquires concept terms from the names of the program's identifiers. Because identifier names reflect the content of the program, the concept terms acquired from them serve as abstract-level information indicating that content, so the analyst's understanding of the program can be supported easily.

[Brief description of drawings]

FIG. 1 is a schematic diagram illustrating an embodiment of the program analysis method performed by the program analysis apparatus of the present invention.

FIG. 2 is a plan view illustrating an image showing the relationships among concept terms, generated by the program analysis apparatus using the program analysis method as one item of abstract-level information that supports understanding of the program's content.

FIG. 3 is a schematic diagram illustrating the hierarchical structure of the modules of a program that the program analysis apparatus analyzes with the program analysis method.

FIG. 4 is a schematic diagram illustrating a hierarchical tree diagram, which is a hierarchical cluster generated as one item of abstract-level information that supports understanding of the program's content.

Claims

[Claims]

1. A program analysis method for acquiring information for understanding the content of a program formed as a hierarchical structure of modules in which identifiers such as variables, routines, types, and macros are set, characterized in that concept terms are acquired from the names of the identifiers of the program.

2. The program analysis method according to claim 1, characterized in that a concept term dictionary is provided in which abbreviations commonly used in program identifier names are set in advance together with their full forms, abbreviations are detected in the identifier names of the program by referring to the concept term dictionary, and each detected abbreviation is restored to its full form to obtain the concept term.

3. The program analysis method according to claim 1, characterized in that a token sequence extracted from a variable name is treated as a noun-like concept term, and a token sequence extracted from a routine name is treated as a pair of a noun-like concept term and a verb-like concept term.

4. The program analysis method according to claim 3, characterized in that dependency relations are recognized by identifying conjunctions and prepositions in the token sequence of a noun-like concept term, and the token sequence is reordered into the normal form of a noun phrase according to the dependency relations.

5. The program analysis method according to claim 3, wherein a concept term dictionary is provided in which verbs frequently used in the names of program identifiers are registered in advance, and when a verb registered in the concept term dictionary is detected in a lexical sequence acquired from a routine name, that token is treated as a verb-like concept term and the remaining tokens are treated as noun-like concept terms.
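
The term acquisition recited in claims 2, 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary entries, the splitting rule (underscores and case changes), and the names `ABBREVIATIONS`, `VERBS`, `split_identifier` and `concept_terms` are all assumptions introduced here.

```python
import re

# Hypothetical concept term dictionary entries, invented for illustration.
ABBREVIATIONS = {"buf": "buffer", "cnt": "count", "init": "initialize", "msg": "message"}
VERBS = {"get", "set", "initialize", "read", "write", "send"}


def split_identifier(name):
    """Split an identifier into lower-case tokens at underscores and case changes."""
    tokens = []
    for part in re.split(r"[_\W]+", name):
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]


def concept_terms(name, kind):
    """Return the verb-like terms and the noun-like term acquired from one identifier.

    kind is "variable" or "routine" (an assumed encoding of the identifier category).
    """
    # Restore abbreviated expressions to their original expressions (claim 2).
    tokens = [ABBREVIATIONS.get(t, t) for t in split_identifier(name)]
    if kind == "variable":
        # A variable name yields a single noun-like concept term (claim 3).
        return {"verbs": [], "noun": " ".join(tokens)}
    # For a routine name, dictionary verbs become verb-like terms and the
    # remaining tokens form the noun-like term (claim 5).
    verbs = [t for t in tokens if t in VERBS]
    nouns = [t for t in tokens if t not in VERBS]
    return {"verbs": verbs, "noun": " ".join(nouns)}
```

With these assumed dictionaries, `concept_terms("initMsgBuf", "routine")` yields the verb-like term `initialize` and the noun-like term `message buffer`, while `concept_terms("msg_cnt", "variable")` yields only the noun-like term `message count`.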

6. The program analysis method according to claim 1, wherein the similarity between concept terms is measured on the basis of their co-occurrence in expressions or assignment statements of the program, and an image showing the relationships among a plurality of concept terms is generated from this similarity by multidimensional scaling.

7. The program analysis method according to claim 1, wherein the similarity between concept terms is measured on the basis of the relationships between types in the program, and an image showing the relationships among a plurality of concept terms is generated from this similarity by multidimensional scaling.

8. The program analysis method according to claim 1, 6 or 7, wherein the similarity between concept terms is measured on the basis of their co-occurrence within the file structure of the program, and an image showing the relationships among a plurality of concept terms is generated from this similarity by multidimensional scaling.
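
As a rough illustration of the co-occurrence measurement in claims 6 and 8, the sketch below counts how often two concept terms appear in the same unit (an expression, an assignment statement, or a file) and converts the counts to a similarity. The Jaccard coefficient is used here purely as an assumption, since the claims do not fix a formula, and the function names are hypothetical. The multidimensional scaling step itself is not shown; the dissimilarity matrix returned at the end is the kind of input such a step would take.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_similarity(units):
    """units: iterable of collections of concept terms, one per statement or file.

    Returns {(term_a, term_b): similarity} with term_a < term_b, using the
    Jaccard coefficient (an assumed choice of measure).
    """
    term_count = Counter()
    pair_count = Counter()
    for unit in units:
        terms = sorted(set(unit))
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))
    return {
        (a, b): both / (term_count[a] + term_count[b] - both)
        for (a, b), both in pair_count.items()
    }


def dissimilarity_matrix(terms, sim):
    """Square matrix of 1 - similarity, in the order given by terms."""
    def d(a, b):
        if a == b:
            return 0.0
        return 1.0 - sim.get((min(a, b), max(a, b)), 0.0)
    return [[d(a, b) for b in terms] for a in terms]
```

Terms that never co-occur receive the maximum dissimilarity of 1.0, so they would be placed far apart in the resulting image.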

9. The program analysis method according to claim 1, wherein a measure of the effect of concealing the concept terms acquired from the names of a routine's external references is defined as the similarity between routines, and a hierarchical clustering of routines based on the concealment of concept terms is constructed from this similarity by cluster analysis.

10. The program analysis method according to claim 9, wherein the degree of distribution of concept terms across routines is quantified as an entropy over the routine space, and the similarity between two routines is measured as the reduction in entropy obtained when the two routines are merged into one cluster.
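
The entropy-based similarity of claim 10 might be sketched as below. The exact entropy definition is not given in the claims, so this example assumes that each concept term contributes the Shannon entropy of its reference counts spread over the current clusters of routines; `term_entropy` and `merge_similarity` are hypothetical names.

```python
import math
from collections import Counter


def term_entropy(clusters):
    """clusters: list of Counters mapping concept term -> reference count.

    Sums, over all concept terms, the entropy (in bits) of how that term's
    references are distributed across the clusters.
    """
    totals = Counter()
    for cluster in clusters:
        totals.update(cluster)
    h = 0.0
    for term, total in totals.items():
        for cluster in clusters:
            n = cluster.get(term, 0)
            if n:
                p = n / total
                h -= p * math.log2(p)
    return h


def merge_similarity(clusters, i, j):
    """Entropy reduction obtained by merging clusters i and j."""
    merged = clusters[i] + clusters[j]
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return term_entropy(clusters) - term_entropy(rest + [merged])
```

Under this assumed definition, two routines that reference the same concept terms give a large reduction, because merging them confines those terms to a single cluster, whereas routines with no terms in common give a reduction of zero. An agglomerative cluster analysis would repeatedly merge the pair with the largest reduction.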

11. The program analysis method according to claim 1, wherein the concept terms are mapped to the objects, attributes, and links that constitute an information model of the program.

12. The program analysis method according to claim 11, wherein, as a rule for identifying a concept term as an object, a concept term acquired from a higher-level module is treated as an object candidate.

13. The program analysis method according to claim 11, wherein, as a rule for identifying a concept term as an object, a concept term acquired from the name of a composite type is treated as an object candidate.

14. The program analysis method according to claim ... or 13, wherein, as a rule for identifying a concept term as an object, a concept term acquired from the name of a global variable used across a plurality of files is treated as an object candidate.

15. The program analysis method according to claim 11, 12, 13 or 14, wherein, as a rule for identifying a concept term as an attribute, when a concept term acquired from the name of a composite type is treated as an object, a concept term acquired from the name of a subfield of that composite type is treated as an attribute candidate.

16. The program analysis method according to claim 3, 14 or 15, wherein, as a rule for identifying a concept term as a link, a verb-like concept term acquired from the name of a routine is treated as a link candidate, and the direction of the link is determined from information on the operations performed on the arguments inside the routine and from the case government information for the verb registered in advance in the concept term dictionary.
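
The candidate rules of claims 13 to 16 can be illustrated with a small sketch. It assumes the concept terms have already been acquired and tagged with where they came from; the `Symbol` record, its `kind` values, and the `candidates` function are hypothetical. The higher-level module rule of claim 12 and the determination of link direction are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    term: str          # concept term already acquired from the identifier name
    kind: str          # "composite_type", "field", "global_variable" or "routine_verb"
    parent: str = ""   # for a field: concept term of the enclosing composite type
    files: int = 1     # for a global variable: number of files that use it


def candidates(symbols):
    """Return (object, attribute, link) candidates for the information model."""
    # Composite type names and global variables shared across files suggest objects.
    objects = {
        s.term
        for s in symbols
        if s.kind == "composite_type"
        or (s.kind == "global_variable" and s.files > 1)
    }
    # Subfields of a composite type that became an object suggest its attributes.
    attributes = {
        (s.parent, s.term)
        for s in symbols
        if s.kind == "field" and s.parent in objects
    }
    # Verb-like terms from routine names suggest links.
    links = {s.term for s in symbols if s.kind == "routine_verb"}
    return objects, attributes, links
```

The result is only a set of candidates; in the method described, the analyst still decides which candidates enter the information model.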

17. A program analysis device for acquiring information for understanding the contents of a program formed as a hierarchical structure of modules in which identifiers of variables, routines, types, macros, and the like are defined, the device comprising term generation means for acquiring concept terms from the names of the program's identifiers.