JPH02166522A

JPH02166522A - Data converting method

Info

Publication number: JPH02166522A
Application number: JP63320667A
Authority: JP
Inventors: Hironobu Kima; 啓伸來間; Kazuo Hamanaka; 濱中　和男
Original assignee: Hitachi Ltd; Hitachi Keiyo Engineering Co Ltd
Current assignee: Hitachi Ltd; Hitachi Keiyo Engineering Co Ltd
Priority date: 1988-12-21
Filing date: 1988-12-21
Publication date: 1990-06-27

Abstract

PURPOSE:To improve the memory efficiency of a computer by performing a pattern matching job at each process for a syntax analysin which is applied to each part of an input symbol train and disusing the syntax analyzing trees that failed in the pattern matching jobs. CONSTITUTION:A recording means which records the data having a structure for each partial tree of a conventional syntax analyzing tree is provided. The pattern matching jobs are carried out based on the information which produces the data having structures and in each process for syntax analysis that is carried out to each part of a symbol train. Thus the data having structures are produced when the pattern matching jobs succeed. At the same time, the data having structures are recorded via the means which records the data having a structure for each partial tree of a syntax analyzing tree. Then the syntax analyzing tree is disused when a pattern matching job fails. In such a manner, a pattern matching job is carried out in each process for the syntax analysis which is performed to each part of a symbol train. Then the syntax analyzing trees which failed in the pattern matching jobs are disused. Thus the unnecessary syntax analyzing trees can be omitted and therefore the memory efficiency of a computer is improved.

Description

【発明の詳細な説明】〔産業上の利用分野〕本発明は、データ処理装置による記号列の変換方法に関
し、外部から与えられる変換規則を用いた。記号列から
構造を持つデータへの変換に関する。このような変換は
、機械による翻訳又は言語処理、例えば、コンパイラ、
インタプリタ恨のプログラミング言語処理系にとって不
可欠な機能である。DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a method for converting symbol strings by a data processing device, using conversion rules given from the outside. Concerning the conversion from symbol strings to structured data. Such conversion may be accomplished by machine translation or language processing, such as a compiler,
This is an essential function for interpreted programming language processing systems.

[Conventional technology]

一般に、データ変換処理では変換過程で中間データを設
けて、入力データから中間データへの変換、中間データ
から目的となるデータへの変換と段階的に変換を行なう
、また、多くの場合、中間データとして木構造が用いら
れる。中間データとして木構造を用いると、データ変換
をする際にメンテナンスしやすいという利点がある。Generally, in data conversion processing, intermediate data is provided during the conversion process, and conversion is performed in stages such as converting input data to intermediate data and converting intermediate data to target data. A tree structure is used as Using a tree structure as intermediate data has the advantage of ease of maintenance when converting data.

一般に、入力データから中間データを生成する処理は構
文解析処理と呼ばれ、また、構文解析処理の結果得られ
る木構造のデータは構文解析木と呼ばれる。Generally, the process of generating intermediate data from input data is called parsing process, and the tree-structured data obtained as a result of parsing process is called parsing tree.

構文解析では１文法は通常、生成規則と呼ばれる関係の
集合Ｐ、終端記号の集合Ｔ、非終端記号の集合Ｎ、開始
記号Ｓによって記述される。生成規則は、左辺と右辺よ
り構成され、左辺の記号が右辺の記号列によって書き換
えられることを表す。In syntactic analysis, one grammar is usually described by a set P of relations called production rules, a set T of terminal symbols, a set N of non-terminal symbols, and a starting symbol S. A production rule consists of a left side and a right side, and indicates that the symbol on the left side is rewritten by the symbol string on the right side.

非終端記号は、Ｐによって書き換えられる記号であり、
終端記号はＰによって書き換えられない記号である。Ｓ
は、非終端記号であり、入力文が文法に適合するならば
、ＳにＰを適用することにより入力文に書き換えること
が出来る。A non-terminal symbol is a symbol that is rewritten by P,
A terminal symbol is a symbol that cannot be rewritten by P. S
is a non-terminal symbol, and if the input sentence conforms to the grammar, it can be rewritten into the input sentence by applying P to S.

文法の基本的なりラスの一つに、文脈自由文法がある。One of the basic concepts of grammar is context-free grammar.

文脈自由文法とは、各々の生成規則の左辺を１つの非終
端記号とし、右辺を０個以上の記号の列に制約した文法
である０文脈自由文法の各各の文法規則はＡ　−＋　ａ　　Ａ　Ｅ　Ｎ　、　　ａ　Ｅ　（Ｎ　Ｕ
　Ｔ　）申として表される。ここで、（）　串は集合の
任意の要素の０個以上の列を表す。A context-free grammar is a grammar in which the left-hand side of each production rule is a non-terminal symbol, and the right-hand side is constrained to a string of 0 or more symbols. Each grammar rule in a 0-context-free grammar is A −+ a A E N , a E (N U
T ) Represented as a monkey. Here, () represents a sequence of zero or more arbitrary elements of the set.

構文解析処理には１本質的に異なる２つの方法がある。There are two essentially different methods for parsing.

１つの方法は、与えられた構文規則専用の構文解析処理
装置を作成するものである。この方法による構文解析処
理装置の構成は、与えられた構文規則が強く反映された
ものになり、一般に、より効率的な扱いやすいシステム
が構築できる。One method is to create a parser specifically for a given syntax rule. The configuration of the syntax analysis processing device using this method strongly reflects the given syntax rules, and in general, a more efficient and easy-to-use system can be constructed.

もう１つの方法は、全ての構文規則に対して有効な一般
的な構文解析処理装置を設計することである。この場合
１個々の構文規則は外部から入力され、処理装置はこの
構文規則に基づいて作動する。そのため、この方法によ
る構文解析処理は、高度な柔軟性をもつという特徴をも
つ。Another method is to design a general parser that is valid for all syntax rules. In this case, an individual syntax rule is input from the outside and the processing device operates on the basis of this syntax rule. Therefore, the parsing process using this method is characterized by a high degree of flexibility.

後者の横文解析処理方法を用いたデータ変換方法につい
ては、特願昭６２−１２０２０１に記載されているので
、ここではその概要を説明する。The latter data conversion method using the horizontal text analysis processing method is described in Japanese Patent Application No. 120201/1983, so an outline thereof will be explained here.

文脈自由文法で記述された構文規則に出力項の構造を終
端記号と非終端記号に関連付けて（例えば引数の形で）
記述した変換規則を外部から受は取り、この変換規則に
基づいて入力記号列の構造を構文解析し構文解析木を生
成する。次に、上記構文解析処理の結果得られた構文解
析木を分離しながら、入力記号列の構造を対応する終端
記号と非終端記号に関連した出力項の構造の記述に従っ
て、出力項を生成する。Associating the structure of output terms with terminal and non-terminal symbols (e.g. in the form of arguments) to the syntactic rules described in context-free grammars
The described conversion rules are received from the outside, and the structure of the input symbol string is parsed based on the conversion rules to generate a parse tree. Next, while separating the parse tree obtained as a result of the parsing process, output terms are generated according to the description of the structure of output terms related to terminal symbols and non-terminal symbols corresponding to the structure of the input symbol string.

ここで、項を引数を持つ定数記号（関数記号）を用いて
次のように定義する。Here, terms are defined as follows using constant symbols (function symbols) with arguments.

（１）０引数の関数記号を項である。(1) A function symbol with 0 arguments is a term.

（２）ｆがｎ引数の関数記号、ｔｌ、ｔ２．・・・ｔｎ
が項であれば、ｆ（ｔ　１．ｔ　２．−、ｔ　ｎ）も項
である。(2) f is a function symbol with n arguments, tl, t2. ...tn
If is a term, then f(t1.t2.-, tn) is also a term.

また本明細書では、上記の出力項の構造の記述を（出力
）項を生成するための情報と呼ぶ。Further, in this specification, the description of the structure of the output term is referred to as information for generating the (output) term.

[Problem to be solved by the invention]

上記従来技術では、まず、構文解析処理を行い、構文解
析処理が全て終了した後、構文解析処理の結果得られる
構文解析木を分離しながら構造を持つデータを生成する
情報によりパターンマツチを行ない、構造を持つデータ
を生成している。このため、構文解析処理時に生成され
る構文解析木のうち、構文解析処理終了後に行なわれる
パターンマツチに失敗することにより不必要となる構文
解析木も、構文解析処理が全て終了し、その構文解析木
のパターンマツチが行なわれるまでのあいだ保持されて
いなければならず、これは計算機のメモリの効率を損な
うものである。In the above conventional technology, first, a syntax analysis process is performed, and after all the syntax analysis processes are completed, a pattern match is performed using information that generates structured data while separating the syntax parse tree obtained as a result of the syntax analysis process. Generates structured data. Therefore, among the parsing trees generated during parsing processing, parsing trees that become unnecessary due to a failure in pattern matching after the parsing processing is completed are not included in the parsing process. It must be held until the tree pattern is matched, which impairs the efficiency of the computer's memory.

Ｃ６題を解決するための手段〕本発明においては、従来の構文解析木の各部分木単位に
構造を持つデータを記録する手段を設け、記号列の各部
分にたいして行なわれ構文解析処理の各過程で、構造を
持つデータを生成するための情報によりパターンマツチ
を行い、パターンマツチに成功した場合は構造を持つデ
ータを生成し、上記構文解析木の各部分木単位の横道を
持つデータを記録する手段により上記構造を持つデータ
を記録し、パターンマツチに失敗した場合は、この構文
解析木を破棄する。Means for Solving Problem C6] In the present invention, means for recording data having a structure in each subtree unit of the conventional syntax parsing tree is provided, and each step of the parsing process performed for each part of a symbol string is provided. Then, pattern matching is performed using the information to generate structured data, and if the pattern matching is successful, structured data is generated, and data with side paths for each subtree of the above parse tree is recorded. Data having the above structure is recorded by means, and if pattern matching fails, this parse tree is discarded.

[Effect]

本発明は、記号列の各部分に対して行われる構文解析処
理の各過程でパターンマツチを行い、パターンマツチに
失敗した構文解析木を破棄することにより、不必要な碑
文解析木を持つことがなくなる。The present invention performs pattern matching in each process of parsing processing performed on each part of a symbol string, and discards parsing trees that fail in pattern matching, thereby eliminating unnecessary inscription parsing trees. It disappears.

〔Example〕

第１図は、本発明のデータ変換処理の概要を示すフロー
チャートである。FIG. 1 is a flowchart showing an overview of data conversion processing of the present invention.

まず、外部より、変換規則を入力しく１０１）、入力部
により記号列から記号を１記号入力する（１０２）、も
し、記号列の終わりなら（１０３）、項を出力しく１０
４）処理を終了する。記号列の終わりでなければ構文解
析処理を行い（１０５）、項を生成し生成された項を項
の集合に加え（１０６）、ステップ１０２へ戻る。First, input the conversion rule from the outside (101), input one symbol from the symbol string using the input section (102), and if it is the end of the symbol string (103), output the term (101).
4) End the process. If it is not the end of the symbol string, a parsing process is performed (105), a term is generated, the generated term is added to the term set (106), and the process returns to step 102.

第２図（ａ）は、本発明を実施するハードウェア構成の
一例である。２１は制御＠置、２２は演算装置、２３は
入力装置、２４は記憶装置、２５は出力装置である。ま
た、点線の矢印は制御情報の流れであり、実線の矢印は
データの流れである。FIG. 2(a) is an example of a hardware configuration for implementing the present invention. 21 is a control unit, 22 is an arithmetic unit, 23 is an input device, 24 is a storage device, and 25 is an output device. Further, dotted line arrows indicate the flow of control information, and solid line arrows indicate the flow of data.

第２図（ｂ）は、本発明の基本的な機能構成を示す図で
ある。この回において、１は入力記号列であり、入力部
２から入力されろ。３は出力項であり入力部２で入力さ
れた記号列を変換規則４に従って変換し、生成された項
を出力する。５は構文解析部であり、構文解析処理に携
わる。構文解析処理は、変換規則記憶部８で記憶されて
いる変換規則に基づいて行われる。構文解析処理の結果
は、構文解析木記憶部９に記憶される。また、構文解析
木記憶部９には項を生成するための情報も記憶される。FIG. 2(b) is a diagram showing the basic functional configuration of the present invention. In this case, 1 is the input symbol string, which is input from the input unit 2. Reference numeral 3 denotes an output term, which converts the symbol string input at the input unit 2 according to conversion rule 4, and outputs the generated term. Reference numeral 5 denotes a syntactic analysis unit, which is involved in syntactic analysis processing. The syntax analysis process is performed based on the conversion rules stored in the conversion rule storage section 8. The result of the parsing process is stored in the parsing tree storage section 9. The parse tree storage unit 9 also stores information for generating terms.

′Ｍｔ文解析木分解部６は、構文解析木記憶部９に記憶
されている構文解析処理の結果の構文解析木を分解する
。項生酸部７は構文解析木記憶部６で分解された構文解
析木から、記憶部９に記憶されている項を生成するため
の情報により項を生成する。'Mt The sentence parse tree decomposition unit 6 decomposes the parse tree resulting from the parse process stored in the parse tree storage unit 9. The term generation unit 7 generates terms from the parse tree decomposed by the parse tree storage unit 6 using information for generating terms stored in the storage unit 9.

本実施例では、構文解析処理にジエイ　アーレイ（Ｊａ
ｙ　Ｅａｒｌａｙ）の構文解析アルゴリズムを用いる。In this embodiment, the parsing process uses Ja
y Early) parsing algorithm is used.

Ｊａｙ　Ｅａｒｌｅｙの構文解析アルゴリズムは、１９
７０年２月のシーエイシーエム（ＣＡＣＭ）１３巻２号
掲載のＪａｙ　Ｂａｒｌｅｙの論文「アン・エフイシン
ト・コンチクストフリー・パージング・アルゴリズムＪ
　　（Ａｎ　ｅｆｆｉｃｉｅｎｔ　Ｃｏｎｔｅｘｔ　Ｆ
ｒｅｅ　ＰａｒｓｉｎｇＡＩＨｏｒｉｔｈｍ）に詳述さ
れているので、ここではその概要を説明する。Jay Earley's parsing algorithm is 19
Jay Barley's paper "An Efficient Concentration-Free Purging Algorithm J" published in CACM Vol. 13, No. 2, February 1970.
(An efficient Context F
ree Parsing AI Horithm), so an overview thereof will be explained here.

生成規則の右辺の記号列を一つの印によって分割した印
付き生成規則を用い、上記印付き生成規則と入力記号列
の記号の位置からなる状態を導入し、入力記号列から一
記号読み取るごとに生成される状態を集合にまとめるこ
とにより、構文解析の過程で適用されつつある生成規則
を管理する。Using a marked production rule in which the symbol string on the right side of the production rule is divided by one mark, we introduce a state consisting of the above marked production rule and the symbol position of the input symbol string, and each time one symbol is read from the input symbol string, By organizing the generated states into sets, we manage the production rules that are being applied during the parsing process.

状態は、印付き生成規則の右辺の印の左側の記号列が、
１の列に井き換えられることを意味し、生成規則の適用
状況を表現する。The state is that the symbol string to the left of the mark on the right side of the marked production rule is
It means that it is changed to a column of 1, and expresses the application status of the production rule.

この状態に対応する構文解析木は、状態に対応したポイ
ンタの列を設けて、印付き生成規則の右辺の印の左側の
記号列に対応させ、各々の記号の書き換えの際に適用さ
れた状態をポイントさせることにより表現する。The parse tree corresponding to this state has a string of pointers corresponding to the state, and corresponds to the symbol string on the left side of the mark on the right side of the marked production rule, and the state applied when rewriting each symbol. Expressed by pointing.

文脈自由文法では、１つ記号の書き換えの際に適用され
る生成規則は一般に複数存在するため。In context-free grammars, there are generally multiple production rules that are applied when rewriting a single symbol.

状態に対応したポインタを集合化する。ポインタを集合
化することは、複数の閘文解析本のノードを共用するこ
とにより重ねあわせて表すことに相当する。Collect pointers corresponding to the state. Collecting pointers corresponds to expressing them in an overlapping manner by sharing the nodes of multiple lock analysis books.

Ｅａｒｌｅｙの構文解析アルゴリズムでは、全ての文脈
自由文法に基づく構文解析が可能であり、文法があいま
い性を持つ場合、与えられた入力文から生成可能な全て
の構文解析木を処理の重複なしに生成する。Earley's parsing algorithm allows parsing based on all context-free grammars, and when the grammar has ambiguity, it generates all parse trees that can be generated from a given input sentence without duplication of processing. do.

また、本実施例では、上記印付き生成規則に構造を持つ
データを生成するための情報を持たせたものを変換規則
要素と呼ぶ。Furthermore, in this embodiment, the marked generation rule with information for generating structured data is called a conversion rule element.

第３図は、変換規則配置部で記憶される変換規則要素の
詳細である。変換規則要素は、左辺の非終端記号、右辺
にある印の右側にある非終端記号又は終端記号、次の変
換規則要素へのポインタ、出力項を生成するための情報
、中間ノード集金子へのポインタからなる。出力項を生
成するための情報は、構文解析処理の過程で、中間ノー
ドを生成するときに中間ノードに持たせる。変換規則要
素は、文法を表現する静的な情報であり、構文解析等の
処理過程で生成、更新、消滅はしない。FIG. 3 shows details of the conversion rule elements stored in the conversion rule placement section. A conversion rule element consists of a non-terminal symbol on the left side, a non-terminal symbol or terminal symbol to the right of the mark on the right side, a pointer to the next conversion rule element, information for generating an output term, and a pointer to an intermediate node collection. Become. Information for generating output terms is provided to intermediate nodes when they are generated in the process of parsing. The conversion rule element is static information that expresses the grammar, and is not generated, updated, or deleted during processing such as syntax analysis.

第４図（ａ）〜（ｄ）は、構文解析本記憶部で記憶され
る情報の詳細である。FIGS. 4(a) to 4(d) show details of the information stored in the parsing main storage unit.

４０１は中間ノード集合子である。中間ノード集合子は
、中間ノード結合子と他の構文解析木に属する中間ノー
ド集金子へのポインタの組である。401 is an intermediate node aggregator. An intermediate node aggregator is a set of an intermediate node connector and a pointer to an intermediate node aggregator belonging to another parse tree.

４０２は中間ノード結合子である。中間ノード結合子は
、中間ノードまたはリーフへのポインタと、同じ構文解
析木に属する中間ノード集合子へのポインタの組である
。402 is an intermediate node connector. An intermediate node connector is a set of a pointer to an intermediate node or leaf and a pointer to an intermediate node aggregator belonging to the same parse tree.

４０３は中間ノードの詳細である。中間ノードは、記号
の位置、中間ノード集合子へのポインタ、出力項を生成
するための情報、項の集合へのポインタから構成されて
いる。出力項を生成するための情報は、構文解析処理の
過程で、中間ノードを生成するときに、変換規則要素が
もつ項を生成するための情報を中間ノードに持たせたも
のである。403 is the details of the intermediate node. An intermediate node consists of the position of a symbol, a pointer to an intermediate node aggregator, information for generating an output term, and a pointer to a set of terms. The information for generating the output term is the information for generating the term of the conversion rule element that is provided in the intermediate node when generating the intermediate node in the process of syntactic analysis processing.

項の集合へのポインタは、ノードが管理する各々の構文
解析木から生成された項を要素としてもつ項の集合への
ポインタである。A pointer to a set of terms is a pointer to a set of terms whose elements are terms generated from each parse tree managed by the node.

４０４はリーフである。リーフは、記号の位置。404 is a leaf. The leaf is the position of the symbol.

出力項を生成するための情報、項の集合へのポインタで
構成される。リーフの項の集合へのポインタにポインタ
される項の集合は、ただ１つの項を要素として持つ項の
集合である。It consists of information for generating output terms and a pointer to a set of terms. The set of terms pointed to by the pointer to the leaf term set is a set of terms that has only one term as an element.

第５図は、データ変換処理のメインルーチンのフローチ
ャートである。FIG. 5 is a flowchart of the main routine of data conversion processing.

まず、開始記号を読み込み（５０１）、開始記号と初期
の記号の位置を引数にして手続き１を起動する（５０２
）。次に、入力記号列から１つの記号と入力記号列上の
その記号の位置を読み込む（５０３）、もし、入力記号
列の終わりなら（５０４）、記号の位置を引数に手続き
５を起動する（５０５）。First, read the start symbol (501) and start procedure 1 with the start symbol and the position of the initial symbol as arguments (502
). Next, read one symbol from the input symbol string and the position of that symbol on the input symbol string (503). If it is the end of the input symbol string (504), start procedure 5 with the symbol position as an argument ( 505).

入力記号列の終わりでなければ、記号とその位置を引数
にして手続き２を起動する。If it is not the end of the input symbol string, procedure 2 is started with the symbol and its position as arguments.

第６図は、手続き１のフローチャートである。FIG. 6 is a flowchart of procedure 1.

まず、引数の記号を左辺に持つ変換規則要素を探しく６
０１）、存在しなければ（６０２）処理を終了する。次
に、引数の記号の位置を持つ中間ノードが存在すれば（
６０３）、ステップ６０１へ戻る。引数の記号の位置を
持つ中間ノードが存在しなければ、中間ノードを生成し
く６０４）。First, find the conversion rule element that has the argument symbol on the left side6
01), if it does not exist (602), the process ends. Next, if there is an intermediate node with the position of the symbol of the argument (
603), return to step 601. If there is no intermediate node having the position of the symbol of the argument, generate the intermediate node (604).

空の中間ノード集合子を生成し集合に加え（６０５）、
変換規則要素と中間ノードを引数にして手続き３を起動
しく６０６）、ステップ６０１へ戻る。Generate an empty intermediate node setter and add it to the set (605),
Invoke procedure 3 using the conversion rule element and intermediate node as arguments (606), and return to step 601.

第７図は２手続き２のフローチャートである。FIG. 7 is a flowchart of Procedure 2.

まず、リーフを生成しく７０１）、引数の記号を印の右
側に持つ変換規則要素を捜す（７０２）。First, a leaf is generated (701), and a conversion rule element having an argument symbol on the right side of the mark is searched for (702).

もし、存在しなければ（７０３）処理を終了する。If it does not exist (703), the process ends.

次に、中間ノードの記号の位置が引数の記号の位置につ
ながらなければ（７０４）ステップ７０２へ戻る。中間
ノードの記号の位置が引数の記号に位置につながれば、
中間ノード集合子へのポインタを得て（７０５）、次の
変換規則要素を得て（７０６）、記号の位置を伸ばした
中間ノードを生成して中間ノードの集合に加える（７０
７）。Next, if the intermediate node symbol position does not connect to the argument symbol position (704), the process returns to step 702. If the position of the symbol of the intermediate node is connected to the position of the symbol of the argument, then
Obtain a pointer to the intermediate node aggregator (705), obtain the next conversion rule element (706), generate an intermediate node with the position of the symbol extended, and add it to the intermediate node set (70).
7).

次に、リーフへのポインタと中間ノード集合子へのポイ
ンタを組にしで、中間ノード結合子を生成しく７０８）
、変換規則要素と中間ノードを引数にして手続き３を起
動し、ステップ７０２へ戻る。Next, create an intermediate node connector by pairing the pointer to the leaf and the pointer to the intermediate node aggregator (708)
, the conversion rule element and the intermediate node are used as arguments to start procedure 3, and the process returns to step 702.

第８図は１手続き３のフローチャートである。FIG. 8 is a flowchart of 1 procedure 3.

まず、引数の変換規則要素の印が右辺の左端にあれば（
８０１）、左辺の記号と引数の中間ノードを引数にして
１手続き４を起動しく８０２）処理を終了する。引数の
変換規則要素の印が右辺の右端になく、印の右側の記号
が終端記号でな！−１れば（８０３）、印の右側の記号
を引数にして手続き１を起動しく８０４）処理を終了す
る。印の右側の記号が終端記号であれば処理を終了する
。First, if the mark of the conversion rule element of the argument is at the left end of the right side (
801), 1 procedure 4 is started using the symbol on the left side and the intermediate node of the argument as an argument, and 802) the process ends. The mark of the argument conversion rule element is not at the right end of the right side, and the symbol to the right of the mark is not a terminal symbol! If it is -1 (803), start procedure 1 using the symbol on the right side of the mark as an argument (804) and end the process. If the symbol to the right of the mark is a terminal symbol, the process ends.

第９図は、手続き４のフローチャートである。FIG. 9 is a flowchart of procedure 4.

まず、引数の記号が印の右側にある変換規則要素を捜し
く９０１．）、もし存在しなければ（９０２）処理を終
了する。次に、中間ノードの記号の位置が、引数の記号
の位置とつながらなければ（９０３）ステップ９０１へ
戻る。中間ノードの記号の位置が、引数の記号の位置と
つながれば、次の変換規則要素を得て（９０４，）、パ
ターンマツチを行う（９０５）−パターンマツチに失敗
すれば（９０６）ステップ９０１に戻る。パターンマツ
チに成功すれば、記号の位置を伸ばした中間ノードを生
成し集合に加え（９０７）、項を生成し項の集合に加え
（９０８）−中間ノード集合子を生成し集合に加える（
９０９）。次に変換規則要素と中間ノードを引数にして
手続き３を起動しく９１．０）、ステップ９０１へ戻る
。First, search for the conversion rule element whose argument symbol is on the right side of the mark 901. ), if it does not exist (902), the process ends. Next, if the position of the intermediate node symbol is not connected to the position of the argument symbol (903), the process returns to step 901. If the position of the symbol of the intermediate node is connected to the position of the symbol of the argument, obtain the next conversion rule element (904,) and perform pattern matching (905) - If pattern matching fails (906), proceed to step 901. return. If the pattern match is successful, an intermediate node is generated by extending the position of the symbol and added to the set (907), a term is generated and added to the set of terms (908) - an intermediate node aggregator is generated and added to the set (908).
909). Next, start procedure 3 using the conversion rule element and intermediate node as arguments (91.0), and return to step 901.

ここで、パターンマツチとは、構造を持つデータを生成
する情報により行われ、データの構造を等しくすること
が可能かを調べる処理である。構造を持つデータは、変
数を持つことができ、パターンマツチに成功すれば、変
数のバインドが行われる。例えば゛″で始まる名前を変
数とし、ｆ（ｘ、　　　ｙ）とｆ（ａ、ｂ）についてパターンマツチを行うとパターンマツチは成功
し、　　Ｘにａが、　ｙにｂがそれぞれバインドされる
。Here, pattern matching is a process performed using information that generates structured data to check whether the data structures can be made equal. Structured data can have variables, and if the pattern match is successful, the variables are bound. For example, if a name starting with ``'' is used as a variable and a pattern match is performed for f(x, y) and f(a, b), the pattern match is successful and a is bound to X and b is bound to y.

ｆ（−χ、−ｙ）とｇ（ｃ＊ｄ）についてパターンマツチを行うとパターンマツチは失敗
する。If pattern matching is performed for f(-χ, -y) and g(c*d), the pattern matching will fail.

第１０図は、手続き５のフローチャートである。FIG. 10 is a flowchart of procedure 5.

まず、開始記号を左辺に持、印が右辺の右端にある変換
規則要素を捜す（１００１）、ノードの記号の位置が、
初期位置と引数の記号の位置−１でなければ（１００２
）構文エラーを出力する（１００３）、ノードの記号の
位置が、初期位置と引数の記号の位置−１であれば、ノ
ードの項を出力しく１００４）、処理を終了する。First, search for a conversion rule element with the start symbol on the left side and the mark at the right end of the right side (1001).The position of the node symbol is
If the initial position and the position of the argument symbol are -1 (1002
) A syntax error is output (1003). If the position of the symbol of the node is minus 1 from the initial position of the symbol of the argument, the term of the node is output (1004), and the process ends.

本実施例では、前述のＪａｙ　Ｅａｒｌｓｙの構文解析
アルゴリズムを用いているため、全ての文脈自由文法に
適用できる。This embodiment uses Jay Earlsy's parsing algorithm described above, so it can be applied to all context-free grammars.

また、第４図に示した内部表現は、論理的に同じ意味を
持つ別の表現１例えばテーブルによる表現を用いても実
施できる。Furthermore, the internal representation shown in FIG. 4 can also be implemented using another representation that has the same logical meaning, such as a table representation.

本実施例では、複数の構文解析木を、ノード集合子とノ
ード結合子を用いて４Ｑｊ木として表現しているが、他
の表現方法を用いても実施できる。In this embodiment, a plurality of parse trees are expressed as 4Qj trees using node setters and node connectors, but other expression methods may also be used.

〔Effect of the invention〕

以上述べたように、本発明では、入力記号列の各部分に
適用される構文解析処理の各過程でパターンマツチを行
い、パターンマツチに失敗した構文解析木を砿棄するこ
とにより不必要な構文解析木を持ちつづけることがなく
なり、計算機のメモリの効率をよくすることが可能にな
った。As described above, in the present invention, patterns are matched in each process of parsing processing applied to each part of an input symbol string, and parse trees that fail in pattern matching are discarded, thereby eliminating unnecessary syntax. It is no longer necessary to keep a parse tree, making it possible to improve the efficiency of computer memory.

[Brief explanation of the drawing]

第１図は１本発明の一実施例の処理手順の概要を示すフ
ローチャートである。第２図（ａ）は。本発明の上記実施例のハードウェア構成図である。第２図（ｂ）は、本発明の上記実施例のデータ変換装置
の機能ブロック図である。第３図と第４図は１本発明の
上記実施′例のデータの論理構造を表わす説明図である
。第５図から第１０図は１本発明の上記実施例の処理手
順を表わすフローチャートである。２・・・入力部、５・・・碍文解析部、６・・・構文解
析木分解部、７・・・項生酸部、８・・・変換規則記憶
部、９・・・構文解析木記憶部。第（α）一−−−−−−４−剥叫楕報テ゛−ノ（ｂ）第図拓区第図 ■ 遁回力り図FIG. 1 is a flowchart showing an overview of the processing procedure of an embodiment of the present invention. Figure 2(a) is. FIG. 3 is a hardware configuration diagram of the above embodiment of the present invention. FIG. 2(b) is a functional block diagram of the data conversion device according to the above embodiment of the present invention. FIGS. 3 and 4 are explanatory diagrams showing the logical structure of data in the above embodiment of the present invention. 5 to 10 are flowcharts showing the processing procedure of the above-described embodiment of the present invention. 2... Input section, 5... Text analysis section, 6... Syntactic analysis tree decomposition section, 7... Term production section, 8... Conversion rule storage section, 9... Syntax analysis section Tree memory department. Part (α) 1------4-Screaming Ellipse Section (b) Fig. Taku-ku Fig.■

Claims

[Claims] 1. In a data processing device that converts a symbol string into structured data according to a conversion rule given from the outside, the structure of the symbol string is described in the notation of a context-free grammar, and the structure is a step of inputting a conversion rule that describes the structure of data in association with terminal symbols and non-terminal symbols used in the context-free grammar; a step of inputting a symbol string divided into parts; and a step of inputting a symbol string described in the conversion rule. parsing each part of the symbol string by a reduction process based on the structure of the string; and analyzing the structure of data having a structure associated with terminal symbols and non-terminal symbols corresponding to the structure of the symbol string. Data conversion comprising: generating data having a structure according to the description; and generating and outputting data having an overall structure by synthesizing the partially generated data having the structure. Method. 2. A data conversion method in which information for generating structured data is included in the conversion rule, and in the process of converting an input symbol string to structured data, syntax is applied to each part of the symbol string. Parsing processing, processing to provide each parse tree with information for generating data with the structure specified by the conversion rule, processing to decompose the parse tree, processing for each parse tree according to the information for generating data with structure. A data conversion method characterized in that an input symbol string is converted into data by performing a process of generating structured data from each part of the symbol string, and synthesizing structured data generated from each part of the symbol string. 3. The data conversion method according to claim 1, wherein the syntax analysis process is performed using Jay Earley's syntax analysis algorithm. 4. A data conversion system, which includes an input section for inputting symbol strings, a syntax analysis section for performing syntax analysis processing, a parsing tree decomposition section for disassembling the syntax parse tree, and a structure derived from the decomposed parse tree. For each part of the symbol string input in the input section, the syntax analysis section analyzes the syntax, the parse tree decomposition section performs parse tree decomposition processing, and the data generation section analyzes the structure of each part of the symbol string input in the input section. 1. A data conversion system characterized in that the symbol string is converted into structured data by performing processing to generate data having the following structure, and synthesizing structured data generated from each part of the symbol string. 5. The data conversion method according to claim 1, wherein a plurality of parse trees are expressed as one multiplex tree by sharing common nodes among the nodes of each parse tree. 6. An editing and translation system characterized in that an input sentence is analyzed and converted by using the data conversion method according to claim 1.