JPWO2004088539A1

JPWO2004088539A1 - Parsing method and programmable processor unit for parsing

Info

Publication number: JPWO2004088539A1
Application number: JP2004570163A
Authority: JP
Inventors: 政幸長沼; 野村　勝信; 勝信野村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-03-31
Filing date: 2003-03-31
Publication date: 2006-07-06
Also published as: WO2004088539A1

Abstract

In the parsing method according to the present invention, a plurality of tables are prepared in a memory, each associated with address data and each containing a plurality of sets of data, where one set consists of reference data corresponding to a keyword together with the reference destination table data and flag data corresponding to that reference data. The method comprises a step of fetching one table from the memory, a step of extracting target data from stream data, a step of comparing the target data with each reference data item in the sets of data contained in the fetched table, a step of selecting the reference destination table data and flag data corresponding to the reference data that matches the target data, and a step of outputting from the memory the table corresponding to the selected reference destination table data, using that data as address data.

Description

The present invention relates to a parsing method for finding desired keywords in stream data such as media data (for example MPEG), multiplexed binary stream data, and character strings, and in particular to a parsing method capable of high-speed processing. The invention further relates to a programmable processor unit for such parsing.

In conventional parsing, one keyword to be searched for is taken out and compared against the stream data from its beginning to its end. When the comparison for that keyword is finished, the same comparison is performed for the next keyword, and the process repeats.
The procedure of the conventional parsing method is described with reference to FIG. 13. In FIG. 13, the stream data 1300 to be searched (“cabcdeaa...”) is assumed to be stored in a buffer, as are the keywords 1310 to be searched for (“abcd”, “cad”, “cae”, and “cd”). First, the keyword “abcd” is taken out and, as shown in the first search 1320, compared against the stream data 1300 in order from its head. In the example of FIG. 13, the second to fifth characters of the stream data 1300 match the keyword “abcd”. The match is recorded by some means, and the comparison then continues from the sixth character to the end of the stream data 1300. This completes the first search 1320.
Next, the keyword “cad” is taken out and, as shown in the second search 1330, compared against the stream data 1300 in order from its head. The third search 1340 and the fourth search 1350 follow in the same way, so that one full comparison pass is made for each of the keywords 1310 before the search processing ends.
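The conventional procedure can be illustrated with a minimal Python sketch. The function name `conventional_search` and the use of 0-based offsets are assumptions made for illustration; the source only describes one full pass over the stream per keyword.

```python
def conventional_search(stream, keywords):
    """Scan the whole stream once per keyword and collect 0-based match offsets."""
    hits = {}
    for kw in keywords:                       # one full pass per keyword
        positions = []
        for i in range(len(stream) - len(kw) + 1):
            if stream[i:i + len(kw)] == kw:   # compare keyword at offset i
                positions.append(i)           # "hold the match by some means"
        hits[kw] = positions
    return hits


result = conventional_search("cabcdeaa", ["abcd", "cad", "cae", "cd"])
```

The number of passes grows with the number of keywords, and the scan as written needs the stream to be available in full, which are the two drawbacks the text goes on to describe.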
The hash method is one technique for speeding up such comparisons. In the hash method, a function (hash function) is defined that derives, from an element of a set of character strings, the address at which that element is stored, and the addresses obtained by the hash function are stored in a table. A conventional example using the hash method is described in Japanese Patent Laid-Open No. 4-96174. In that example, the means for reading candidate character strings and the means for match searching are configured to operate independently in order to speed up the comparison.
However, conventional parsing has the problem that, when there are multiple keywords to compare, the search time increases in proportion to the number of keywords. A further problem is that parsing cannot start until all of the stream data to be searched is available. In addition, the keywords are usually fixed and cannot easily be changed.
SUMMARY OF THE INVENTION
Accordingly, an object of the present invention is to provide a parsing method, and a programmable processor unit for such parsing, in which target data extracted from stream data is compared in parallel against all keywords that have been registered and expanded into tables in advance, so that the search remains fast even when there are multiple keywords.
Another object of the present invention is to provide a parsing method, and a programmable processor unit for such parsing, that can perform a search even when not all of the stream data to be searched is available.
To achieve the above objects, the parsing method according to the present invention comprises: a step of preparing in a memory a plurality of tables, each associated with address data and each containing a plurality of sets of data, where one set consists of reference data corresponding to a keyword together with the reference destination table data and flag data corresponding to that reference data; a step of fetching one table from the memory; a step of extracting target data from the stream data; a step of comparing the target data with each reference data item in the sets of data contained in the fetched table; a step of selecting the reference destination table data and flag data corresponding to the reference data that matches the target data; and a step of outputting from the memory the table corresponding to the selected reference destination table data, using that data as address data. The method is characterized in that keywords are detected in the stream data by repeating the extraction, comparison, selection, and output steps while successively extracting the data constituting the stream data as new target data.
In the parsing method according to the present invention, the stream data is extracted one data item at a time as target data and compared in turn with the keywords expanded into tables in the memory. Parsing can therefore begin on whatever portion exists from the head of the stream data, even if the stream data to be searched is not yet complete. If further stream data is added partway through, parsing can easily be resumed or continued. Furthermore, even when there are multiple keywords to be searched for, expanding all of them into tables in the memory allows the search for all keywords to be processed in parallel.
To achieve the above objects, the programmable processor unit according to the present invention comprises: a register that stores target data extracted from stream data; a memory that stores a plurality of table sets, each having a plurality of sets of data consisting of reference data corresponding to a keyword together with the reference destination table data and flag data corresponding to that reference data, and that outputs the table set corresponding to address data in response to input of the address data; and a processing circuit that receives from the memory the sets of data of one table set, compares the target data with each reference data item in those sets, selects, when a match occurs, the reference destination table data and flag data corresponding to the matching reference data, and outputs the selected reference destination table data as address data so that the corresponding table set is input from the memory.
The programmable processor unit according to the present invention is a unit for realizing the parsing method described above. Because the keywords to be searched for are programmable, they can easily be changed to suit the specifications of the incoming stream data. Even when there are multiple keywords to be searched for, the search for all of them can be processed in parallel.

FIG. 1 is a flowchart showing the overall procedure of the parsing method according to the present invention.
FIG. 2 is a diagram for explaining a procedure for expanding a keyword table.
FIG. 3 is a diagram illustrating an example of data developed in a table.
FIG. 4 is a diagram for explaining an outline of a procedure for searching a data stream.
FIG. 5 is a diagram illustrating an example of data stream search.
FIG. 6 is a diagram illustrating an example of data stream search.
FIG. 7 is a diagram illustrating an example of data stream search.
FIG. 8 is a diagram illustrating an example of data stream search.
FIG. 9 is a diagram illustrating an example of data stream search.
FIG. 10 is a diagram illustrating an example of a table stored in the memory.
FIG. 11 is a diagram showing a schematic configuration of a programmable processor unit according to the present invention.
FIG. 12 is a diagram showing a schematic circuit configuration of the processing circuit shown in FIG. 11.
FIG. 13 is a diagram for explaining a conventional syntax analysis method.

FIG. 1 is a flowchart showing the procedure of the parsing method according to the present invention. The method searches a data stream for keywords that have been expanded into tables in advance and outputs the search result.
The following description uses, as an example, a data stream that is a character string built from the eight characters below, searched for four keywords (“abcd”, “cad”, “cae”, and “cd”).
Character code    Data
0000              a
0001              b
0002              c
0003              d
0004              e
0005              f
0006              g
0007              h
First, the total number of tables and the depth of the pointer array are defined, and the table is initialized (step 101). Here, the total number of tables is set to 6, and the depth N of the pointer array is set to 10.
Next, the total number of keywords to be registered is defined, and keyword initialization is performed (step 102). Here, the total number of keywords is set to 4.
Next, keywords to be searched are registered one by one and developed in a table (step 103). FIG. 2 shows an example in which the keyword “cae” is registered and table expansion is performed.
In FIG. 2, the keyword “abcd” is assumed to be registered already. Registering “abcd” has created the first-character table 200, the second-character table 210, the third-character table 220, and the fourth-character table 230. Each table is associated with address data as illustrated and contains a plurality of sets of data, each set consisting of reference data, reference destination table data, and flag data. The reference data correspond to the eight characters that make up the character string to be searched. In every table, the initial value of the reference destination table data is [0x0000] and the initial value of the flag data is [unmatch].
First, in the first-character table 200, in accordance with the first character of the keyword “cae”, the reference destination table data corresponding to the reference data “c” is changed to [0x0004] and the flag data corresponding to the reference data “c” is changed to [continue].
Next, a new second-character table 240 is created for the second character of the keyword “cae”, at the address [0x0004] given by the reference destination table data. In table 240, the reference destination table data corresponding to the reference data “a” is changed to [0x0005] and the flag data corresponding to the reference data “a” is changed to [continue].
Next, a new third-character table 250 is created for the third character of the keyword “cae”, at the address [0x0005] given by the reference destination table data. In table 250, the flag data corresponding to the reference data “e” is changed to [match], which completes the registration of the keyword “cae”.
Similarly, the keywords “cad” and “cd” are registered. FIG. 3 shows the table sets after all keywords have been registered and expanded. They consist of the first-character table 300, the second-character tables 310 and 340, the third-character tables 320 and 350, and the fourth-character table 330.
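The registration steps can be sketched in Python as follows. This is an illustrative model only: the list index stands in for the address data, the names `new_table` and `register` are hypothetical, and the sketch assumes that no keyword is a proper prefix of another, since one flag per entry cannot express both [continue] and [match].

```python
ALPHABET = "abcdefgh"
UNMATCH, CONTINUE, MATCH = "unmatch", "continue", "match"


def new_table():
    # Every entry starts as [reference destination 0x0000, flag unmatch].
    return {ch: [0x0000, UNMATCH] for ch in ALPHABET}


def register(tables, keyword):
    """Expand one keyword into the tables, creating new tables as needed."""
    addr = 0x0000                            # start at the first-character table
    for ch in keyword[:-1]:
        entry = tables[addr][ch]
        if entry[1] != CONTINUE:             # no table yet for this prefix
            tables.append(new_table())
            entry[0] = len(tables) - 1       # reference destination table data
            entry[1] = CONTINUE
        addr = entry[0]
    tables[addr][keyword[-1]][1] = MATCH     # last character completes the keyword


tables = [new_table()]
for kw in ["abcd", "cae", "cad", "cd"]:      # same order as in the description
    register(tables, kw)
```

With this registration order the sketch ends up with six tables, and the entry for “c” in the first table points to address 0x0004, in line with the example in the text.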
Next, a plurality of table sets are displayed on an appropriate display after registration of all keywords and table expansion are completed (step 104). The display of the plurality of table sets is for debugging, and can be omitted if unnecessary.
Next, the keyword search is executed on the character string to be searched (step 105). FIG. 4 outlines the search procedure. Here, the character string to be searched is assumed to be “cabcde...”. First, the first character “c” is extracted from the character string as target data and compared with the first character of each of the four keywords (401). The target data “c” matches the first character of the keywords “cad”, “cae”, and “cd”, so these keywords remain candidates.
Next, the following character “a” is extracted as target data and compared with the four keywords (402). The target data “a” matches the second character of the keywords “cad” and “cae”, so these keywords remain candidates. The second character of the keyword “cd” is not “a”, so that keyword no longer matches. In addition, the target data “a” matches the first character of the keyword “abcd”, so this keyword becomes a candidate.
Next, the following character “b” is extracted as target data and compared with the four keywords (403). The target data “b” matches the second character of the keyword “abcd”, so this keyword remains a candidate. The third characters of the keywords “cad” and “cae” are not “b”, so these keywords no longer match.
Next, the following character “c” is extracted as target data and compared with the four keywords (404). The target data “c” matches the third character of the keyword “abcd”, so this keyword remains a candidate. In addition, the target data “c” matches the first character of the keywords “cad”, “cae”, and “cd”, so these keywords become candidates again.
Next, the following character “d” is extracted as target data and compared with the four keywords (405). The target data “d” matches the fourth character of the keyword “abcd”, so this keyword is detected in the character string. The second characters of the keywords “cad” and “cae” are not “d”, so these keywords do not match. The second character of the keyword “cd” is “d”, but because the keyword “abcd” has matched, the keyword “cd” is invalidated.
In this way, characters are extracted one at a time from the character string to be searched and compared with all keywords in parallel, and this is repeated for the entire character string.
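The character-by-character procedure of FIG. 4 can be mimicked without any tables by tracking, for each keyword still in play, how many characters have matched so far. The following Python sketch is an illustration under assumptions: the `step` function and the `(keyword, matched_length)` representation are hypothetical, and the rule that only the earliest-started keyword is reported when several complete on the same character is inferred from the “abcd”/“cd” example.

```python
KEYWORDS = ["abcd", "cad", "cae", "cd"]


def step(candidates, ch, keywords):
    """Advance every live candidate by one character and start new ones."""
    survivors, detected = [], []
    # Older candidates come first, then fresh attempts starting at this character.
    for kw, n in candidates + [(kw, 0) for kw in keywords]:
        if kw[n] == ch:
            if n + 1 == len(kw):
                detected.append(kw)          # whole keyword seen
            else:
                survivors.append((kw, n + 1))
    # Assumption: if several keywords complete together, keep only the oldest.
    return survivors, detected[:1]


candidates, found = [], []
for i, ch in enumerate("cabcde"):
    candidates, detected = step(candidates, ch, KEYWORDS)
    found += [(i, kw) for kw in detected]
```

For the stream “cabcde” this reports “abcd” at 0-based index 4 (the fifth character, “d”) and suppresses “cd”, as in the walkthrough above.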
Next, using FIG. 5 to FIG. 9, how the above-described search process is executed using a plurality of table sets and pointers will be described.
First, “c” is extracted from the character string to be searched as target data and compared (see FIG. 5). Table 300 is selected on the basis of [0x0000], the initial value of level 0 of the pointer (501), and the target data “c” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “c” is [continue], the reference destination table data [0x0004] is stored in the pointer (502). After the comparison with the first target data “c”, the pointer is therefore set as shown by pointer (after) 370. Note that [0x0000] is always set in the lowest level of the pointer.
Next, the following target data “a” is extracted from the character string and compared (see FIG. 6). At this point the pointer is set as shown by pointer (before) 360, as a result of FIG. 5. First, table 340 is selected on the basis of [0x0004] in level 0 of the pointer (601), and the target data “a” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “a” is [continue], the reference destination table data [0x0005] is stored in the pointer (602). Next, table 300 is selected on the basis of [0x0000] in level 1 of the pointer (603), and the target data “a” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “a” is [continue], the reference destination table data [0x0001] is stored in the pointer (604). After the comparison with the target data “a”, the pointer is therefore set as shown by pointer (after) 370.
Next, the following target data “b” is extracted from the character string and compared (see FIG. 7). At this point the pointer is set as shown by pointer (before) 360, as a result of FIG. 6. First, table 350 is selected on the basis of [0x0005] in level 0 of the pointer (701), and the target data “b” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “b” is [unmatch], nothing is stored in the pointer. Next, table 310 is selected on the basis of [0x0001] in level 1 of the pointer (702), and the target data “b” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “b” is [continue], the reference destination table data [0x0002] is stored in the pointer (703). Next, table 300 is selected on the basis of [0x0000] in level 2 of the pointer (704), and the target data “b” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “b” is [unmatch], nothing is stored in the pointer. After the comparison with the target data “b”, the pointer is therefore set as shown by pointer (after) 370.
Next, the following target data “c” is extracted from the character string and compared (see FIG. 8). At this point the pointer is set as shown by pointer (before) 360, as a result of FIG. 7. First, table 320 is selected on the basis of [0x0002] in level 0 of the pointer (801), and the target data “c” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “c” is [continue], the reference destination table data [0x0003] is stored in the pointer (802). Next, table 300 is selected on the basis of [0x0000] in level 1 of the pointer (803), and the target data “c” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “c” is [continue], the reference destination table data [0x0004] is stored in the pointer (804). After the comparison with the target data “c”, the pointer is therefore set as shown by pointer (after) 370.
Next, the following target data “d” is extracted from the character string and compared (see FIG. 9). At this point the pointer is set as shown by pointer (before) 360, as a result of FIG. 8. First, table 330 is selected on the basis of [0x0003] in level 0 of the pointer (901), and the target data “d” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “d” is [match], it is determined that the keyword “abcd” has been detected. Next, table 340 is selected on the basis of [0x0004] in level 1 of the pointer (902), and the target data “d” is compared with the reference data “a” to “f”. The flag data of the matching reference data “d” is [match], but because the keyword “abcd” has already been detected in the same step, the detection of the keyword “cd” is invalidated. Next, table 300 is selected on the basis of [0x0000] in level 2 of the pointer (903), and the target data “d” is compared with the reference data “a” to “f”. Because the flag data of the matching reference data “d” is [unmatch], nothing is stored in the pointer. After the comparison with the target data “d”, the pointer is therefore set as shown by pointer (after) 370.
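The pointer-based walkthrough of FIGS. 5 to 9 can be reproduced with a short Python sketch. It is a model under assumptions rather than the patented implementation: tables are held in a list whose index plays the role of the address data, `build_tables` and `search_step` are hypothetical names, a hit is reported as the address of the table that held the [match] entry, and the `depth` parameter stands in for the pointer array depth N.

```python
ALPHABET = "abcdefgh"


def build_tables(keywords):
    """Expand keywords into tables; entries are [reference destination, flag]."""
    tables = [{ch: [0, "unmatch"] for ch in ALPHABET}]
    for kw in keywords:
        addr = 0
        for ch in kw[:-1]:
            entry = tables[addr][ch]
            if entry[1] != "continue":
                tables.append({c: [0, "unmatch"] for c in ALPHABET})
                entry[:] = [len(tables) - 1, "continue"]
            addr = entry[0]
        tables[addr][kw[-1]][1] = "match"
    return tables


def search_step(tables, pointers, target, depth=10):
    """Process one target character against every pointer level."""
    new_pointers, hit = [], None
    for addr in pointers:                    # level 0 first
        ref_dest, flag = tables[addr][target]
        if flag == "continue":
            new_pointers.append(ref_dest)    # store reference destination
        elif flag == "match" and hit is None:
            hit = addr                       # later matches in this step are invalidated
    # The lowest level always holds 0x0000 so a new keyword can start.
    return new_pointers[:depth - 1] + [0], hit


tables = build_tables(["abcd", "cae", "cad", "cd"])
pointers, hits, trace = [0], [], []
for i, ch in enumerate("cabcde"):
    pointers, hit = search_step(tables, pointers, ch)
    trace.append(list(pointers))
    if hit is not None:
        hits.append((i, hit))
```

After the characters “c”, “a”, “b”, “c” the pointer contents in `trace` are [4, 0], [5, 1, 0], [2, 0], and [3, 4, 0], which correspond to the addresses quoted in the walkthrough; the “d” step then reports a hit in the table at address 0x0003.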
In this manner, target data is extracted one character at a time from the character string to be searched and compared with the keywords until the end of the string is reached. With the parsing method according to the present invention, parsing can therefore begin on whatever portion exists from the head of the character string, even if the string to be searched is not yet complete. If further characters are added to the string partway through, parsing can easily be resumed or continued. Furthermore, even when there are multiple keywords to be searched for, the search for all of them can be processed in parallel.
Next, a programmable processor unit for efficiently realizing the parsing method described above is described with reference to FIGS. 10 to 12.
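Because the only state carried from one character to the next is the pointer array, the search can be suspended and resumed as data arrives. The Python sketch below illustrates this with a hypothetical `StreamParser` class whose `feed` method accepts the stream in pieces. Storing the keyword text in the [match] entry is an added convenience for reporting and is not described in the source; characters outside the assumed eight-character alphabet are not handled.

```python
class StreamParser:
    """Keeps the pointer array between calls so input can arrive in chunks."""

    def __init__(self, keywords, alphabet="abcdefgh"):
        self.alphabet = alphabet
        self.tables = [self._blank()]
        for kw in keywords:
            addr = 0
            for ch in kw[:-1]:
                entry = self.tables[addr][ch]
                if entry["flag"] != "continue":
                    self.tables.append(self._blank())
                    entry.update(dest=len(self.tables) - 1, flag="continue")
                addr = entry["dest"]
            # Assumption: remember the keyword text so hits can name it.
            self.tables[addr][kw[-1]].update(flag="match", keyword=kw)
        self.pointers = [0]
        self.offset = 0                      # index of the next character overall

    def _blank(self):
        return {ch: {"dest": 0, "flag": "unmatch"} for ch in self.alphabet}

    def feed(self, chunk):
        """Consume one chunk and return (stream index, keyword) hits found in it."""
        hits = []
        for ch in chunk:
            nxt, found = [], None
            for addr in self.pointers:
                entry = self.tables[addr][ch]
                if entry["flag"] == "continue":
                    nxt.append(entry["dest"])
                elif entry["flag"] == "match" and found is None:
                    found = entry["keyword"]
            if found is not None:
                hits.append((self.offset, found))
            self.pointers = nxt + [0]
            self.offset += 1
        return hits


parser = StreamParser(["abcd", "cae", "cad", "cd"])
first = parser.feed("cab")      # stream not yet complete
second = parser.feed("cde")     # more data arrives later
```

Here the keyword “abcd” straddles the two chunks and is still found on the second call, since the pointer array left behind by the first call already records the partial match.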
FIG. 10 shows an outline of a programmable processor unit 1000 according to the present invention. The programmable processor unit 1000 shown in FIG. 10 is an example, and the present invention is not limited to this. As shown in the figure, the programmable processor unit 1000 includes a processing circuit 1002, a target register 1004, a delay circuit 1006, and a memory 1008.
In the programmable processor unit 1000, the character string 1010 to be searched is input to the target register 1004 and is passed to the processing circuit 1002 one character at a time as target data 1012. The processing circuit 1002 outputs a NEXT address signal 1014, which is temporarily stored in the delay circuit 1006. An address signal 1016 is input to the memory 1008 at a predetermined timing. A plurality of table sets, described later, are stored in advance in the memory 1008, and a plurality of sets of data 1018 (reference data [n], reference destination table data [n], valid byte data [n], and flag data [n]) are input to the processing circuit 1002. The processing circuit 1002 is configured to output the NEXT address signal 1014 and the flag data 1020 from the target data 1012 and the plurality of sets of data 1018 included in the table set, following the principle of the syntax analysis method described above.
Next, a plurality of table sets stored in the memory 1008 will be described. The plurality of table sets may be six table sets 300 to 350 as shown in FIG. 3 described above. However, it is also possible to create a table set as shown in FIG. 11 by compressing items that are not used. In FIG. 11, an example table set 1100 includes address data 1102, reference data 1104, reference destination table data 1106, valid byte data 1108, and flag data 1110. The valid byte data 1108 is data indicating whether or not comparison of corresponding reference data is necessary.
If the table set shown in FIG. 11 is used, the number of sets of data (reference data [n], reference destination table data [n], valid byte data [n], and flag data [n]) to be processed at a time is four, and the hardware configuration can be simplified.
Next, an outline of a circuit configuration of the processing circuit 1002 is shown in FIG. The processing circuit 1002 shown in FIG. 12 is suitable when the table set 1100 shown in FIG. 11 is used, and can process four sets of data at a time.
As shown in FIG. 12, the main part of the processing circuit 1002 includes an address decoder 1200, comparators 1202, 1208, 1214, and 1220, inverters 1204, 1210, 1216, and 1222, OR circuits 1206, 1212, 1218, and 1224, a priority encoder 1226, a first output buffer 1228, and a second output buffer 1230. The processing circuit 1002 is configured on the principle of simultaneously comparing four items of reference data when the valid byte data is 1 bit.
The processing circuit 1002 simultaneously reads, from the memory 1008, the four sets of data (reference data [n], reference destination table data [n], valid byte data [n], and flag data [n]) corresponding to the address data. In FIG. 12, the four sets of data that have been read are shown as 1240.
The reference data [0] to [3] of the respective rows are decoded by the address decoder 1200, input to the comparators 1202, 1208, 1214, and 1220, respectively, and compared with the target data. When the decoded reference data matches the target data, the comparator outputs a high level.
The valid byte data [0] to [3] of the respective rows are decoded and input to the inverters 1204, 1210, 1216, and 1222, respectively. Accordingly, each inverter outputs a low level when its valid byte data is “valid” and a high level when its valid byte data is “invalid”.
The reference destination table data [0] to [3] are stored in the first output buffer 1228, and the flag data [0] to [3] are stored in the second output buffer 1230.
When the inverter output is at a low level, the OR circuit outputs a high level only when the comparator output is at a high level, that is, when the decoded reference data matches the target data. When the inverter output is at a high level, the OR circuit outputs a high level unconditionally.
The priority encoder 1226 causes the reference destination table data of the set whose OR circuit outputs a high level to be output from the first output buffer 1228 as the next address data 1014 and, similarly, causes the flag data of that set to be output from the second output buffer 1230 as the flag data 1020.
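The selection logic described above can be modeled in software as follows. This is only a behavioral sketch in Python, not a description of the actual circuit: the names Entry and select are hypothetical, the example row is invented for illustration, and the assumption that the priority encoder favors the lowest-numbered set is not stated in the text.

```python
from typing import List, NamedTuple, Optional, Tuple

class Entry(NamedTuple):
    reference: Optional[str]   # reference data [n]
    next_table: int            # reference destination table data [n]
    valid: bool                # valid byte data [n]: True means "valid"
    flag: str                  # flag data [n]

def select(entries: List[Entry], target: str) -> Optional[Tuple[int, str]]:
    """Model of the comparators, inverters, OR circuits, and priority encoder."""
    or_outputs = []
    for entry in entries:
        comparator = entry.reference == target   # high when reference matches target
        inverter = not entry.valid               # high when valid byte data is "invalid"
        or_outputs.append(comparator or inverter)
    # Priority encoder: assumed here to pick the lowest-numbered high output.
    for n, high in enumerate(or_outputs):
        if high:
            return entries[n].next_table, entries[n].flag
    return None   # no OR circuit output is high

# Hypothetical row of four sets read from one address.
row = [
    Entry("a", 0x0001, True, "continue"),
    Entry("c", 0x0004, True, "continue"),
    Entry(None, 0x0000, False, "unmatch"),
    Entry(None, 0x0000, False, "unmatch"),
]
print(select(row, "c"))
print(select(row, "z"))
```

In this model a set whose valid byte data is "invalid" produces a high OR output regardless of the comparison, so it behaves as a fallback that is chosen only when no earlier set matches.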
The next address data 1014 output from the first output buffer 1228 is temporarily held by the delay circuit 1006 until the next sequence. In the next sequence, the target data is updated to the next data and the same processing is performed. The sequence is repeated until all of the character string has been processed. In this way, the entire character string is searched to determine whether or not a keyword is present.
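The repeated sequence can be pictured as a loop in which the delay circuit feeds the previous result back to the memory as the next address. The Python sketch below is a simplified model under several assumptions: it follows a single next address per sequence rather than the multi-layer pointer of the method, the MEMORY image for the keywords "abcd" and "cd" is hypothetical and is not the table set of FIG. 11, and the function names are invented for illustration.

```python
C, M, U = "continue", "match", "unmatch"

# Hypothetical memory image: address -> four sets of
# (reference data, reference destination table data, valid, flag data).
# A set with valid=False needs no comparison and serves as the fallback.
FALLBACK = (None, 0x0000, False, U)
MEMORY = {
    0x0000: [("a", 0x0001, True, C), ("c", 0x0004, True, C), FALLBACK, FALLBACK],
    0x0001: [("b", 0x0002, True, C), ("a", 0x0001, True, C), ("c", 0x0004, True, C), FALLBACK],
    0x0002: [("c", 0x0003, True, C), ("a", 0x0001, True, C), FALLBACK, FALLBACK],
    0x0003: [("d", 0x0000, True, M), ("a", 0x0001, True, C), ("c", 0x0004, True, C), FALLBACK],
    0x0004: [("d", 0x0000, True, M), ("a", 0x0001, True, C), ("c", 0x0004, True, C), FALLBACK],
}

def processing_circuit(sets, target):
    """Return (next address, flag) of the first set that matches or is invalid."""
    for reference, next_address, valid, flag in sets:
        if reference == target or not valid:
            return next_address, flag
    return 0x0000, U

def run(stream):
    """Return the stream positions at which a keyword ends."""
    delay = 0x0000                       # value held by the delay circuit
    ends = []
    for position, target in enumerate(stream):   # one target character per sequence
        sets = MEMORY[delay]             # memory outputs the four sets at this address
        delay, flag = processing_circuit(sets, target)
        if flag == M:
            ends.append(position)
    return ends

print(run("abcdcd"))
```

Since the only state carried between sequences is the held address, the loop can stop when the available data runs out and pick up again from the same address when more data arrives.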
As described above, it is possible to search stream data at high speed for keywords that have been expanded in the memory 1008 in advance. In particular, with this programmable processor unit, parsing can be started on just the portion that has arrived from the beginning, even if the stream data to be searched is not yet complete. Parsing can also be easily resumed or continued when data is appended to the stream partway through. Furthermore, even when there are a plurality of keywords to be searched, all keyword searches can be processed in parallel. The present invention is therefore particularly effective when stream data in which image data such as MPEG and control data are mixed is received sequentially via communication means and only the control data is to be found in that stream.
In addition, since keywords can easily be programmed into the memory, it is also possible to expand a plurality of groups of keywords into tables, store them in the memory in advance, and switch the keywords to be searched according to the incoming stream data.

Claims

A syntax analysis method for detecting a keyword from stream data to be searched, comprising:
a step of preparing, in a memory, a plurality of tables in correspondence with address data, each table including a plurality of sets of data, each set consisting of reference data corresponding to the keyword, and reference destination table data and flag data corresponding to the reference data;
a step of retrieving one table from the memory;
a step of extracting target data from the stream data;
a step of comparing the target data with each item of reference data of the plurality of sets of data included in the retrieved table;
a step of selecting the reference destination table data and flag data corresponding to the reference data that matches the target data; and
a step of outputting, from the memory, the table corresponding to the selected reference destination table data, using the selected reference destination table data as address data,
wherein the keyword is detected from the stream data by repeatedly performing the extracting step, the comparing step, the selecting step, and the outputting step while sequentially extracting the data constituting the stream data as new target data.

The syntax analysis method according to claim 1, wherein the flag data indicates whether or not the keyword is included in the stream data.

The syntax analysis method according to claim 1, wherein the memory includes a set of data including reference data, flag data, and reference destination table data corresponding to a plurality of keywords.

A programmable processor unit for detecting a programmable keyword from stream data, comprising:
a register for storing target data extracted from the stream data;
a memory that stores a plurality of tables, each having a plurality of sets of data, each set consisting of reference data corresponding to the keyword, and reference destination table data and flag data corresponding to the reference data, and that outputs, in response to input of address data, the table corresponding to the address data; and
a processing circuit that, when the plurality of sets of data included in one table are input from the memory, compares the target data with each item of reference data of the plurality of sets of data, selects, if they match, the reference destination table data and flag data corresponding to the matching reference data, and outputs the selected reference destination table data as address data so that the table corresponding to the selected reference destination table data is input from the memory.

The programmable processor unit according to claim 4, wherein the set of data further includes valid byte data, and the processing circuit selects the flag data and reference destination table data corresponding to the matching reference data only when reference data whose valid byte data indicates valid matches the target data.

The programmable processor unit according to claim 4, wherein the flag data indicates whether or not the keyword is included in the stream data.

The programmable processor unit according to claim 4, wherein the memory stores sets of data including reference data, reference destination table data, and flag data corresponding to a plurality of keywords.