JPH04315259A - Character string collation processing system

Info

Publication number: JPH04315259A
Application number: JP3108901A
Authority: JP
Inventors: Sueji Miyahara; 末治宮原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1991-04-12
Filing date: 1991-04-12
Publication date: 1992-11-06

Abstract

PURPOSE: To make character string retrieval on parallel processors fast and flexible by storing the document data to be searched divided into line units and performing the retrieval through an in-line character string collation process and an inter-line character string collation process. CONSTITUTION: A parallel character string collation part 72 is equipped with a plurality of processor elements 721 and performs parallel processing. A character string identical to a search word input from outside is detected in the document data to be searched in a database. The document data are stored, divided into line units, in memory areas 74 that the processor elements 721 can access directly. In the in-line character string collation, the data to be searched are read into the processor elements 721 and the retrieval is performed by collating against the search word one character at a time. When the search word may span line m and line (m+1), a prescribed number of characters from the head of line (m+1) is joined to the end of line m to form a single line, and the collation is performed on that line.

Description

[Detailed description of the invention]

[0001]

[Industrial Application Field] The present invention relates to a character string matching processing method that enables a desired character string to be searched for at high speed in a document database storing natural-language document data as character code strings.

[0002]

[Prior Art] Character string matching here means detecting, in searched document data (the text to be searched in a database), a character string identical or similar to a search word, that is, a desired character string input from outside to a single instruction multiple data (SIMD) parallel processing device having a plurality of processor elements (PEs). Conventionally, most techniques for searching a document database for a desired character string at high speed were configured to match character strings against each other using dedicated hardware (for example, "Character string search LSI capable of fuzzy search", Nikkei Electronics, June 1, 1987, No. 422).

[0003] This principle is embodied in a character string matching device such as that shown in FIG. 8.

[0004] In FIG. 8, 1 is a search word input section, 2 is a character string matching section, 3 is a document data storage section (database), 4 is a matching result storage section, 5 is a data transfer bus, and 6 is a matching processing control section. (20)1 ... (20)N denote individual word matching sections.

[0005] The detailed configuration of the character string matching section 2 is shown in FIG. 9.

[0006] In FIG. 9, 20 is an individual word matching section, 21 is a search word storage section, 22 is a searched data holding section, 23 is a character string match detection section, 24 is a search word input section, 25 is a searched data input section, and 26 is a match information output section.

[0007] In operation, when a search word, here "文字列" ("character string") as an example, is input from the search word input section 1 shown in FIG. 8, the word is first stored in the registers (211, 212, ...) of the search word storage section 21 in the character string matching section 2 and waits for the data to be searched to be transferred.

[0008] The data to be searched is read out sequentially from the document data storage section 3 and sent via the searched data input section 25 to the registers (221, 222, ...) of the searched data holding section 22.

[0009] The searched data holding section 22 takes in the data to be searched one character at a time. Each time, the comparators (231, 232, ...) of the character string match detection section 23 determine whether the search word matches the string of searched data input so far. When they match, match information is output from the match information output section 26 and the match position is recorded in the matching result storage section 4.

[0010] FIGS. 10 and 11 show how the processing state of the individual word matching section 20 changes during character string matching.

[0011] Specifically, as an example, they show the processing within the individual word matching section 20 from the point where the search word "文字列" has been set in the search word storage section 21, through the input of the searched data "高速に文字列を走査して..." ("scanning character strings at high speed ...") into the searched data holding section 22, until the "文字列" contained in that data is detected.

[0012] FIG. 10 shows the state at the moment when the "文字列" in the searched data, identical to the search word, begins to enter the searched data holding section 22 of the individual word matching section 20.

[0013] FIG. 11 shows the state after three characters have been input to the searched data holding section 22 from the state of FIG. 10 and a character string match with the search word has been detected.
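
The serial behaviour of the conventional device can be illustrated in software. The following is a minimal sketch, not the circuit of FIG. 9: the function name `serial_match` and the use of a bounded queue in place of the registers of the searched data holding section 22 are assumptions made for illustration only.

```python
from collections import deque


def serial_match(search_word: str, stream: str) -> list[int]:
    """Return the start indices at which search_word occurs in stream.

    The stream is consumed one character per step, as in the conventional
    device, so the running time is bounded by the length of the stream.
    """
    n = len(search_word)
    window = deque(maxlen=n)  # hypothetical stand-in for holding section 22
    hits = []
    for i, ch in enumerate(stream):
        window.append(ch)  # exactly one character enters per step
        # In hardware the comparators work simultaneously; here they are
        # modelled by comparing the window with the stored search word.
        if len(window) == n and all(w == s for w, s in zip(window, search_word)):
            hits.append(i - n + 1)
    return hits
```

With the example of FIGS. 10 and 11, `serial_match("文字列", "高速に文字列を走査して")` reports a single match starting at index 3, found only after the sixth character has been shifted in.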

[0014]

[Problems to be Solved by the Invention] In the conventional method shown in FIG. 8, the data to be searched is delivered to the input system through a single data transfer bus 5 of narrow data width. The transfer rate of the data transfer bus 5 therefore becomes a bottleneck, and a high search speed cannot be obtained.

[0015] In addition, because the character string matching section 2 is fixed in hardware, it cannot flexibly accommodate changes in search conditions, such as a longer search word.

[0016] An object of the present invention is to solve these conventional problems and to provide a character string matching method that performs character string searches at high speed by operating a general-purpose parallel processing device efficiently.

[0017]

[Means for Solving the Problems] To achieve the above object, the present invention concerns character string matching processing that detects, in searched document data (the text to be searched in a database), a character string identical or similar to a search word, that is, a desired character string input from outside to a single instruction multiple data parallel processing device having a plurality of processor elements. The invention is characterized in that the document data is stored in a memory area directly accessible by the processor elements, divided into line units whose size equals the number of processor elements, and in that the processing comprises an in-line character string matching step that determines matches with the search word in response to a search request and an inter-line character string matching step that matches character strings spanning lines.

[0018] An embodiment of the same character string matching processing is also effective in which the processor elements of the parallel processing device are arranged two-dimensionally and interconnected vertically and horizontally, with the top edge connected to the bottom edge and the left edge connected to the right edge with an offset of one row, and in which the processing comprises: a character string merging step, forming part of the inter-line character string matching step for matching character strings spanning lines, that determines how much of the character string of the line of interest and of the character string of the next line to merge so as to accommodate the length of the search word, merges them, and generates a new line character string; a character string movement step that shifts the new line character string so that a character string spanning lines is brought into the same state as a character string lying within a line; an in-line character string matching step in which the document data is stored in a memory area directly accessible by the processor elements, divided into line units whose size equals the number of processor elements, and matches with the search word are determined in response to a search request; and a character string position correction step that calculates the character string position in the original document data from the detected character string position.

[0019]

[Operation] The present invention realizes fast and flexible character string matching with a device configuration different from that of conventional character string matching devices, namely a general-purpose parallel processing device whose processing can be changed freely by replacing the program in the device. High-speed character string matching is achieved on a general-purpose parallel processing device, and character strings that span lines of the data area holding the searched data can also be searched at high speed.

[0020] Embodiments are described below with reference to the drawings.

[0021]

[Embodiment 1] FIG. 1 illustrates an embodiment of the present invention, in which 1 is a search word input section, 3 is a document data storage section (database), 4 is a matching result storage section, 5 is a data transfer bus, 6 is a matching processing control section, 7 is a parallel processing section, 71 is a matching processing program storage section, 72 is a character string parallel matching section, 73 is a large-capacity data bus, 74 is a search information storage section (matching data memory), and 721, 722, ..., 72R are processor elements.

[0022] In addition, 741 denotes a search word storage area, 742 a searched data storage area, 743 a matching result primary storage area, and 744 a matching result secondary storage area.

[0023] FIGS. 2 and 3 are detailed views of part of the parallel processing section 7 (the matching processing system). FIG. 2 shows the R processor elements (721, 722, ..., 72R) arranged two-dimensionally (as an area).

[0024] FIG. 3 conceptually shows the processor elements (721, 722, ..., 72R) of FIG. 2 used one-dimensionally (as a line).

[0025] Here the search information storage section 74 is used divided into the search word storage area 741 and the searched data storage area 742. Data is exchanged with the processor elements (721, 722, ..., 72R) of the character string parallel matching section 72 over large-capacity data buses 73, of which as many sets are provided as there are processor elements (R).

[0026] In operation, the data to be searched is transferred in advance from the document data storage section 3 to the searched data storage area 742 of the parallel processing section 7, in preparation for the input of a search word from the search word input section 1.

[0027] Next, when the search word "文字列" ("character string") is input from the search word input section 1, the word is stored in the search word storage area 741 in the parallel processing section 7, and the matching processing is executed immediately.

[0028] FIG. 4 illustrates the in-line character string matching step of the matching processing. It assumes R = 256 processor elements when viewed as a line, that is, a (16 × 16) arrangement when viewed as an area.

[0029] The first stage of processing in this figure shows the data arrangement in the parallel processing section 7. It is the initial state at the start of matching, for detecting the positions of the search word "文字列" in the searched text "高速に文字列を照合して...の文字列を検索する..." ("matching character strings at high speed ... searching for the character string of ...").

[0030] The searched data is divided into units of the number of processor elements. The text from "高速に..." to "...て文字" is recorded as one line, and the text from "列を検索する..." onward as the next line, in the searched data storage area 742 of the search information storage section 74.
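
The division into lines can be sketched as follows. This is an illustration under stated assumptions: the function name `split_into_lines` and the padding of the final short line with a filler character are hypothetical, since the description does not say how an incomplete last line is handled.

```python
def split_into_lines(text: str, r: int = 256, pad: str = "\0") -> list[str]:
    """Divide text into lines of r characters, one character per processor element.

    r = 256 follows the (16 x 16) example; padding the last line with `pad`
    is an assumption made so that every line has exactly r characters.
    """
    lines = [text[i:i + r] for i in range(0, len(text), r)]
    if lines and len(lines[-1]) < r:
        lines[-1] = lines[-1].ljust(r, pad)
    return lines
```

Because the cut falls wherever the r-th character happens to be, a word such as "文字列" can be split between the end of one line and the head of the next, which is the case the inter-line matching step exists to handle.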

[0031] The search word is recorded in the search word storage area 741 along the depth direction of each processor element 721, ready for the matching processing.

[0032] In the second stage of processing, the in-line character string matching step, the searched data is read into the processor elements 721 from the state shown in the first stage of FIG. 4 and then matched against the search word.

[0033] The searched data and the search word are compared character by character (in 16-bit units), proceeding in order from the first character of the search word.

[0034] As shown in the second stage of FIG. 4, the comparison results are stored in the matching result primary storage area 743 as match (0) or mismatch (1).

[0035] The figure shows the case where the search word is "文字列", whose length n is three characters.

[0036] Consequently, the "0"s indicating matching characters appear offset by one processor element from one row to the next.

[0037] To detect from this state the positions of character strings identical to the search word, the row of the matching result primary storage area 743 corresponding to the α-th character of the search word is shifted left by (α−1) positions, with "0" filled in from the right end.

[0038] As a result, as shown in the third stage of FIG. 4, in the rows of the matching result secondary storage area 744 corresponding to the search word, "0"s line up along the depth direction of the processor element at the position corresponding to the head of each matching character string.

[0039] Next, by taking the AND across the rows of the matching result secondary storage area 744 in the third stage of FIG. 4, it can be seen that a character string identical to the search word begins at each position where "0" is present.

[0040] This processing reveals whether a character string identical to the search word exists within the line (within 256 characters) and, by inspecting the last (n−1) positions, whether the search word may span lines.
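
The three stages of the in-line matching step can be sketched sequentially as below. The function names are hypothetical, and list comprehensions stand in for the simultaneous operation of the processor elements. One interpretation is assumed and should be read as such: the text combines the rows by an "AND", and since a match is encoded as 0, the sketch treats a column as a hit only when every row holds 0 there, which for these 0/1 flags is the maximum (bitwise OR) of the column.

```python
def inline_match(line: str, word: str) -> list[int]:
    """Return one flag per processor element: 0 where `word` may start, else 1."""
    # Primary area 743: one row per search-word character, 0 = match, 1 = mismatch.
    primary = [[0 if c == w else 1 for c in line] for w in word]
    # Secondary area 744: the row of the alpha-th character is shifted left by
    # (alpha - 1) positions and zero-filled from the right (a = alpha - 1 here).
    secondary = [row[a:] + [0] * a for a, row in enumerate(primary)]
    # Combine the rows: a column stays 0 only if all rows are 0 there
    # (assumed reading of the "AND" in the text, given that 0 means match).
    return [max(col) for col in zip(*secondary)]


def full_hits(flags: list[int], n: int) -> list[int]:
    """Positions where the whole n-character word lies inside the line."""
    return [i for i, f in enumerate(flags[:len(flags) - n + 1]) if f == 0]


def may_span(flags: list[int], n: int) -> bool:
    """True if a 0 in the last (n - 1) positions suggests a word spanning lines."""
    return 0 in flags[len(flags) - (n - 1):]
```

Because the right end is filled with 0, a line ending in "...て文字" keeps a 0 at the position of "文" even though "列" is missing, and it is this trailing 0 that signals that the next line has to be consulted.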

[0041] FIG. 5 illustrates the inter-line character string matching step. Searched data [I] in FIG. 5 shows the matching processing for the case where line m of the searched data ends with "...て文字", line (m+1) begins with "列を...", and the search word consists of n characters (here "文字列", n = 3).

[0042] For example, suppose the in-line character string matching step for line m leaves a "0" in the last (n−1) processor elements of the matching result secondary storage area 744. Then, as shown by searched data [II] on the work line in FIG. 5, the first X characters of line (m+1) (X is a value larger than n; 16 is chosen here as an example) are written over the searched data of the work line into which the data of line m has been written. This searched data is referred to as line m+.

[0043] Next, when the searched data of line m+ is shifted to the left and the data that overflows at the left end is fed back in at the right end, the character string that had been split becomes contiguous, as shown by the 1-PE shift [III] in FIG. 5.

[0044] On this device, the processing is carried out as shown in FIG. 6.

[0045] Taking as an example search words of at most 16 characters, the searched data of line m is first copied to the work line.

[0046] Next, the searched data of line m+1 is read and its first 16 characters are copied to the work line.

[0047] Next, treating the processor elements as a (16 × 16) area, the work line data is shifted by one stage, that is, the 16 data items of each row are moved simultaneously by one processor element to the preceding row. In the line view this is a shift of 16 positions, and it yields the same data structure as the 1-PE shift [III] in FIG. 5.

[0048] In-line character string matching is then performed on this line, which detects whether a character string matching the search word exists and, if so, where.

[0049] Finally, correcting the position in this search result by X characters yields the position of the character string spanning the lines.
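
The merge, shift, match and position correction steps of FIG. 6 can be sketched as follows. The function names are hypothetical, `find_in_line` is only a sequential stand-in for the in-line matching step, and the filter that keeps just the matches straddling the line boundary is an assumption (matches lying wholly inside one line are presumed to be reported by the in-line step already).

```python
def find_in_line(line: str, word: str) -> list[int]:
    """Sequential stand-in for the in-line character string matching step."""
    return [i for i in range(len(line) - len(word) + 1) if line.startswith(word, i)]


def interline_match(line_m: str, line_m1: str, word: str, m: int, x: int = 16) -> list[int]:
    """Return document positions of `word` where it spans lines m and m+1."""
    r = len(line_m)
    work = line_m                         # copy line m to the work line
    work = line_m1[:x] + work[x:]         # overwrite its head with X chars of line m+1 ("line m+")
    work = work[x:] + work[:x]            # cyclic left shift by X: tail of m now precedes head of m+1
    hits = []
    for p in find_in_line(work, word):
        # Assumption: keep only matches that cross the boundary at work index r - x.
        if p < r - x < p + len(word):
            hits.append(m * r + p + x)    # position correction by X characters
    return hits
```

After the shift, work position p corresponds to position p + X of line m, which is why the correction adds X before converting to a position in the original document.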

[0050]

[Embodiment 2] Typical general-purpose parallel processing devices often cannot extract or rewrite part of the information on a line in a single operation.

[0051] In such a case, that is, when the processor elements operate fully in parallel, inter-line character string matching detects matching character strings by following the processing flow shown in FIG. 7.

[0052] Under the same conditions as Embodiment 1, suppose a "0" exists within the last (n−1) characters of the matching result secondary storage area 744. Taking as an example search words of at most 16 characters and a (16 × 16) arrangement of processor elements, the searched data of line m is first shifted forward by one stage, that is, 16 data items at a time are moved forward by 1 PE simultaneously. The final stage of processor elements (16 data items) is then masked and cleared to obtain data A.

[0053] Next, the searched data of line m+1 is shifted backward by one stage, and then all data except that in the final stage of processor elements is masked and cleared to obtain data B.

[0054] Next, in-line character string matching is performed on the OR of data A and data B, which detects whether a character string matching the search word exists. Correcting the position of the character string using this detection result yields the character string spanning the lines.
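
The mask-and-OR construction of data A and data B can be sketched on lists of 16-bit character codes. The function name `merge_by_mask` is hypothetical. The shift directions are only loosely specified in the text, so the sketch assumes cyclic shifts on the wrap-around connections whose net effect is that the tail of line m moves to the front and the first stage of line m+1 arrives in the final stage.

```python
def merge_by_mask(line_m: list[int], line_m1: list[int], stage: int = 16) -> list[int]:
    """Build the merged work line as (data A) OR (data B), using whole-line operations only."""
    r = len(line_m)
    # Data A: line m shifted by one stage (assumed cyclic), final stage cleared.
    a = line_m[stage:] + line_m[:stage]
    a = [v if i < r - stage else 0 for i, v in enumerate(a)]
    # Data B: line m+1 shifted so that its first stage sits in the final
    # stage (assumed cyclic), everything outside the final stage cleared.
    b = line_m1[stage:] + line_m1[:stage]
    b = [v if i >= r - stage else 0 for i, v in enumerate(b)]
    # The cleared regions are disjoint, so OR simply overlays the two halves.
    return [p | q for p, q in zip(a, b)]
```

Under these assumptions the result equals the shifted work line of Embodiment 1, but every operation touches the whole line at once, which suits devices that cannot rewrite part of a line in one step.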

[0055]

[Embodiment 3] In inter-line character string matching, whether a search word spans lines can be determined from the presence of a "0" within the last (n−1) characters of the matching result secondary storage area 744 for line m together with the character sequence of line m+1.

[0056] Specifically, suppose that within the last (n−1) characters of line m of the searched data, the first α characters of the search word are detected as matching. In the in-line character string matching of line (m+1), the rows of the matching result primary storage area 743 are shifted to the right, starting from the top row, by (n−1), (n−2), ..., 1 positions respectively, with "0" filled in from the left end. If "0"s then line up vertically at the (n−α)-th processor element, a character string identical to the search word begins at the α-th character from the end of line m, and the search word is detected.
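
The right-shift test on line (m+1) can be sketched as below. The function name `spans_boundary` is hypothetical, and checking the α-character prefix with `str.endswith` is a stand-in for reading the trailing "0" in the matching result secondary storage area 744 of line m.

```python
def spans_boundary(line_m: str, line_m1: str, word: str) -> list[int]:
    """Return every alpha (1 <= alpha < n) for which `word` starts alpha
    characters before the end of line m and continues into line m+1."""
    n, r = len(word), len(line_m1)
    # Primary area 743 for line m+1: 0 = match, 1 = mismatch.
    primary = [[0 if c == w else 1 for c in line_m1] for w in word]
    # Row j (0-based) is shifted right by (n - 1 - j), zero-filled from the left.
    shifted = [[0] * (n - 1 - j) + row[:r - (n - 1 - j)] for j, row in enumerate(primary)]
    found = []
    for alpha in range(1, n):
        prefix_ok = line_m.endswith(word[:alpha])   # stand-in for the trailing "0" on line m
        col = n - alpha - 1                          # the (n - alpha)-th PE, 0-based
        if prefix_ok and all(row[col] == 0 for row in shifted):
            found.append(alpha)
    return found
```

At column (n−α), the rows of the first α characters contain only fill zeros, while the remaining rows hold the comparison of the rest of the word against the head of line (m+1), so a full column of zeros means the remainder of the word is present there.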

[0057] Embodiments 1 to 3 have described the case where each processor element is 16 bits wide. When each processor element is 1 bit wide, character string search can be realized by the same processing as in Embodiments 1 to 3, provided that the character data making up the search word and the searched data is recorded in the search information storage section 74 with its bits laid out along the depth direction of the processor element.
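
A depth-direction bit layout for 1-bit processor elements might look like the following sketch. The function names and the bit-serial comparison loop are assumptions introduced for illustration; the description states only that the bits are stored along the depth direction and that the same processing then applies.

```python
def to_bit_planes(codes: list[int], bits: int = 16) -> list[list[int]]:
    """Lay out each character code along the depth direction:
    planes[b][i] is bit b of the character held by processor element i."""
    return [[(c >> b) & 1 for c in codes] for b in range(bits)]


def mismatch_flags(planes: list[list[int]], target: int) -> list[int]:
    """Compare every element with `target` one bit plane at a time.
    Returns 0 (match) or 1 (mismatch) per processor element."""
    flags = [0] * len(planes[0])
    for b, plane in enumerate(planes):
        t = (target >> b) & 1
        # A 1-bit PE can XOR its bit with the broadcast bit and accumulate with OR.
        flags = [f | (p ^ t) for f, p in zip(flags, plane)]
    return flags
```

The resulting flags have the same match (0) and mismatch (1) form as the rows of the matching result primary storage area 743, so the subsequent shift and combine steps would not need to change.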

[0058]

[Effects of the Invention] As described above, the present invention as embodied here has the following features.

[0059] (a) High-speed character string matching can be performed using a general-purpose parallel processing device.

[0060] (b) Character strings spanning lines of the data area holding the searched data can also be searched at high speed.

[0061] The present invention therefore has the advantages of (1) short search times and (2) no missed matches.

[Brief explanation of drawings]

FIG. 1 is a configuration diagram of the character string matching processing method of the present invention.

FIG. 2 shows configuration 1 of the character string parallel matching section.

FIG. 3 shows configuration 2 of the character string parallel matching section.

FIG. 4 is a state diagram for explaining the processing steps of parallel character string matching.

FIG. 5 is a state diagram for explaining the inter-line character string matching step.

FIG. 6 is a processing flow diagram for explaining the inter-line character string matching step.

FIG. 7 is a processing flow diagram of inter-line character string matching of the fully parallel processing type without mask processing.

FIG. 8 is a configuration diagram of the conventional character string matching method.

FIG. 9 shows the configuration of the conventional character string matching section.

FIG. 10 shows state change 1 of the conventional character string matching section.

FIG. 11 shows state change 2 of the conventional character string matching section.

[Explanation of symbols]

1: Search word input section
2: Character string matching section
3: Document data storage section
4: Matching result storage section
5: Data transfer bus
6: Matching processing control section
7: Parallel processing section
20: Individual word matching section
21: Search word storage section
22: Searched data holding section
23: Character string match detection section
24: Search word input section
25: Searched data input section
26: Match information output section
71: Matching processing program storage section
72: Character string parallel matching section
73: Large-capacity data bus
74: Search information storage section (matching data memory)
721, 722, ..., 72R: Processor elements
741: Search word storage area
742: Searched data storage area
743: Matching result primary storage area
744: Matching result secondary storage area

Claims

[Claims]

[Claim 1] A character string matching processing method for character string matching processing that detects, in searched document data that is the text to be searched in a database, a character string identical or similar to a search word that is a desired character string input from outside to a single instruction multiple data parallel processing device having a plurality of processor elements, characterized in that document data is stored in a memory area directly accessible by the processor elements, divided into line units whose size equals the number of processor elements, and in that the method comprises an in-line character string matching step that determines matches with the search word in response to a search request, and an inter-line character string matching step for matching character strings spanning lines.

[Claim 2] A character string matching processing method for character string matching processing that detects, in searched document data that is the text to be searched in a database, a character string identical or similar to a search word that is a desired character string input from outside to a single instruction multiple data parallel processing device having a plurality of processor elements, characterized in that the processor elements of the parallel processing device are arranged two-dimensionally in a parallel processing configuration in which the processor elements are interconnected vertically and horizontally, the top edge is connected to the bottom edge, and the left edge and the right edge are connected with an offset of one row, and in that the method comprises: a character string merging step, within an inter-line character string matching step for matching character strings spanning lines, that determines the amount to merge from the character string of the line of interest and the character string of the next line so as to accommodate the length of the search word, merges them, and generates a new line character string; a character string movement step that shifts the new line character string so that a character string spanning lines is brought into the same state as a character string lying within a line; an in-line character string matching step in which document data is stored in a memory area directly accessible by the processor elements, divided into line units whose size equals the number of processor elements, and matches with the search word are determined in response to a search request; and a character string position correction step that calculates the character string position in the original document data from the detected character string position.