JPH0272481A

JPH0272481A - Character string retrieving device by logical expression and control system for such device

Info

Publication number: JPH0272481A
Application number: JP63225223A
Authority: JP
Inventors: Tsunesuke Takahashi; 恒介高橋
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-09-07
Filing date: 1988-09-07
Publication date: 1990-03-12

Abstract

PURPOSE: To reduce storage capacity and to perform content retrieval by logical expression efficiently and at high speed, by representing the variety of expressions of a search character string with a concept code and registering that code. CONSTITUTION: The device is provided with a character string matching means 110 that determines which of a plurality of search character strings an input character string of a text record matches; a first memory means, whose access address is determined by the output of the character string matching means 110 and which stores the concept code associated with each search character string; a mark bit register 130 that temporarily stores, for each text record, which concept codes it contains; and a second memory means 140, whose access address is determined by the output of the mark bit register 130 and which stores the logical expression of a search condition. Unifying synonymous character strings into a concept code reduces the storage capacity needed to register the logical expression of the search condition. The storage capacity of the second memory means 140 is thereby reduced, and full-text retrieval of character string data by logical expression can be performed at high speed.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a character string search device, and a control method for it, that can process content searches by logical expression over text data such as patents and academic literature efficiently and at high speed.

(Prior Art) Text information such as patents and academic literature keeps growing as office automation (OA) equipment advances. To be reused, it is stored in file memories of ever larger capacity. However, turning such file data into a database that can be searched quickly to extract the needed information has not progressed smoothly. One reason is that attaching keywords matching the content to each record of the file data, and sorting those keywords so that keyword search is fast, depends on manual work and is not easily accomplished. Another is the problem of terminology variation: even when the content matches, the keyword used at search time may not coincide with the keyword attached to the record.

Various techniques for real-time character string search have therefore been studied, with the aim of searching text data in file memory at high speed without the work of attaching or sorting keywords.

In general, a character string search device has the function of finding where in the text data a search character string, given as a keyword, occurs. To realize this function, a character string matching means that compares each character string in the text data against a plurality of search character strings simultaneously is essential.

The inventors of the present application therefore filed patent applications for a symbol string identification device that performs character string matching with two parts: an associative memory unit that stores a plurality of character strings of arbitrary length, and a sequential logic unit that receives in parallel the associative memory unit's character comparison results for each character of the input character string and compares the order of characters between the input character string and the stored character strings. Although the name differs, this device is the basic element of a character string search device (Japanese Unexamined Patent Publication No. 60-211539 (specification of Japanese Patent Application No. 59-068495), Japanese Unexamined Patent Publication No. 61-253536 (specification of Japanese Patent Application No. 60-96213), etc.). In such a character string search device, the keywords to be searched (search character strings) are registered first; when the text character string to be searched is then input, the device generates a match signal whenever the text character string matches one of the search character strings, and outputs which part of the text character string matched which search character string. A full-text search of text character string data that carries no keywords thus becomes possible. That is, text file information can be turned into a database without work such as attaching keywords to the text character string information stored in the file.

However, information retrieval by full-text search takes far longer than the index search method, which finds the file storage location of matching text character string information by searching only the keyword portion of keyword-tagged text character string data. When matching against the logical expression of the search condition is also taken into account, the search processing speed drops further. When that matching is done with the help of software, the slowdown becomes pronounced, and putting full-text search into practical use becomes difficult.

(Problem to Be Solved by the Invention) A conventional character string search device only matches the keywords (search character strings) contained in the logical expression of a search condition against the character strings of the text data, and relies on software (a program) to check whether the logical expression of the search condition is satisfied. Consequently, even though character string matching itself is fast, the time until it is determined whether the logical expression of the search condition is satisfied is too long.

In addition, when the search character strings in a search condition logical expression have many synonyms, a single logical expression cannot find all the text records that satisfy the search request. If, to prevent this, all synonymous search character strings are registered and the logical expression of the search condition containing them is expressed in sum-of-products canonical form, the number of product terms becomes very large, and the amount of hardware needed to register them becomes excessive.

An object of the present invention is to provide a character string search technique that solves the above two problems.

(Means for Solving the Problems) The present invention is accordingly a character string search device using logical expressions, comprising: a character string matching means that stores a plurality of search character strings and determines which of them an input character string of a text record matches; a first storage means whose access address is determined by the output of the character string matching means and which stores the concept code associated with each search character string; a mark bit register that temporarily stores, for each text record, which concept codes were contained; and a second storage means whose access address is determined by the output of the mark bit register and which stores the logical expressions of search conditions.

The control method of this device is characterized by registration control that includes: a character string initial registration process, which associates each search character string contained in a logical expression with its concept code and registers them in the character string matching means and the first storage means; a logical expression initial registration process, which, for the search character strings contained in true (non-negated) form in each product term of the logical expression converted to sum-of-products canonical form, uses the outputs of the character string matching means and the first storage means together with the mark bit register to determine the access address of the second storage means, and writes a true logic signal product term by product term; a search character string additional registration process, which, when a search character string contained in a new logical expression does not match any search character string registered in the character string matching means, treats it as a new search character string and additionally registers it, associated with its concept code, in the character string matching means and the first storage means; and a logical expression additional registration process, which, for the search character strings contained in true form in each product term of the new logical expression, uses the outputs of the character string matching means and the first storage means together with the mark bit register to determine the access address of the second storage means, and writes a true logic signal into a separate storage area product term by product term.

The control method is further characterized by search control, performed in a full-text search after the search condition logical expressions have been registered, that includes: a character string matching process, in which the character string matching means receives character string data sequentially, one text record at a time, and, whenever a character string match occurs, the first storage means is accessed and its output is set in the mark bit register; a logical expression search process, in which the second storage means is accessed at each text record boundary using the output of the mark bit register and the result of matching against the logical expression is output; and a reset process, in which, after the second storage means has been accessed, only the mark bit registers corresponding to registered concept codes are returned to the reset state and the rest are left marked.

(Operation) In the present invention, search condition logical expressions can be registered; after registration, the full text of the character string data in a text file memory is searched, and the text records that match a logical expression are found without software intervention.

Further, for a search condition logical expression containing character strings with many synonyms, the present invention gathers the synonymous character strings into a concept code, so that little storage capacity is needed to register the expression in the storage means. Gathering into a concept code means, for example, the following. GaAs can be written with synonyms such as GaAs, 砒化ガリウム, ガリウム砒素, and ガリウムアーセナイド (all Japanese renderings of gallium arsenide); these are unified into a single code such as "0010".

As an example of search using concept codes, consider registering a search condition logical expression containing five search character strings. If each search character string has five synonyms, there are $5^5$ search condition logical expressions, and each logical expression has $2^{25}$ product terms. In the ordinary approach, therefore, $2^{25} \times 5^5$ (about 100 billion) memory cells are needed to register one set of search condition logical expressions. With concept codes, it suffices to store a single logical expression with $2^5$ product terms, and the number of memory cells used falls to $2^5$ (32). This is an effective way of converting a search condition logical expression to sum-of-products canonical form and registering it. The present invention is described in more detail below with reference to the drawings.
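The cell counts in this example can be checked with a few lines of arithmetic. The sketch below is only an illustration of the counting argument; the function names are hypothetical, and it assumes, as the example does, five search strings with five synonyms each.

```python
# Storage needed to register one search condition, with and without concept codes.
N_STRINGS = 5    # search strings in the logical expression (from the example)
N_SYNONYMS = 5   # synonyms per search string (from the example)


def cells_without_concept_codes(n_strings, n_synonyms):
    """Cells needed when every synonym combination is a separate expression."""
    expressions = n_synonyms ** n_strings           # 5^5 synonym combinations
    product_terms = 2 ** (n_strings * n_synonyms)   # 2^25 product terms over 25 strings
    return expressions * product_terms


def cells_with_concept_codes(n_strings):
    """Cells needed when the synonyms of each string share one concept code."""
    return 2 ** n_strings                           # 2^5 product terms over 5 concepts


naive = cells_without_concept_codes(N_STRINGS, N_SYNONYMS)
compact = cells_with_concept_codes(N_STRINGS)
print(naive, compact)
```

The first figure is the "about 100 billion" of the text, and the second is the 32 cells that remain once synonyms are merged.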

(Embodiment) FIG. 1 shows the configuration of a first embodiment of the present invention. The character string matching means 110 consists of an associative memory (CAM, content-addressable memory), sequential logic (FSA, finite state automaton), and an encoder. It can store a plurality of search character strings and determines which of the stored search character strings an input character string matches.

The access address of the first storage means 120 is determined by the output of the character string matching means 110. It stores the concept code associated with each search character string as bit information, 1 or 0, indicating whether a relationship exists between the result of decoding the concept code 121 in the Y decoder 123 and the result of decoding the search character string in the X decoder 122. The contents of the memory matrix in the figure show one example of stored concept codes.

Suppose the character string matching means 110 stores the character strings A1, B1, B2, C1, C2, D1, D2, D3, and E1, and that their concept codes are A, B, B, C/A, C/B, D, D, D, and E/F, respectively. When character string A1 is given, "1" is output from the first column from the left of the first storage means 120; when B1 or B2 is given, "1" is output from the second column from the left; and when D1, D2, or D3 is given, "1" is output from the fourth column from the left. This output, encoded by the encoder 124, represents the concept code.

When character string C1 is given, "1" is output from both the first and the third columns from the left. This happens when character string C1 spans two concept codes, A and C.

For example, let character string A1 be "semiconductor" and C1 be "HBT". Concept code A might then cover MOS, bipolar, and so on, while concept code C might cover compound semiconductor, bipolar, and so on. In other words, the concept code of C1 shares "bipolar" with A. The first storage means 120 also allows a string to span three or more concepts. For this purpose, the encoder 124, which indicates the concept to which each character string belongs, has a priority encoding function. When a string spans two or more concepts, the priority encoder 124 outputs the concept codes 125 in order, starting with the highest priority.
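The lookup performed by the first storage means 120 and its priority encoder 124 can be pictured as a table from matched string to concept codes. The sketch below is illustrative only. The table follows the example, reading the nine registered strings as A1, B1, B2, C1, C2, D1, D2, D3, and E1. The priority order (A highest, F lowest) and the names `CONCEPT_TABLE`, `CONCEPT_PRIORITY`, and `concept_codes` are assumptions, since the text does not say which concept takes priority.

```python
# Hypothetical contents of the first storage means 120: each registered
# search string maps to one or more concept codes.
CONCEPT_PRIORITY = ["A", "B", "C", "D", "E", "F"]  # assumed order, A highest

CONCEPT_TABLE = {
    "A1": {"A"},
    "B1": {"B"}, "B2": {"B"},
    "C1": {"C", "A"}, "C2": {"C", "B"},
    "D1": {"D"}, "D2": {"D"}, "D3": {"D"},
    "E1": {"E", "F"},
}


def concept_codes(matched_string):
    """Return the concept codes of a matched string, highest priority first.

    An unregistered string yields an empty list, mirroring an all-zero
    read output from a cleared memory row.
    """
    codes = CONCEPT_TABLE.get(matched_string, set())
    return [code for code in CONCEPT_PRIORITY if code in codes]


print(concept_codes("C1"))
```

A string that spans two concepts, such as C1, produces both codes one after the other, which is the role the text assigns to the priority encoder.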

The mark bit register 130, connected to the first storage means 120, consists of a concept code decoder 131 and as many registers 132 as there are concepts. When the record address changes, a reset pulse applied from the reset terminal 133 sets the contents of all registers 132 to zero. When a concept code is subsequently given from the first storage means 120, decoding it selects a register 132, in which the mark bit "1" is set. In this way, mark bits "1" corresponding to all the concept codes contained in the text record being searched are written into the registers 132 one after another.
By decoding it, a mark bit "1" is set in the selected register 132. As a result, "1" mark bits corresponding to all concept codes included in the text record being searched are successively written into the register 132.

The contents of the registers 132 when writing is finished determine the access address of the second storage means 140.

The second storage means 140 expresses the logical expression of a search condition in sum-of-products canonical form and assigns each product term to one of the addresses selected through the X decoder 141 by the output of the mark bit register 130. A logical expression is stored by writing "1" at the addresses corresponding to the product terms contained in it. The second storage means 140 stores several search condition logical expressions, and the Y decoder 142 selects among them. The second storage means 140 outputs the matching result 144 between the logical expression specified by the selection code 145 and the concept codes output by the first storage means 120.

The memory matrix 160 of the second storage means 140 shows an example in which three logical expressions are stored. Suppose the mark bit register 130 gives three concept codes (A, B, C) to the X decoder, and that the remaining concept codes are unregistered and fixed at 1. The addresses are then assigned, in order from the first row at the left edge of the memory matrix 160 in the figure, the product terms $ABC$, $AB\bar{C}$, $A\bar{B}C$, $A\bar{B}\bar{C}$, $\bar{A}BC$, $\bar{A}B\bar{C}$, $\bar{A}\bar{B}C$, and $\bar{A}\bar{B}\bar{C}$. The rows of the memory matrix show the storage state for three logical expressions: row 1 stores the expression $(ABC)$, row 2 a sum of two product terms of which the first is $ABC$, and row 3 a single product term. Writing $W_{ij}$ for the value stored at bit $j$ of address $i$, the $j$-th bit line holds the logical expression

$W_{1j}\,ABC + W_{2j}\,AB\bar{C} + W_{3j}\,A\bar{B}C + \cdots$

Each $W_{ij}$ is a logic signal, 1 or 0. It is fixed by writing the desired output for each text record at the address selected by the output of the first storage means 120 for that text record.
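One way to read the stored expression is as a truth table indexed by the mark bits. The sketch below is a hedged illustration: it assumes the eight addresses enumerate the product terms of A, B, C from "all present" (address 1) down to "all absent" (address 8), and the names `minterms` and `evaluate` are hypothetical.

```python
from itertools import product

CONCEPTS = "ABC"


def minterms():
    """Yield (address, assignment) pairs; address 1 is A = B = C = 1."""
    values = product([1, 0], repeat=len(CONCEPTS))
    for address, bits in enumerate(values, start=1):
        yield address, dict(zip(CONCEPTS, bits))


def evaluate(weights, marks):
    """Evaluate W_1*m_1 + W_2*m_2 + ... for one pattern of mark bits.

    Exactly one product term m_i is true for a given pattern, so the sum
    reduces to reading the stored bit W_i at that address.
    """
    for address, assignment in minterms():
        if assignment == marks:
            return weights[address - 1]
    raise ValueError("marks must assign 0 or 1 to every concept")


# One bit line with 1 stored at addresses 1 and 3.
weights = [1, 0, 1, 0, 0, 0, 0, 0]
print(evaluate(weights, {"A": 1, "B": 0, "C": 1}))
```

Because the product terms are mutually exclusive, evaluating the whole sum of products costs a single memory read, which is what lets the device avoid software evaluation of the expression.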

With the circuit configuration above, registration of search condition logical expressions and control of the search proceed as follows. First, each search character string contained in a logical expression is paired with its concept code, and the pair is initially registered in the character string matching means 110 and the first storage means 120, respectively. This is called the character string registration process. The stored contents of the character string matching means 110 and the first storage means 120 are of course assumed to have been cleared before initial registration; in particular, every read output of the first storage means 120 is assumed to be "0". After clearing, as each search character string is registered in the character string matching means 110, its registration address is input to the X decoder 122 of the first storage means 120, and the concept code is registered by writing "1" on the bit line that the Y decoder 123 selects for that concept code.

Second, for each logical product term among concept codes contained in the logical expression expressed in sum-of-products canonical form, "1" is initially registered at the address selected by the output of the mark bit register 130. The stored contents of the memory matrix of the second storage means 140 are assumed to have been cleared to logic signal 0 before initial registration. That is, before registration the stored expression is $0 \times ABC + 0 \times AB\bar{C} + 0 \times A\bar{B}C + \cdots$. To register the logical expression $(ABC + A\bar{B}C)$, it is normalized to $1 \times ABC + 0 \times AB\bar{C} + 1 \times A\bar{B}C + \cdots$, and the logic signal 1 given from the input terminal 143 is written at addresses 1 and 3.

To initially register many different logical expressions, the selection code of each logical expression is given to the Y decoder of the second storage means 140, and the logic signal 1 given from the input terminal 143 is written row by row, from the top row of the memory matrix to the bottom.

After the logical expressions of the search conditions have been registered, a full-text search is made of the character string data in the text data file memory. Search control begins by specifying the logical expression of the search condition with the selection code 145, then reading the character string data 150 from the text data file memory sequentially, one text record (several hundred bytes to several kilobytes) at a time, and inputting it to the character string matching means 110. Each time the character string data 150 matches a search character string registered in the character string matching means 110, the first storage means 120 is accessed and the concept code of the matched search character string is sent to the mark bit register 130. This control is called the character string search process.

The mark bit register 130 stores, for each text record, the matched concept codes as decoded mark bits. The contents of the registers in the mark bit register 130 that correspond to registered concept codes are reset to 0 when the text record changes. The contents of the other registers remain 1.

Immediately before the reset, the second storage means 140 is accessed, and the stored content (0 or 1) at the address selected by the output of the mark bit register 130 is output as the result of matching each text record against the search condition logical expression. These controls are called the logical expression search process and the reset process. After full-text search processing, search conditions may be changed or added. The following processes are then used.

First, when a search character string contained in a new search condition logical expression does not match any search character string already registered in the character string matching means 110, it is treated as a new search character string and additionally registered, associated with its concept code, in the character string matching means and the first storage means. The registration address for a new search character string in the character string matching means 110 is determined after the preceding search character string registration and held until the additional registration. Additional registration in the first storage means 120 is achieved by writing the logic signal "1" at the intersection of the address selected by the output of the character string matching means 110 for the new search character string and the bit column selected by the concept code given from the input terminal 121. This control is called the character string additional registration process.

Next, the logical expression of the new search condition is additionally registered in the second storage means 140. To do so, the new search condition logical expression, expressed in canonical form, is decomposed into its product terms; the search character strings contained in true form in each term are given to the character string matching means 110 one after another; the concept codes output from the first storage means 120 as a result of matching are stored in the mark bit register 130 in turn; and the logic signal 1 is written in the second storage means 140 at the address selected by the register output for each product term. Unless the search condition logical expression is identical to one already registered, the selection code given to the Y decoder 142 must be newly set. For example, to additionally register the logical expression $ABC + AB\bar{C}$, the character strings corresponding to concept codes A, B, and C are first input to the character string matching means 110; the matching results cause the first storage means 120 to output the concept codes A, B, and C; and once these have been stored in the mark bit register 130, the logic signal 1 is written in the second storage means 140 at the address selected by its output.

Next, the character strings corresponding to A and B are input to the character string matching means 110, the first storage means 120 outputs the concept codes A and B, and once these have been stored in the mark bit register 130, 1 is written in the second storage means 140 at the address selected by its output. Two write operations thus complete the registration of $ABC + AB\bar{C}$. This control is called the logical expression additional registration process.
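The additional registration of a two-term expression can be sketched as two writes into a small table. This is a minimal model under stated assumptions, not the circuit itself: the class `SecondStorage`, its methods, and the address ordering (all concepts marked gives address 0) are hypothetical, and each product term is represented simply by the set of concept codes it contains in true form.

```python
CONCEPTS = ["A", "B", "C"]
N_ADDRESSES = 2 ** len(CONCEPTS)


class SecondStorage:
    """Toy model of the second storage means: one bit line per selection code."""

    def __init__(self):
        self.lines = {}  # selection code -> list of stored bits

    @staticmethod
    def _address(true_concepts):
        """Address chosen by the mark bits; a marked concept contributes 0."""
        address = 0
        for concept in CONCEPTS:
            address = (address << 1) | (0 if concept in true_concepts else 1)
        return address

    def add_term(self, select_code, true_concepts):
        """Write logic signal 1 at the address of one product term."""
        line = self.lines.setdefault(select_code, [0] * N_ADDRESSES)
        line[self._address(true_concepts)] = 1

    def lookup(self, select_code, true_concepts):
        """Read the stored bit for a pattern of marked concepts."""
        line = self.lines.get(select_code, [0] * N_ADDRESSES)
        return line[self._address(true_concepts)]


store = SecondStorage()
store.add_term(0, {"A", "B", "C"})  # first write: A, B and C all marked
store.add_term(0, {"A", "B"})       # second write: A and B marked, C not
print(store.lines[0])
```

Leaving a concept unmarked is what encodes its negation, so a term with C negated needs no separate representation.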

As described above, the control method is divided into registration control and search control. Registration control consists of four processes: character string initial registration, logical expression initial registration, character string additional registration, and logical expression additional registration. Search control consists of three processes: character string search, logical expression search, and reset.
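The three search processes can be followed end to end in a short software model. Everything here is an assumption for illustration: the strings, the concept assignments, the stored bit line, and the function names are hypothetical, substring tests stand in for the associative memory, and the detail that unregistered concept bits stay at 1 is omitted.

```python
# Hypothetical registered strings and their concept codes.
STRING_TO_CONCEPTS = {
    "GaAs": ["A"],
    "gallium arsenide": ["A"],
    "HBT": ["B"],
    "laser": ["C"],
}
CONCEPTS = ["A", "B", "C"]

# One stored bit line: 1 at address 0 (A, B, C marked) and 1 (A, B marked).
BIT_LINE = [1, 1, 0, 0, 0, 0, 0, 0]


def address(marks):
    """Address selected by the mark bits; a marked concept contributes 0."""
    result = 0
    for concept in CONCEPTS:
        result = (result << 1) | (0 if marks[concept] else 1)
    return result


def search(records):
    """Return the indices of the records that satisfy the stored expression."""
    hits = []
    for index, text in enumerate(records):
        marks = {concept: False for concept in CONCEPTS}   # reset process
        for string, codes in STRING_TO_CONCEPTS.items():   # string search process
            if string in text:
                for code in codes:
                    marks[code] = True
        if BIT_LINE[address(marks)]:                       # expression search process
            hits.append(index)
    return hits


RECORDS = [
    "A GaAs HBT laser diode",
    "gallium arsenide HBT amplifier",
    "silicon laser",
    "GaAs substrate",
]
print(search(RECORDS))
```

The second record matches even though it never contains the string "GaAs", because its synonym carries the same concept code.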

(Effects of the Invention) As the description of the embodiment of the character string search device using logical expressions shown in FIG. 1 makes clear, according to the present invention a search condition logical expression can be expressed in sum-of-products canonical form and registered product term by product term, and, when the expression of a search character string varies, the variants can be registered under a representative concept code.

Furthermore, in a full-text search of the character string data stored in a text file memory, the device performs not only character string matching but also outputs, for each text record, the result of determining whether it matches a logical expression of concept codes, while the full-text character string data is read continuously from the text file memory at high speed.

The effects of the present invention are that the storage capacity of the second storage means 140, which stores the logical expressions, can be greatly reduced when search condition logical expressions are registered, and that full-text search by logical expression over the character string data in a text file memory can be performed in a single pass at high speed.

If, when a search condition logical expression is registered, the search character strings have many synonyms and each combination is registered as a separate logical expression, the first storage means 120 can be dispensed with, but the storage capacity of the second storage means 140 becomes enormous.

For example, suppose that one search condition logical expression contains 5 search strings and that each search string has 5 synonyms. Then 5^5 (3125) combinations of search condition logical expressions exist. In that case, 25 search strings must be registered in the character string matching means 110, a 25-bit address code must be supplied to the X decoder 141 of the second storage means 140, and a selection code of at least log2 3125 bits, that is, as many as 12 bits, must be supplied to the Y decoder 142. The memory matrix must then have 2^25 × 3125 cells. This comes to about 100 billion cells, which is not feasible. If, instead, the first storage means 120 is used to consolidate the 25 search strings into 5 concept codes, the second storage means 140 receives a 5-bit address code through the X decoder 141 and stores the logical expression on a single bit line, so only 125 memory cells are used. This is less than one five-hundred-millionth of 100 billion. With the present invention, even if the number of logical expressions grows to 8000, a single 1 Mb RAM chip suffices. If there are only about 64 logical expressions, a single 1 Mb chip allows the number of concept codes to be increased to 14.

Even if the number of concept codes is increased to 16, the logical expressions still fit in a single 4 Mb chip.
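The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes that the memory matrix needs 2^(address bits) × (number of logical expressions) cells, which is inferred from the 2^25 × 3125 figure in the text, and that a 1 Mb chip holds 2^20 cells; the variable and function names are illustrative only.

```python
import math

strings_per_expression = 5
synonyms_per_string = 5

# Without concept codes: every synonym combination is its own expression.
combinations = synonyms_per_string ** strings_per_expression      # 5^5
total_strings = strings_per_expression * synonyms_per_string      # 25 address bits
y_select_bits = math.ceil(math.log2(combinations))                # bits for the Y decoder
cells_without_codes = 2 ** total_strings * combinations           # about 100 billion


def cells(num_concept_codes, num_expressions):
    """Cells needed when the address is formed from concept codes (assumed model)."""
    return 2 ** num_concept_codes * num_expressions


ONE_MB = 2 ** 20   # assumed cell count of a 1 Mb chip
FOUR_MB = 2 ** 22  # assumed cell count of a 4 Mb chip

fits_8000_expressions = cells(5, 8000) <= ONE_MB   # 5 concept codes, 8000 expressions
fits_14_codes = cells(14, 64) <= ONE_MB            # 14 concept codes, 64 expressions
fits_16_codes = cells(16, 64) <= FOUR_MB           # 16 concept codes, 64 expressions
```

Under these assumptions, 14 concept codes with 64 expressions fill a 1 Mb chip exactly, and 16 concept codes with 64 expressions fill a 4 Mb chip exactly.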

The resulting reduction in storage is of astronomical proportions.

As described above, the present invention readily solves the problem that conventional character string search devices require software to intervene when searching character string data by logical expressions.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an embodiment of the present invention.

Claims

[Claims]

(1) A character string search device using logical expressions, comprising: character string matching means that stores a plurality of search strings and determines which of the plurality of search strings an input string of a text record matches; first storage means whose access address is determined by the output of the character string matching means and which stores a concept code associated with each search string; a mark bit register that temporarily stores, for each text record, which concept codes the record contains; and second storage means whose access address is determined by the output of the mark bit register and which stores the logical expression of a search condition.

(2) A control method for the character string search device using logical expressions according to claim 1, characterized in that registration control is performed comprising: a character string initial registration process in which each search string contained in a logical expression is associated with its concept code and registered in the character string matching means and the first storage means; a logical expression initial registration process in which, for the search strings contained in true (non-negated) form in each product term of the logical expression converted to sum-of-products standard form, the access address of the second storage means is determined by using the character string matching means, the output of the first storage means, and the mark bit register, and true logical signals are written in product-term order; a character string additional registration process in which a search string contained in a new logical expression that does not match any search string already registered in the character string matching means is treated as a new search string and additionally registered, in association with its concept code, in the character string matching means and the first storage means; and a logical expression additional registration process in which, for the search strings contained in true form in the product terms, the access address of the second storage means is determined by using the character string matching means, the output of the first storage means, and the mark bit register, and true logical signals are written into another storage area in product-term order.

(3) A control method for the character string search device using logical expressions according to claim 1, characterized in that, during a full-text search after the search condition logical expressions have been registered, search control is performed comprising: a character string matching process in which the character string matching means receives the character string data sequentially in text record units and, when a string match occurs, the first storage means is accessed and its output is set in the mark bit register; a logical expression search process in which, at each change of text record, the second storage means is accessed using the output of the mark bit register and the match result for the logical expressions is output; and a reset process in which, after the second storage means has been accessed, only the mark bits corresponding to registered concept codes are returned to the reset state while the remaining bits are left in the marked state.