JPH0462664A - Data retrieving device

Info

Publication number: JPH0462664A
Application number: JP2173807A
Authority: JP
Inventors: Hiroichi Yoshida; 広市吉田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1990-06-29
Filing date: 1990-06-29
Publication date: 1992-02-27

Abstract

PURPOSE: To find a character string matching the search string in less time than a search of the full text requires, by dividing the text into clauses (bunsetsu) and comparing only the portion at the head of each clause that has the same number of characters as the search string, skipping from clause to clause.

CONSTITUTION: The text stored in sentence storage means 101 is divided into clauses by syntax analysis means 102, and the number of characters in each clause is stored in clause length storage means 103 as the clause length. When an arbitrary character string is input from input means 104, retrieval means 105 refers to the clause lengths stored in means 103 and searches only the portion at the head of each clause that has the same number of characters as the search string, skipping from clause to clause. The matching character string can thus be found in less time than a search of all characters.

Description

Detailed Description of the Invention

(a) Industrial Application Field

The present invention relates to a data retrieval device that is used in document processing devices, such as Japanese word processors and computers capable of creating and storing Japanese documents, and that searches stored documents for desired characters.

(b) Conventional Technology

In conventional data retrieval devices of this type, past documents such as patent information are stored in memory as a database. To improve search efficiency, terms that represent each document, that is, typical terms used in its text, are registered in advance as the document's search-target keywords. At search time, the device checks whether the specified keyword matches the search-target keywords of each document and retrieves the matching documents from the database. This search method is in general use.
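The keyword-based lookup described above can be sketched in Python as follows. This is only an illustration: the `Document` record, its field names, and the sample entries are hypothetical, since the source does not specify how documents and keywords are laid out.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A stored document plus its manually registered keywords (hypothetical layout)."""
    title: str
    text: str
    keywords: set[str] = field(default_factory=set)


def search_by_keyword(database: list[Document], keyword: str) -> list[Document]:
    """Return the documents whose registered keywords contain the given keyword."""
    return [doc for doc in database if keyword in doc.keywords]


# Sample data, invented for illustration only.
database = [
    Document("doc-1", "Prolog is a logic programming language.", {"Prolog", "logic"}),
    Document("doc-2", "A word processor stores documents.", {"word processor"}),
]

hits = search_by_keyword(database, "Prolog")
print([doc.title for doc in hits])
```

Note that the text itself is never inspected: a document is found only if someone registered the right keyword for it, which is the weakness the next section discusses.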

(c) Problems to Be Solved by the Invention

In such a data retrieval device, however, the quality of the database is determined by which terms are selected as the search-target keywords. In addition, the search-target keywords may stop matching the document as the years pass.

To solve these problems, the device can check whether the specified keyword applies at every character in the text of each document, regardless of the search-target keywords. That is, as shown in FIG. 6, to search the text of a document for the word "Prolog" (プロログ), for example, the device shifts through the text one character at a time and checks whether a character string with the same spelling as the specified keyword exists.
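The one-character-at-a-time search of FIG. 6 can be sketched as follows. This is an illustrative Python version rather than code from the source; the function name and the sample text are assumptions.

```python
def full_text_search(text: str, keyword: str) -> list[int]:
    """Return every index at which keyword occurs, shifting one character at a time."""
    positions: list[int] = []
    if not keyword:
        return positions
    for start in range(len(text) - len(keyword) + 1):
        # Compare the keyword against the window that begins at this offset.
        if text[start:start + len(keyword)] == keyword:
            positions.append(start)
    return positions


print(full_text_search("Prolog programs: a Prolog primer", "Prolog"))
```

Every offset in the text is a candidate start position, so the work grows with the full length of the text.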

However, searching the entire text of a document in this way to check whether the word in question is present takes a long time.

The present invention has been made in view of these circumstances. It provides a data retrieval device that divides the text of a document to be searched into clauses (bunsetsu) and searches for the word in question while skipping from clause to clause.

(d) Means for Solving the Problems

FIG. 1 is a block diagram showing the configuration of the present invention.

As shown in the figure, the present invention is a data retrieval device comprising: sentence storage means 101 that stores sentences; syntax analysis means 102 that divides the sentences stored in the sentence storage means 101 into clauses; clause length storage means 103 that stores, as a clause length, the number of characters in each clause divided by the syntax analysis means 102; input means 104 for inputting an arbitrary character string as a search string; and retrieval means 105 that, when searching the sentences stored in the sentence storage means 101 for the same character string as the search string input from the input means 104, refers to the clause lengths stored in the clause length storage means 103 and searches, skipping from clause to clause, only the portion at the head of each clause that has the same number of characters as the search string.

In the present invention, external storage devices such as floppy disk devices and magnetic disk devices are mainly used as the sentence storage means 101 and the clause length storage means 103.

It is convenient to use a microcomputer consisting of a CPU, ROM, RAM, and I/O ports as the syntax analysis means 102 and the retrieval means 105.

The input means 104 may be any device that can input an arbitrary character string as a search string. A keyboard device, a tablet device, or the like is used, as is a floppy disk device, magnetic disk device, or other device capable of batch data input.

(e) Operation

As shown in FIG. 1, according to the present invention, the sentences stored in the sentence storage means 101 are divided into clauses by the syntax analysis means 102, and the number of characters in each clause is stored in the clause length storage means 103 as the clause length.

When an arbitrary character string is input from the input means 104 as a search string, the retrieval means 105 refers to the clause lengths stored in the clause length storage means 103 and searches only the portion at the head of each clause that has the same number of characters as the search string, skipping from clause to clause.

Because the search for the character string matching the search string skips from clause to clause, the matching character string can be found in less time than a search over every character.

(f) Embodiment

The present invention is described in detail below on the basis of the embodiment shown in the drawings. The invention is not limited by this embodiment.

FIG. 2 is a block diagram showing the configuration of an embodiment in which the present invention is applied to a Japanese word processor.

In this figure, 1 is a keyboard equipped with kana keys, function keys, and the like, through which various sentences and search strings consisting of arbitrary character strings are input to the control unit 2.

The control unit 2 is a microcomputer consisting of a CPU, ROM, RAM, and I/O ports, and performs the various kinds of data processing described later in accordance with the control program written in the ROM.

3 is an input device for inputting data from outside, such as a magnetic storage device (a floppy disk device, a magnetic disk device, or the like) or a pointing device.

4 is a display device such as a CRT display device, an LC (liquid crystal) display device, or an EL display device.

5 is a database consisting of a magnetic disk device for storing the various sentences input from the keyboard 1 and the input device 3.

6 is a syntax analysis device of the kind used in text proofreading support systems such as high-end Japanese word processors; it divides input text written in mixed kanji and kana into clauses.

7 is a syntax analysis dictionary consisting of a ROM, which the syntax analysis device 6 refers to when dividing mixed kanji and kana text into clauses.

8 is a search device that checks whether the search string input from the keyboard 1 is present in the text stored in the database 5.

9 is a result buffer consisting of RAM, which stores pointers indicating the positions of the character strings found by the search device 8.

The syntax analysis device 6 and the search device 8 are each composed of a microprocessor.

In this data retrieval device, text data is first input from the input device 3, for example on a floppy disk, and keywords such as those shown in FIG. 3(a) are added to the text data from the keyboard 1. These keywords are chosen according to the subjective judgment of the person adding them. Next, the control unit 2 sends the text data to the syntax analysis device 6.

Using the syntax analysis dictionary 7, the syntax analysis device 6 divides the received text data into clauses, as indicated by "/" in FIG. 3(c), and registers the number of characters in each clause in the database 5 as clause length data, as shown in FIG. 3(b), together with the keywords and the text data. The text is divided into clauses in advance because a search string is usually specified as a noun, and such a noun is always located at the head of a clause.
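As a rough illustration of the clause length data, the Python sketch below derives it from text that has already been divided with "/" markers in the manner of FIG. 3(c). The dictionary-based analysis performed by the syntax analysis device 6 is not reproduced here; the function name and the sample sentence are assumptions made for the example.

```python
def clause_lengths(segmented: str, delimiter: str = "/") -> tuple[str, list[int]]:
    """Return the plain text and the character count of each clause.

    `segmented` is text in which clause boundaries are already marked with
    `delimiter`; producing those boundaries is outside this sketch.
    """
    clauses = [clause for clause in segmented.split(delimiter) if clause]
    return "".join(clauses), [len(clause) for clause in clauses]


# Hypothetical sample sentence, pre-segmented by hand.
text, lengths = clause_lengths("プロログは/論理型の/言語で/ある")
print(text)
print(lengths)
```

Storing only the lengths keeps the text data itself unchanged; the boundaries can be recovered at search time by accumulating the lengths.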

When the database 5 is searched, either a keyword search or a search of the entire text is selected from the keyboard 1. A keyword search is the same as an ordinary database search. For a search of the entire text, when the search string has been entered and the start key is pressed, the control unit 2 sends the search string to the search device 8.

The search device 8 checks, one character at a time from the beginning of the text, whether the text matches the search string. On a mismatch, as shown in FIG. 4, it uses the clause length data to advance the search position in a single step to the head of the next clause. In this way the search runs faster than an ordinary database search.
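A compact way to picture the skip of FIG. 4 is the following Python sketch, which compares the search string only against the head of each clause. It is a simplified illustration under stated assumptions, not the device's actual implementation: the function name and sample data are hypothetical, and the comparison is done with a slice rather than character by character.

```python
def clause_skip_search(text: str, lengths: list[int], query: str) -> list[int]:
    """Return the start offsets of clauses whose leading characters equal query."""
    hits: list[int] = []
    if not query:
        return hits
    offset = 0  # index of the head of the current clause
    for length in lengths:
        # Only the first len(query) characters from the clause head are compared.
        if text[offset:offset + len(query)] == query:
            hits.append(offset)
        # Jump straight to the head of the next clause using the stored length.
        offset += length
    return hits


text = "プロログは論理型の言語である"
lengths = [5, 4, 3, 2]  # プロログは / 論理型の / 言語で / ある (hypothetical segmentation)
print(clause_skip_search(text, lengths, "論理型"))
print(clause_skip_search(text, lengths, "ログ"))
```

The second query finds nothing because "ログ" occurs only in the middle of a clause. That is the trade-off of the approach: it relies on the premise that search strings are nouns located at clause heads.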

Next, the processing performed by the control unit 2 is described with reference to the flowchart shown in FIG. 5.

At data retrieval time, the control unit 2 moves the pointer into the clause length data shown in FIG. 3(b) and the pointer into the text data shown in FIG. 3(c) to their respective beginnings (steps 21 and 22), and then moves the pointer into the search string input from the keyboard 1 to its beginning (step 23).

It then checks whether the end of the text data has been reached (step 24); if not, it performs the following processing.

First, it compares the character at the text data pointer with the character at the search string pointer, initially the first character of each (step 25). If they match, it advances the text data pointer by one (step 26), advances the search string pointer by one (step 27), decrements the character count in the clause length data by one (step 28), and returns to step 24.

If the character of the text data and the character of the search string do not match in step 25, it skips the text data pointer forward by the remaining clause length (step 29) and moves the clause length data pointer to the next character count (step 30).

If the character string of the text data and the search string match in their entirety in step 25, it sets the position of the matched character string in the result buffer 9 (step 31) and proceeds to step 30.
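The steps of FIG. 5 can be followed in a pointer-style Python sketch such as the one below. Several details are assumptions, because the text does not state them: that control returns to step 23 (resetting the search string pointer) after step 30, that the remainder of a clause is also skipped after a full match, and that the position recorded in the result buffer is the head of the matching clause. The function and variable names are hypothetical.

```python
def flowchart_search(text: str, lengths: list[int], query: str) -> list[int]:
    """Pointer-based search following steps 21 to 31 of the flowchart (a sketch)."""
    results: list[int] = []           # stands in for result buffer 9
    if not query or not lengths:
        return results
    clause = 0                        # step 21: clause length pointer to the start
    pos = 0                           # step 22: text data pointer to the start
    q = 0                             # step 23: search string pointer to the start
    remaining = lengths[clause]       # characters left in the current clause
    head = pos                        # head of the current clause
    while pos < len(text):            # step 24: stop at the end of the text data
        if text[pos] == query[q]:     # step 25: compare one character
            pos += 1                  # step 26
            q += 1                    # step 27
            remaining -= 1            # step 28
            if q < len(query):
                continue              # back to step 24
            results.append(head)      # step 31: whole search string matched
        pos += remaining              # step 29: skip the rest of the clause (assumed after a match too)
        clause += 1                   # step 30: move to the next clause length
        if clause >= len(lengths):
            break
        remaining = lengths[clause]
        head = pos
        q = 0                         # assumed return to step 23
    return results


text = "プロログは論理型の言語である"
lengths = [5, 4, 3, 2]  # hypothetical segmentation: プロログは / 論理型の / 言語で / ある
print(flowchart_search(text, lengths, "言語"))
```

Because `remaining` is decremented on every matched character, adding it to the text pointer always lands on the head of the next clause, whether the skip follows a mismatch or a completed match.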

In this way, clause length data for the text is registered in addition to the keywords when the database is registered, and the search refers to that clause length data and skips from clause to clause, so the search runs faster than simply searching all of the text data.
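To make the speed claim concrete, the toy Python sketch below counts single-character comparisons for the two approaches on one short sample. It is not a benchmark, and the functions, sample sentence, and segmentation are assumptions made for illustration; actual savings depend on the text and on how long its clauses are.

```python
def count_full_text(text: str, query: str) -> int:
    """Count character comparisons when every offset is tried as a start position."""
    comparisons = 0
    for start in range(len(text) - len(query) + 1):
        for i, ch in enumerate(query):
            comparisons += 1
            if text[start + i] != ch:
                break
    return comparisons


def count_clause_skip(text: str, lengths: list[int], query: str) -> int:
    """Count character comparisons when only clause heads are tried."""
    comparisons = 0
    offset = 0
    for length in lengths:
        for i, ch in enumerate(query):
            if offset + i >= len(text):
                break
            comparisons += 1
            if text[offset + i] != ch:
                break
        offset += length  # skip to the head of the next clause
    return comparisons


text = "プロログは論理型の言語である"
lengths = [5, 4, 3, 2]  # hypothetical segmentation
print(count_full_text(text, "言語"), count_clause_skip(text, lengths, "言語"))
```

The number of start positions drops from one per character to one per clause, which is where the reduction in search time comes from.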

(g) Effects of the Invention

According to the present invention, when the text is searched for the same character string as the search string, the text is divided into clauses and only the portion at the head of each clause that has the same number of characters as the search string is searched, skipping from clause to clause. The matching character string can therefore be found in less time than a search of the entire text.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of the present invention. FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention. FIGS. 3(a) to 3(c) are explanatory diagrams showing examples of the stored contents of the keywords, the clause length data, and the text data, respectively. FIG. 4 is an explanatory diagram showing the search method of the present invention. FIG. 5 is a flowchart showing the operation of the embodiment. FIG. 6 is an explanatory diagram showing the conventional full-character search method.

1: keyboard, 2: control unit, 3: input device, 4: display device, 5: database, 6: syntax analysis device, 7: syntax analysis dictionary, 8: search device, 9: result buffer.

Claims

1. A data retrieval device comprising: sentence storage means for storing sentences; syntax analysis means for dividing the sentences stored in the sentence storage means into clauses (bunsetsu); clause length storage means for storing, as a clause length, the number of characters in each clause divided by the syntax analysis means; input means for inputting an arbitrary character string as a search string; and retrieval means that, when searching the sentences stored in the sentence storage means for the same character string as the search string input from the input means, refers to the clause lengths stored in the clause length storage means and searches, skipping from clause to clause, only the character string portion at the head of each clause that has the same number of characters as the search string.