JPH06301725A - Retrieval device for character-string of hierarchized document - Google Patents

Retrieval device for character-string of hierarchized document

Info

Publication number
JPH06301725A
Authority
JP
Japan
Prior art keywords
document
degree
coincidence
character string
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP5110900A
Other languages
Japanese (ja)
Other versions
JP3315755B2 (en)
Inventor
Motoyoshi Sawatani
元喜 澤谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Steel Corp
Original Assignee
Nippon Steel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Steel Corp
Priority to JP11090093A
Publication of JPH06301725A
Application granted
Publication of JP3315755B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PURPOSE: To obtain, with a single search, not only the search result for each document but also the search result for each level, by displaying the degree of coincidence of each document and the degree of coincidence of each character string set at each level either step by step or simultaneously. CONSTITUTION: A client machine 4 accesses a server 1, and a search unit 11 of the server 1 searches all documents. The search can be performed at high speed simply by generating autocorrelation information for the search string and collating it against a map. The results of the search performed by the search unit 11 are sent as they are to an aggregation processing unit 12. From these results, the aggregation processing unit 12 aggregates, for each document, the degrees of coincidence for the character string sets of each level, takes the highest degree of coincidence as the degree of coincidence of that level, and takes the degree of coincidence of the top level of each document as the degree of coincidence of that document. The result is sent to the client machine 4, and the documents whose degree of coincidence exceeds a set threshold are displayed together on its display.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character string search device for hierarchical documents. More specifically, it relates to a device that operates on a storage device holding one or more hierarchical documents, each having one or more levels in which the character string set at each level is composed of one or more character string sets at the level below, and that performs a fuzzy search, including incomplete matches of the target character string, against each document and against each character string set at each level of each document.

[0002]

[Prior Art] Conventionally, when checking whether a particular character string is contained in each document on a storage device holding a large number of documents, the usual approach was an exact-match search that tests only whether the entire string is present. With this approach, a search can fail because of subtle differences in notation, especially for long loanwords written in katakana. In addition, the search becomes markedly slower when the total volume of documents to be searched is large, for example when there are many documents.

[0003] To address this, Japanese Unexamined Patent Publication No. H4-326164, filed by the same applicant as the present application, discloses a search system in which autocorrelation information for each character (code) is stored for each document at the time the document is stored, and at search time the autocorrelation information for each character of the search string is computed and its presence is checked. This structure makes it possible to determine easily and quickly not only whether the search string occurs in each target document but also its degree of coincidence.

[0004] The above system speeds up the search for a specific character string in each document. However, when a single document is very large and is a hierarchical document divided into items such as "Title", "Preface", "Body 1", "Body 2", and "Afterword", subsequent processing may be easier if the user can tell which item contains the desired character string. Also, when no item has a high degree of coincidence, knowing which items contain character strings that match the specific character string, and to what degree, may be important in deciding when to end the search.

[0005]

[Problem to Be Solved by the Invention] The present invention was made in view of the problems of the prior art described above. Its main object is to provide a character string search device for hierarchical documents that not only determines whether a specific character string is present in a document but also makes it possible to find, easily and quickly, which item of each target document contains a character string matching the specific character string, and to what degree.

[0006]

[Means for Solving the Problem] According to the present invention, the above object is achieved by providing a character string search device for hierarchical documents that performs a fuzzy search, including incomplete matches of a specific character string, against each document and each character string set in each document on a storage device storing one or more hierarchical documents, each hierarchical document consisting of character string sets arranged in one or more levels, with the character string set at each level composed of one or more character string sets at the level below. The device comprises a search unit that searches for the specific character string across all characters of each document and determines the degree of coincidence, and an aggregation processing unit that, for each document, aggregates the degrees of coincidence for the character string sets of each level starting from the lowest level, takes the highest degree of coincidence as the degree of coincidence of that level, and takes the degree of coincidence of the top level of each document as the degree of coincidence of that document. The device displays the degree of coincidence of each document and the degree of coincidence of each character string set at each level of that document either step by step or simultaneously.
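The aggregation described in this paragraph can be sketched as a recursive maximum over a tree. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Node` class, its field names, and the leaf scores are hypothetical, and the sketch reads "the highest degree of coincidence" as each set taking the maximum over the sets one level below it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One character string set; leaves carry a score from the search unit."""
    name: str
    score: Optional[float] = None  # degree of coincidence in percent
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> float:
    """Fill in scores bottom-up: a set takes the highest score among its children."""
    if node.children:
        node.score = max(aggregate(child) for child in node.children)
    elif node.score is None:
        node.score = 0.0  # a leaf with no search result counts as no match
    return node.score


# Hypothetical leaf scores for one document (assumed values, not from the source).
doc = Node("application documents", children=[
    Node("application form", children=[Node("inventor", 10.0), Node("agent", 0.0)]),
    Node("specification", children=[Node("claims", 80.0),
                                    Node("title of the invention", 55.0)]),
    Node("abstract", 40.0),
])
document_score = aggregate(doc)
print(document_score)  # 80.0
```

Because every intermediate node keeps its own score after the pass, the same tree can serve both the per-document listing and the per-level display without a second search.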

[0007]

[Operation] In this way, a fuzzy search is performed over all the characters to be searched in a plurality of hierarchical documents divided, for example, by item; the results are aggregated starting from the character string sets at the lowest level, and the search result for each level and then for each document is obtained. A single search thus yields the search result for each level together with the search result for each document.

[0008]

[Embodiment] A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

[0009] FIG. 1 is a block diagram showing the system configuration of a server-client workstation system to which the present invention is applied. The system has a server 1 equipped with a mass storage device 2, and a plurality of client machines 4 connected to the server 1 via a known network 3.

[0010] A large number of documents are stored in the storage device 2. As shown in FIG. 2, each document is a hierarchical document consisting of character string sets arranged in one or more levels, with the character string set at each level composed of one or more character string sets at the level below. In this embodiment, it is assumed that a large number of documents of the type "application documents" are stored. Each set of "application documents" consists of "application form", "specification", and "abstract". The "application form" consists of "document name", "reference number", "inventor", "patent applicant", "agent", and so on; "patent applicant", for example, consists of "identification number", "postal code", "address or residence", "name", "representative", and so on. The "specification" consists of "document name", "title of the invention", "claims", "detailed description of the invention", "brief description of the drawings", and so on; "detailed description of the invention", for example, consists of "field of industrial application", "prior art", "problem to be solved by the invention", "means for solving the problem", "operation", "embodiment", and "effect of the invention". In addition, for each document stored in the storage device 2, the autocorrelation information of each character (code) is created as a map at the time of storage, stored together with the document as a kind of index, and managed by the server 1.
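The structure of FIG. 2 described in this paragraph can be written down as a nested mapping. This is only an illustrative sketch: the section names are English renderings of the items listed in the text, the items covered by "and so on" are omitted, and the dict representation and the `leaf_paths` helper are assumptions introduced here.

```python
# A key is a character string set; an empty dict marks a lowest-level set.
APPLICATION_DOCUMENTS = {
    "application form": {
        "document name": {},
        "reference number": {},
        "inventor": {},
        "patent applicant": {
            "identification number": {},
            "postal code": {},
            "address or residence": {},
            "name": {},
            "representative": {},
        },
        "agent": {},
    },
    "specification": {
        "document name": {},
        "title of the invention": {},
        "claims": {},
        "detailed description of the invention": {
            "field of industrial application": {},
            "prior art": {},
            "problem to be solved by the invention": {},
            "means for solving the problem": {},
            "operation": {},
            "embodiment": {},
            "effect of the invention": {},
        },
        "brief description of the drawings": {},
    },
    "abstract": {},
}


def leaf_paths(tree, prefix=()):
    """Yield the path to every lowest-level set, depth first."""
    for name, subtree in tree.items():
        path = prefix + (name,)
        if subtree:
            yield from leaf_paths(subtree, path)
        else:
            yield path


paths = list(leaf_paths(APPLICATION_DOCUMENTS))
for path in paths[:3]:
    print(" > ".join(path))
```

Addressing each lowest-level set by its full path keeps the two "document name" items, one under the application form and one under the specification, distinct.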

[0011] As shown in FIG. 3, the server 1 is provided with a search unit 11, which searches for a specific character string using the autocorrelation information described above and determines the degree of coincidence, and an aggregation processing unit 12, which, from the search results of the search unit 11, aggregates for each document the degrees of coincidence for the character string sets of each level starting from the lowest level, takes the highest degree of coincidence as the degree of coincidence of that level, and takes the degree of coincidence of the top level of each document as the degree of coincidence of that document.

[0012] An outline of the operation of this embodiment is described below. When a particular client machine 4 is to search all documents stored in the storage device 2 for a specific character string, for example the string "フィードフォワード" (feedforward), the operator enters the string as the "search key" on the client machine 4 and sets the threshold for the degree of coincidence, described later, to, for example, 70% or more. The client machine 4 then accesses the server 1, and the search unit 11 of the server 1 searches all documents. Because the autocorrelation information of each document has been created and stored in advance as a map, as described above, a high-speed search can be performed simply by creating the autocorrelation information for the string "フィードフォワード" and collating it against the map. The speed of this search depends hardly at all on the total volume of documents; it depends on the length of the search string.
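This paragraph does not spell out what the autocorrelation information contains. Purely for illustration, the sketch below assumes it is a set of (character, later character, distance) triples and that the degree of coincidence is the share of the query's triples found in the document's map. Both assumptions, and the names `autocorrelation`, `degree_of_coincidence`, and `max_distance`, are hypothetical and are not taken from this document or from JP H4-326164.

```python
def autocorrelation(text: str, max_distance: int = 2) -> set:
    """Collect (char, later char, distance) triples for distances 1..max_distance."""
    triples = set()
    for i, ch in enumerate(text):
        for d in range(1, max_distance + 1):
            if i + d < len(text):
                triples.add((ch, text[i + d], d))
    return triples


def degree_of_coincidence(query: str, doc_map: set, max_distance: int = 2) -> float:
    """Percentage of the query's triples that are present in the document map."""
    query_triples = autocorrelation(query, max_distance)
    if not query_triples:
        return 0.0
    hits = sum(1 for triple in query_triples if triple in doc_map)
    return 100.0 * hits / len(query_triples)


# The map is built once, when the document is stored (sample text is made up).
doc_map = autocorrelation("フィードフォワード制御を用いる")

exact = degree_of_coincidence("フィードフォワード", doc_map)    # same spelling
variant = degree_of_coincidence("フィードフォワド", doc_map)    # variant spelling
unrelated = degree_of_coincidence("微分", doc_map)              # not in the text
print(exact, variant, unrelated)
```

Under these assumptions the lookup cost grows with the number of triples in the query rather than with the size of the stored text, which is in line with the remark that search speed depends on the length of the search string.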

[0013] The search results produced by the search unit 11 are sent as they are to the aggregation processing unit 12. From these results, the aggregation processing unit 12 aggregates, for each document, the degrees of coincidence for the character string sets of each level starting from the lowest level, takes the highest degree of coincidence as the degree of coincidence of that level, and takes the degree of coincidence of the top level of each document as the degree of coincidence of that document. The result is then sent to the client machine 4, whose display, as shown in FIG. 4, first lists together all documents whose degree of coincidence is at or above the set threshold. When the operator clicks, for example, the "Sort" key in FIG. 4 with a pointing device such as a mouse, the documents are rearranged and displayed in descending order of degree of coincidence. The operator then selects one of the displayed documents, for example "levitation support device". As shown in FIG. 5(a), the degrees of coincidence for "application form", "specification", and "abstract" are then sent from the server 1 to the client machine 4 and displayed. Next, when the operator selects, for example, "specification", the degrees of coincidence for "document name", "title of the invention", "claims", "detailed description of the invention", and "brief description of the drawings" are sent from the server 1 to the client machine 4 and displayed, as shown in FIG. 5(b). Further, when the operator selects, for example, "detailed description of the invention", the degrees of coincidence for "field of industrial application", "prior art", "problem to be solved by the invention", "means for solving the problem", "operation", "embodiment", and "effect of the invention" are sent from the server 1 to the client machine 4 and displayed, as shown in FIG. 5(c). In this way the operator can progressively narrow down the part that contains the desired string "フィードフォワード". This is useful, for example, when the notation is inconsistent within one document and needs to be corrected: the "means for solving the problem" and "operation" parts contain the string "フィードフォワード", while the "embodiment" part contains "フィードフォワド" and the "effect of the invention" part contains "フィードホワード".
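The client-side steps in this paragraph (list documents at or above the threshold, sort on request, then drill into one document) can be sketched as follows. The data layout, the scores, the function names, and every document title other than "levitation support device" are hypothetical and are introduced only for illustration.

```python
# Hypothetical aggregated results: title -> (document score, scores one level down).
results = {
    "rolling mill controller": (72.0, {"application form": 0.0,
                                       "specification": 72.0,
                                       "abstract": 30.0}),
    "furnace monitor": (40.0, {"application form": 0.0,
                               "specification": 40.0,
                               "abstract": 0.0}),
    "levitation support device": (95.0, {"application form": 0.0,
                                         "specification": 95.0,
                                         "abstract": 60.0}),
}


def list_hits(results, threshold):
    """Initial list: documents at or above the threshold, in stored order."""
    return [(title, score) for title, (score, _) in results.items()
            if score >= threshold]


def sort_hits(hits):
    """Effect of the "Sort" key: descending degree of coincidence."""
    return sorted(hits, key=lambda item: item[1], reverse=True)


def drill_down(results, title):
    """Degrees of coincidence of the sets one level below the selected document."""
    return results[title][1]


hits = list_hits(results, threshold=70.0)
ranked = sort_hits(hits)
detail = drill_down(results, ranked[0][0])
print(ranked)
print(detail)
```

Each further selection would repeat `drill_down` one level lower, which matches the step-by-step narrowing shown in FIG. 5(a) to (c).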

[0014] In this embodiment, the degree of coincidence of each document and the degree of coincidence of each character string set at each level of that document are displayed step by step, but it goes without saying that they may be displayed simultaneously if the display permits.

[0015] It is also easy to enter a plurality of specific character strings (search keys) on a screen such as that shown in FIG. 4 and perform a compound search under AND, OR, and ANDNOT conditions. Suppose, for example, that the strings "微分" (differential), "フィードフォワード" (feedforward), and "制御" (control) are searched under an AND condition, and that the "prior art" part contains only "微分", the "operation" part contains only "フィードフォワード", and the "problem to be solved by the invention" part contains only "制御". The three strings are each searched separately, their degrees of coincidence are added together, and the sum is divided by the number of search strings (3 in this case); the result (33% in this case) is the search result under the AND condition. For the set one level up, "detailed description of the invention", the degrees of coincidence of the search strings in the lowest-level character string sets are added together and divided by the number of search strings (3 in this case); the result (100% in this case) is the search result under the AND condition. In other words, even for a compound search under AND, OR, and ANDNOT conditions, only the degrees of coincidence of the search strings in the lowest-level character string sets need to be obtained, and a single map of autocorrelation information suffices as the index. It goes without saying that the way of deriving compound search results under AND, OR, and ANDNOT conditions is not limited to the above and that various methods exist depending on the application; for example, the degree of coincidence obtained when a further AND, OR, or ANDNOT search is applied to the results of an earlier search may be the same as, or different from, that obtained when all conditions are entered at once.
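The AND arithmetic in this paragraph can be checked with a short sketch. It assumes that, for the set one level up, each search string first takes its highest degree of coincidence among the lower sets before the values are averaged; that reading reproduces the 33% and 100% figures but is an interpretation, and the function names are hypothetical.

```python
# Degree of coincidence (percent) of each search string in each lowest-level set,
# following the example in the text; any string not listed is taken to score 0.
leaf_scores = {
    "prior art": {"微分": 100.0},
    "operation": {"フィードフォワード": 100.0},
    "problem to be solved by the invention": {"制御": 100.0},
}
keys = ["微分", "フィードフォワード", "制御"]


def and_score(per_key_scores, keys):
    """AND result for one set: sum of per-string degrees over the number of strings."""
    return sum(per_key_scores.get(k, 0.0) for k in keys) / len(keys)


def parent_scores(child_scores, keys):
    """Per-string degree one level up: the highest value among the child sets."""
    return {k: max(child.get(k, 0.0) for child in child_scores.values())
            for k in keys}


leaf_and = {name: and_score(scores, keys) for name, scores in leaf_scores.items()}
parent_and = and_score(parent_scores(leaf_scores, keys), keys)
print(round(leaf_and["operation"]), round(parent_and))  # 33 100
```

Only the per-string scores of the lowest-level sets are needed as input, which is consistent with the statement that a single autocorrelation map suffices for compound searches.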

[0016]

[Effect of the Invention] As is clear from the above description, the character string search device for hierarchical documents according to the present invention searches for a specific character string across all characters of the hierarchical documents and determines the degree of coincidence; for each document, it aggregates the degrees of coincidence for the character string sets of each level starting from the lowest level, takes the highest degree of coincidence as the degree of coincidence of that level, and takes the degree of coincidence of the top level of each document as the degree of coincidence of that document; and it displays the degree of coincidence of each document and of each character string set at each level of that document step by step or simultaneously. As a result, a single search yields the search result for each level together with the search result for each document, making it possible to find, easily and quickly, which item of each target document contains a character string matching the specific character string, and to what degree.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing the system configuration of a server-client workstation system to which the present invention is applied.

[FIG. 2] An explanatory diagram showing the structure of a hierarchical document stored in the storage device.

[FIG. 3] A block diagram showing part of the functional configuration of the server and client machines in the server-client workstation system to which the present invention is applied.

[FIG. 4] An explanatory diagram showing the display state of the display screen of a client machine.

[FIG. 5] (a) to (c) are explanatory diagrams showing only the main part of FIG. 4.

[Explanation of Reference Numerals]

1 Server
2 Storage device
3 Network
4 Client machine
11 Search unit
12 Aggregation processing unit

Claims (1)

[Claims]

[Claim 1] A character string search device for hierarchical documents that performs a fuzzy search, including incomplete matches of a specific character string, against each document and each character string set in each document on a storage device storing one or more hierarchical documents, each hierarchical document consisting of character string sets arranged in one or more levels, the character string set at each level being composed of one or more character string sets at the level below, the device comprising:
a search unit that searches for the specific character string across all characters of each document and determines the degree of coincidence; and
an aggregation processing unit that, for each document, aggregates the degrees of coincidence for the character string sets of each level starting from the lowest level, takes the highest degree of coincidence as the degree of coincidence of that level, and takes the degree of coincidence of the top level of each document as the degree of coincidence of that document,
wherein the degree of coincidence of each document and the degree of coincidence of each character string set at each level of that document are displayed step by step or simultaneously.
JP11090093A 1993-04-13 1993-04-13 Character string search device for hierarchical documents Expired - Lifetime JP3315755B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP11090093A JP3315755B2 (en) 1993-04-13 1993-04-13 Character string search device for hierarchical documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP11090093A JP3315755B2 (en) 1993-04-13 1993-04-13 Character string search device for hierarchical documents

Publications (2)

Publication Number Publication Date
JPH06301725A (en) 1994-10-28
JP3315755B2 (en) 2002-08-19

Family

ID=14547528

Family Applications (1)

Application Number Title Priority Date Filing Date
JP11090093A Expired - Lifetime JP3315755B2 (en) 1993-04-13 1993-04-13 Character string search device for hierarchical documents

Country Status (1)

Country Link
JP (1) JP3315755B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005122726A (en) * 2003-10-01 2005-05-12 Fuji Xerox Co Ltd Method and system for searching contact information of context base
JP2008146209A (en) * 2006-12-07 2008-06-26 Just Syst Corp Document retrieval device, document retrieval method and document retrieval program
WO2009048130A1 (en) * 2007-10-12 2009-04-16 Nec Corporation Document rating calculation system, document rating calculation method and program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0484271A (en) * 1990-07-26 1992-03-17 Nippon Telegr & Teleph Corp <Ntt> Intra-information retrieval device
JPH04326164A (en) * 1991-04-25 1992-11-16 Nippon Steel Corp Data base retrieval system
JPH0520371A (en) * 1991-07-11 1993-01-29 Nippon Telegr & Teleph Corp <Ntt> Information retrieval result display method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005122726A (en) * 2003-10-01 2005-05-12 Fuji Xerox Co Ltd Method and system for searching contact information of context base
JP2008146209A (en) * 2006-12-07 2008-06-26 Just Syst Corp Document retrieval device, document retrieval method and document retrieval program
WO2009048130A1 (en) * 2007-10-12 2009-04-16 Nec Corporation Document rating calculation system, document rating calculation method and program
JP5187313B2 (en) * 2007-10-12 2013-04-24 日本電気株式会社 Document importance calculation system, document importance calculation method, and program
US8983965B2 (en) 2007-10-12 2015-03-17 Nec Corporation Document rating calculation system, document rating calculation method and program

Also Published As

Publication number Publication date
JP3315755B2 (en) 2002-08-19

Similar Documents

Publication Publication Date Title
EP0722145B1 (en) Information retrieval system and method of operation
US5787420A (en) Method of ordering document clusters without requiring knowledge of user interests
US6199061B1 (en) Method and apparatus for providing dynamic help topic titles to a user
CN103699700B (en) A kind of generation method of search index, system and associated server
JP2832988B2 (en) Data retrieval system
US20080177717A1 (en) Support for reverse and stemmed hit-highlighting
NO335144B1 (en) Phrase-based generation of document descriptions
NO335440B1 (en) Phrase-based indexing in an information retrieval system
CN101246484A (en) Electric text similarity processing method and system convenient for query
EP1154355B1 (en) Document processing method, system and computer readable storage medium
JP2006099428A (en) Document summary preparation system, method, and program
JPH08147320A (en) Information retrieving method and system
JPH0484271A (en) Intra-information retrieval device
JPH06290217A (en) Document retrieval system
JPH06301725A (en) Retrieval device for character-string of hierarchized document
JPH08314966A (en) Method for generating index of document retrieving device and document retrieving device
JPH064584A (en) Text retriever
JPH11154164A (en) Adaptability calculating method in whole sentence search processing and storage medium storing program related to the same
JP2002324077A (en) Apparatus and method for document retrieval
JP2004342016A (en) Information retrieval program and medium having information retrieval program recorded thereon
JP2519129B2 (en) Multi-word information retrieval processing method and retrieval file creation device
JP2004192368A (en) Method and device for extracting relevant class
JPH07146872A (en) Document retrieval device
JPH08314950A (en) Retrieval method and device for text
JPH08305726A (en) Information retrieving device

Legal Events

Date Code Title Description
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20020521

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080607

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090607

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100607

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110607

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120607

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130607

Year of fee payment: 11

EXPY Cancellation because of completion of term