JPH0782497B2 - Document processor

Info

Publication number: JPH0782497B2
Application number: JP63134714A
Authority: JP
Inventors: 直樹水谷; 育雄芥子
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1988-06-01
Filing date: 1988-06-01
Publication date: 1995-09-06
Anticipated expiration: 2010-09-06
Also published as: JPH01304575A

Description

[Detailed Description of the Invention]

<Industrial Field of Application> The present invention relates to a document processing apparatus that automatically divides a document, whether created by the user or received by electronic mail, according to its format, and classifies it according to its content.

<Prior Art> With document processing apparatus such as Japanese word processors, when a relatively standardized document such as a business letter is to be created, it is often possible to reuse a previously created document as is, or to create a new one by modifying only part of an old one. This is one of the major advantages of word processors. The user therefore classifies the documents he or she has created according to a fixed classification system, attaches to each a classification index (document name) that represents its content, and stores it in memory; the desired document is later retrieved from memory and recalled using that classification index.

<Problems to be Solved by the Invention> However, document classification in the conventional word processor described above is a manual task that depends on the user's judgment. Accurate classification therefore requires the user to have full command of the list of classification categories, and the criteria for classifying items not on the list differ from user to user. As a result, it is very difficult to classify created documents by content in a uniform and efficient way. Moreover, as the number of documents in the same category grows, so does the number of similar document names, until the document name alone no longer indicates the content clearly and accurate retrieval becomes impossible. For example, documents named 「祝賀状」 (congratulatory letter) fall into at least three broad groups: (1) congratulations on opening a store or a business, (2) congratulations on the completion of a new building, and (3) congratulations on a transfer to a better post or a promotion. The texts in these groups differ greatly from one another, and even within a single group, for example between opening a store (開店) and opening a business (開業), the wording of the opening, main body, and closing of the letter differs.

Accordingly, an object of the present invention is to provide a novel document processing apparatus in which the apparatus itself automatically classifies documents from several perspectives, based on the cause (occasion) that gave rise to the document, the purpose for which it was written, its format paragraph pattern, and so on, and registers each document with appropriate classification indexes, so that a desired document can be retrieved reliably and efficiently.

<Means for Solving the Problems> To achieve the above object, the document processing apparatus of the present invention comprises: format control means that identifies the format paragraph pattern of any document input by the user and divides the document into a plurality of sentences; keyword extraction means that analyzes the words contained in those sentences and extracts keyword candidates representing the meaning of each word; topic/keyword relation table storage means that stores a topic/keyword relation table pairing each keyword with a topic name representing the cause of a sentence containing that keyword; rule table storage means that stores a rule table which, for each keyword, associates a given keyword, and other related keywords appearing in sentences containing it, with an important topic name related to the given keyword; topic classification/analysis means that selects, from the keyword candidates extracted by the keyword extraction means, the keywords listed in the topic/keyword relation table together with their topic names, judges whether each topic name is an important topic name according to whether a related keyword specified by the rule table occurs in the text preceding the word corresponding to the selected keyword, and, if it judges that it is not, changes it to an important topic name according to the rule table; document purpose/keyword and idiomatic expression relation table means that stores idiomatic expressions and keywords appearing in sentences, each paired with the purpose of the sentence; document purpose classification means that extracts the idiomatic expressions contained in a sentence using a dictionary and selects candidate purposes for which the document was written by consulting the document purpose/keyword and idiomatic expression relation table with the extracted idiomatic expressions and the keyword candidates; topic/document purpose correspondence table storage means that stores a table indicating whether each combination of a topic name and a document purpose is appropriate; and document concept control means that checks, on the basis of the topic/document purpose correspondence table, the consistency between the topic obtained by the topic classification/analysis means and the document purpose selected by the document purpose classification means, and determines the topic/document purpose combination that constitutes the document concept. The apparatus divides an input document, attaches the topic name and document purpose name to each of the resulting sentences as classification indexes, and stores them in a storage device; conversely, it retrieves sentences from the storage device on the basis of a given topic name or the like, so that a document can be created from them.

<Operation> A document input by the user is first divided into a plurality of sentences by the format control means, which identifies its format paragraph pattern. The words in these sentences are then analyzed by the keyword extraction means, which extracts keyword candidates representing their meaning. The topic/keyword relation table storage means stores a topic/keyword relation table that pairs each keyword with a topic name representing the cause of a sentence containing that keyword, and the rule table storage means stores a rule table which, for each keyword, associates the keyword, and other related keywords appearing in sentences containing it, with an important topic name related to the keyword.

The topic classification/analysis means then selects, from the keyword candidates extracted by the keyword extraction means, the keywords listed in the topic/keyword relation table together with their topic names. It judges whether each topic name is an important topic name according to whether a related keyword specified by the rule table occurs in the text preceding the word corresponding to the selected keyword and, if it judges that it is not, changes it to an important topic name according to the rule table. Meanwhile, the document purpose classification means extracts the idiomatic expressions contained in the sentence and, on the basis of the extracted idiomatic expressions and the keyword candidates found by the topic classification/analysis means, consults the document purpose/keyword and idiomatic expression relation table means, which stores idiomatic expressions and keywords appearing in sentences each paired with the purpose of the sentence, to select candidate purposes for which the document was written.

Finally, the document concept control means checks the consistency between the topic obtained by the topic classification/analysis means and the document purpose selected by the document purpose classification means against the topic/document purpose correspondence table, which indicates whether each combination of a topic name and a document purpose is appropriate, and determines the topic/document purpose combination that constitutes the document concept.

The topic name and document purpose name determined in this way are attached to each of the divided sentences as classification indexes and stored in the storage device. At retrieval time, sentences matching a given topic name or the like are read out of the storage device, so that a new document can be created easily.

<Embodiment> The present invention will now be described in detail with reference to the illustrated embodiment.

FIG. 1 is a schematic diagram of the configuration of a Japanese word processor, which is one example of the document processing apparatus of the present invention. The central processing unit 1 includes the various means and tables described below. It edits document data input from the input device 2, applying kana-kanji conversion and similar processing, and automatically classifies the document by its content; it also displays the input and processed document data on the display device 3 and stores them in the auxiliary storage device 4.

The central processing unit 1 comprises: format control means 5, which divides the input document data into an opening (前文), main body (主文), closing (末文), and so on, according to format paragraph patterns such as paragraph structure, indentation, and opening words; keyword extraction means 7, which selects a plurality of words from the divided main body while consulting the keyword dictionary 6 and extracts the superordinate concept word of each selected word, that is, a keyword candidate; and topic analysis means 11, which finds, among the extracted keyword candidates, the important keywords expressing the topic that is the cause of the document, and checks the validity of the topic by analyzing the text preceding the word corresponding to each keyword found.
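The format division performed by the format control means 5 can be pictured with a small sketch. It assumes, purely for illustration, that paragraphs are separated by blank lines and that the main body (主文) and closing (末文) are recognized by conventional opening words such as さて and まずは or the complimentary close 敬具; the patent does not list its actual patterns, and indentation handling is reduced to stripping leading spaces. The function `split_letter` and the marker tuples are hypothetical.

```python
from __future__ import annotations

# Hypothetical marker words; the patent does not list its actual patterns.
MAIN_BODY_OPENERS = ("さて",)          # typical word opening the main body
CLOSING_OPENERS = ("まずは", "敬具")    # typical closing phrase / complimentary close


def split_letter(text: str) -> dict[str, list[str]]:
    """Divide a letter into opening (前文), main body (主文) and closing (末文)."""
    # Paragraph structure: blank lines separate paragraphs. strip() also removes
    # the full-width indentation space U+3000, since it counts as whitespace.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections: dict[str, list[str]] = {"opening": [], "main": [], "closing": []}
    current = "opening"
    for para in paragraphs:
        # Opening words switch the section; other paragraphs stay in the current one.
        if current == "opening" and para.startswith(MAIN_BODY_OPENERS):
            current = "main"
        elif current != "closing" and para.startswith(CLOSING_OPENERS):
            current = "closing"
        sections[current].append(para)
    return sections
```

Only the main-body list would be passed on to the keyword analysis, since the description applies the classification to the divided main body.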

The central processing unit 1 further comprises: document purpose classification means 15, which determines candidate purposes for which the document was written on the basis of the idiomatic expressions that the idiomatic expression extraction means 13 extracts from the sentences by consulting the idiomatic expression dictionary 12, and of the extracted keyword candidates; and document concept control means 18, which makes the final selection of the appropriate topic from among those judged valid by the topic analysis means 11 by consulting the topic hierarchy relation table 16 (see FIG. 4), checks the consistency of the selected topic with the purpose candidates determined by the document purpose classification means 15 by consulting the topic/document purpose correspondence table 17 (see FIG. 5), and makes the final decision on an appropriate combination of the two.

The topic analysis means 11 finds the important keywords as follows: the topic classification means 9 first determines topics suited to the keyword candidates extracted by the keyword extraction means 7 on the basis of the topic/keyword relation table 8, and then, from among the resulting keyword/topic combinations, the keywords of particular importance listed in the rule table 10 are selected. The document purpose classification means 15 determines candidate purposes for the document by consulting the document purpose/keyword and idiomatic expression relation table 14 on the basis of the idiomatic expressions extracted by the idiomatic expression extraction means 13 and the keyword candidates extracted by the keyword extraction means 7. The central processing unit 1 then attaches each topic name, document purpose name, and format classification name finally determined by the document concept control means 18 as an index to each text paragraph of the main body data divided by the format control means 5, and registers the result by storing it in the auxiliary storage device 4. At retrieval time, the central processing unit 1 searches the auxiliary storage device 4 for documents matching the index entered from the input device 2 and displays the retrieved documents on the display device 3.

The automatic document classification performed by the Japanese word processor configured as above is described next.

When the user inputs a document from the input device 2, the central processing unit 1 has the format control means 5 divide it into an opening, main body, closing, and so on, as shown in FIG. 2, and then analyzes the divided main body as follows to determine the document concept. Assume that the divided main body is the sentence 20 shown in FIG. 3. (The vertical lines in the figure indicate lookups in the keyword dictionary or a relation table.) The keyword extraction means 7 first converts each word of sentence 20 into a keyword using the keyword dictionary 6. In this example, seven keywords are extracted, as shown in the keyword 21 column of the figure: 会社 (company), 組織 (organization), 変革 (reform), 店舗 (store), 役職 (post), 就任 (appointment), and 慶 (felicitation). Next, using the topic/keyword relation table 8, the topic classification means 9 selects 変更 (change) and 就任 (appointment) from these seven keywords as topic candidates 22, and extracts the keyword 変革 for the topic candidate 変更 and the keywords 役職 and 就任 for the topic candidate 就任.
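The keyword extraction and topic-candidate steps for the FIG. 3 example can be sketched as dictionary lookups. The surface words in `KEYWORD_DICTIONARY` and the example sentence are hypothetical stand-ins (sentence 20 itself is not reproduced in the text); only the seven keywords, the topic candidates 変更 and 就任, and their keyword assignments come from the description. Plain substring search replaces the morphological analysis a real word processor would use.

```python
from __future__ import annotations

# Toy stand-ins for keyword dictionary 6 and topic/keyword relation table 8.
KEYWORD_DICTIONARY = {  # surface word -> keyword (superordinate concept word)
    "貴社": "会社", "機構": "組織", "改革": "変革", "支店": "店舗",
    "部長": "役職", "就任": "就任", "慶賀": "慶",
}
TOPIC_KEYWORD_TABLE = {  # keyword -> topic name (cause of the sentence)
    "変革": "変更", "役職": "就任", "就任": "就任",
}


def extract_keywords(sentence: str) -> list[tuple[int, str, str]]:
    """Return (position, word, keyword) for every dictionary word, in text order."""
    hits = []
    for word, keyword in KEYWORD_DICTIONARY.items():
        start = sentence.find(word)
        while start != -1:
            hits.append((start, word, keyword))
            start = sentence.find(word, start + 1)
    return sorted(hits)


def topic_candidates(hits: list[tuple[int, str, str]]) -> dict[str, list[str]]:
    """Group the extracted keywords under the topic names found in the table."""
    topics: dict[str, list[str]] = {}
    for _, _, keyword in hits:
        topic = TOPIC_KEYWORD_TABLE.get(keyword)
        if topic is None:
            continue  # keyword has no associated topic
        keywords = topics.setdefault(topic, [])
        if keyword not in keywords:
            keywords.append(keyword)
    return topics
```

On the hypothetical sentence 「貴社におかれましては機構改革に伴い支店長から部長にご就任なされたとのこと、慶賀に存じます。」, `extract_keywords` yields the seven keywords in text order, and `topic_candidates` returns `{"変更": ["変革"], "就任": ["役職", "就任"]}`, matching the assignments described above. The positions are kept because the rule check that follows looks at what precedes each word.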

Next, for those topic-candidate/keyword combinations whose keyword is listed in the rule table 10, the topic analysis means 11 analyzes the text preceding the word corresponding to that keyword and checks the validity of the topic candidate. For example, the rule table 10 registers a rule for the keyword 変革 (reform) whose meaning is as follows.

This rule means that if a word converted to the keyword 組織 (organization) occurs some characters before the word converted to the keyword 変革 (reform) in the sentence, the topic 変更 (change) is to be interpreted as the topic 人事異動 (personnel change), which differs in kind from a mere change of address or telephone number. Likewise, for the topic 就任 (appointment), a rule is registered requiring that a word converted to the keyword 役職 (post), here 部長 (department manager), occur before the word converted to the keyword 就任. Topic candidates are screened strictly on the basis of such rules.
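The rule check of the topic analysis means 11 might look like the following sketch. It encodes the two rules just described: 組織 before 変革 turns the topic 変更 into 人事異動, and 就任 is accepted only when 役職 precedes it. The 10-character look-back window is an assumption standing in for the unspecified "some characters", and the `Rule` record, `RULES` list, and `check_topic` function are hypothetical. The input is a list of `(position, word, keyword)` hits in text order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple

Hit = Tuple[int, str, str]  # (position in sentence, surface word, keyword)


@dataclass(frozen=True)
class Rule:
    keyword: str             # keyword whose topic candidate is checked
    related_keyword: str     # keyword that must occur before it
    important_topic: str     # topic name to use when the related keyword is found
    window: int = 10         # hypothetical look-back distance in characters
    required: bool = False   # reject the topic when the related keyword is missing


# Toy contents of rule table 10, following the two rules described in the text.
RULES = [
    Rule("変革", "組織", "人事異動"),
    Rule("就任", "役職", "就任", required=True),
]


def check_topic(hits: list[Hit], topic: str, keyword: str) -> Optional[str]:
    """Return the validated topic for (topic, keyword), or None if it is rejected."""
    rule = next((r for r in RULES if r.keyword == keyword), None)
    if rule is None:
        return topic  # no rule registered for this keyword: keep the candidate
    for pos, _, kw in hits:
        if kw != keyword:
            continue
        for other_pos, other_word, other_kw in hits:
            gap = pos - (other_pos + len(other_word))  # characters between the two words
            if other_kw == rule.related_keyword and other_pos < pos and 0 <= gap <= rule.window:
                return rule.important_topic
    return None if rule.required else topic
```

With hits for 機構 (組織) at position 10 and 改革 (変革) at position 12, `check_topic(hits, "変更", "変革")` returns `"人事異動"`; for a sentence about a change of address with no 組織 keyword, the candidate 変更 is kept unchanged.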

Next, the idiomatic expression extraction means 13 uses the idiomatic expression dictionary 12 to extract the idiomatic expressions in the sentence that indicate its purpose. In the example of FIG. 3, the underlined phrases 「承りますところ」 ("I understand that") and 「なられたとのこと」 ("I hear that you have become") in sentence 20 are extracted as idiomatic expressions characteristic of letters. Using the document purpose/keyword and idiomatic expression relation table 14, the document purpose classification means 15 finds that both idiomatic expressions occur in documents whose purpose is 祝賀状 (congratulatory letter) or 見舞状 (letter of sympathy), and that the keyword 慶 indicates the purpose 祝賀状; it therefore determines 祝賀状 as the first candidate document purpose, as shown in the document purpose candidate 23 column of FIG. 3.
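The purpose classification can be illustrated with a simple vote count over the relation table 14. Counting votes is an assumption made here for illustration: the description only says that both idioms point to 祝賀状 or 見舞状 and that 慶 points to 祝賀状. The table contents, the example sentence, and the function names are hypothetical.

```python
from collections import Counter

# Toy stand-ins for idiomatic expression dictionary 12 and relation table 14.
IDIOM_PURPOSES = {  # idiomatic expression -> document purposes it suggests
    "承りますところ": ("祝賀状", "見舞状"),
    "なられたとのこと": ("祝賀状", "見舞状"),
}
KEYWORD_PURPOSES = {  # keyword -> document purposes it suggests
    "慶": ("祝賀状",),
}


def extract_idioms(sentence):
    """Return the dictionary idioms that occur in the sentence (plain substring match)."""
    return [idiom for idiom in IDIOM_PURPOSES if idiom in sentence]


def rank_purposes(idioms, keywords):
    """Rank candidate document purposes by how many idioms and keywords point to them."""
    votes = Counter()
    for idiom in idioms:
        votes.update(IDIOM_PURPOSES.get(idiom, ()))
    for keyword in keywords:
        votes.update(KEYWORD_PURPOSES.get(keyword, ()))
    return [purpose for purpose, _ in votes.most_common()]
```

For the FIG. 3 example the two idioms give 祝賀状 and 見舞状 two votes each, and 慶 adds a third vote for 祝賀状, so `rank_purposes` places 祝賀状 first, in line with the first candidate described above.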

The document concept control means 18 then consults the topic hierarchy relation table 16 shown in FIG. 4 and takes the more specific of the strictly selected topic candidates as the final topic candidate. In the example sentence, the topic 就任 (appointment) is more specific (a subordinate concept) than the topic 人事異動 (personnel change), so the topic is determined to be 就任. Next, the validity of the combination of topic candidate and document purpose candidate is checked against the topic/document purpose correspondence table 17 shown in FIG. 5. In the example, the combination of the topic 就任 and the document purpose 祝賀状 is judged valid; a combination marked × in FIG. 5 is judged invalid, and the topic and document purpose candidates are then reexamined.
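The final decision of the document concept control means 18 can be sketched as ordering the topic candidates from most to least specific using the hierarchy table 16, then taking the first topic/purpose pair not marked × in the correspondence table 17. FIG. 4 and FIG. 5 are not reproduced here, so `TOPIC_PARENT` holds only the 就任/人事異動 relation stated in the text and the × entry in `INVALID_PAIRS` is invented for illustration. Falling back to the next pair on a × is one possible reading of "reexamined", not the patent's stated procedure.

```python
from __future__ import annotations

from typing import Optional, Tuple

# Toy stand-ins: FIG. 4 hierarchy (child -> parent) and FIG. 5 cells marked ×.
TOPIC_PARENT = {"就任": "人事異動"}
INVALID_PAIRS = {("就任", "見舞状")}  # hypothetical × entry


def depth(topic: str) -> int:
    """Number of ancestors above a topic; a larger value means more specific."""
    d = 0
    while topic in TOPIC_PARENT:
        topic = TOPIC_PARENT[topic]
        d += 1
    return d


def decide_concept(topics: list[str], purposes: list[str]) -> Optional[Tuple[str, str]]:
    """Pick the most specific topic whose pairing with a ranked purpose is allowed."""
    for topic in sorted(topics, key=depth, reverse=True):
        for purpose in purposes:  # purposes are assumed to be in rank order
            if (topic, purpose) not in INVALID_PAIRS:
                return topic, purpose
    return None  # no consistent combination; candidates would need reexamination
```

For the example, `decide_concept(["人事異動", "就任"], ["祝賀状", "見舞状"])` returns `("就任", "祝賀状")`, the same result as the description.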

When the topic name and document purpose name have thus been finally determined, for example as 就任 and 祝賀状, the central processing unit 1 attaches them as classification indexes to each paragraph of the main body text data divided by the format control means 5 and registers the data by storing it in the auxiliary storage device 4. To create a new document from documents registered in this way, the user enters the desired topic name or document purpose name from the input device 2 as a search condition. The central processing unit 1 then reads the document data having classification indexes matching the search condition from the auxiliary storage device 4 and displays it on the display device 3. The user can therefore create a new document easily and efficiently by editing the displayed document or using it as a reference.
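Registration and retrieval amount to attaching index fields to each main-body paragraph and filtering on them. The in-memory `DocumentStore` below is a hypothetical stand-in for the auxiliary storage device 4; it keeps the topic name, document purpose name, and format classification name per paragraph and matches search conditions by exact equality, which is an assumption since the description does not say how conditions are matched.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexedParagraph:
    text: str
    topic: str             # topic name (classification index)
    purpose: str           # document purpose name (classification index)
    section: str = "main"  # format classification name from the format division


@dataclass
class DocumentStore:
    """Hypothetical in-memory stand-in for auxiliary storage device 4."""
    entries: list[IndexedParagraph] = field(default_factory=list)

    def register(self, paragraphs: list[str], topic: str, purpose: str, section: str = "main") -> None:
        # Attach the same indexes to every paragraph of the divided main body.
        for text in paragraphs:
            self.entries.append(IndexedParagraph(text, topic, purpose, section))

    def search(self, topic: Optional[str] = None, purpose: Optional[str] = None) -> list[str]:
        # A condition left as None matches any value.
        return [
            e.text for e in self.entries
            if (topic is None or e.topic == topic) and (purpose is None or e.purpose == purpose)
        ]
```

Searching with only `purpose="祝賀状"` returns every registered congratulatory paragraph regardless of topic, while adding `topic="就任"` narrows the result to appointment letters.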

In the embodiment described above, the processing means 7, 11, 15, and 18 of the central processing unit 1 are supported by the dictionaries 6 and 12, the tables 8, 10, 14, 16, and 17, and the auxiliary processing means 9 and 13, which make their processing efficient, so automatic document classification can be made even faster.

<Effects of the Invention> As is clear from the above description, the document processing apparatus of the present invention comprises: format control means that divides a document input by the user into a plurality of sentences by paragraph; keyword extraction means that extracts keyword candidates from the words in those sentences; topic classification/analysis means that selects, from the keyword candidates, the keywords listed in a topic/keyword relation table (a table storing each keyword paired with a topic name representing the cause of a sentence containing it) together with their topic names, judges whether each topic name is an important topic name according to whether a related keyword occurs in the text preceding the word corresponding to the selected keyword as specified by a rule table (a table that, for each keyword, associates the keyword and other related keywords appearing in sentences containing it with an important topic name related to the keyword), and, if it judges that it is not, changes it to an important topic name according to the rule table; document purpose classification means that selects candidate purposes for which the document was written by consulting a document purpose/keyword and idiomatic expression relation table with idiomatic expressions extracted from the sentences using a dictionary and with the keyword candidates; and document concept control means that checks the consistency between the topic obtained by the topic classification/analysis means and the document purpose selected by the document purpose classification means against a topic/document purpose correspondence table and determines the topic/document purpose combination that constitutes the document concept. The determined topic and document purpose names are attached to the divided sentences as classification indexes and stored, and sentences matching a given classification index can be retrieved to create a document. Created documents can therefore be classified automatically, quickly, and from several perspectives, and registered; a desired document can then be retrieved accurately and efficiently, and a new document can be created easily and efficiently by using it as a reference. The invention thus contributes greatly to a dramatic improvement in the efficiency of document management.

[Brief description of drawings]

FIG. 1 is a schematic diagram of the configuration of a Japanese word processor according to one embodiment of the present invention; FIG. 2 shows an example of format division in that embodiment; FIG. 3 is a conceptual diagram showing a concrete example of the classification processing in that embodiment; FIG. 4 shows an example of the hierarchical relation among topics; and FIG. 5 shows an example of which combinations of topics and document purposes are permitted.

1 ... central processing unit, 2 ... input device, 4 ... auxiliary storage device, 5 ... format control means, 7 ... keyword extraction means, 11 ... topic analysis means, 15 ... document purpose classification means, 18 ... document concept control means.

Claims

[Claims]

1. A document processing apparatus comprising: format control means for identifying the format paragraph pattern of any document input by a user and dividing the document into a plurality of sentences; keyword extraction means for analyzing the words contained in the sentences and extracting keyword candidates representing the meaning of the words; topic/keyword relation table storage means for storing a topic/keyword relation table that pairs each keyword with a topic name representing the cause of a sentence containing the keyword; rule table storage means for storing a rule table that, for each keyword, associates a given keyword and other related keywords appearing in a sentence containing the given keyword with an important topic name related to the given keyword; topic classification/analysis means for selecting, from the keyword candidates extracted by the keyword extraction means, the keywords in the topic/keyword relation table and their topic names, judging whether each topic name is an important topic name according to whether a related keyword based on the rule table is present in the text preceding the word corresponding to the selected keyword, and, when judging that it is not, changing it to an important topic name according to the rule table; document purpose/keyword and idiomatic expression relation table means for storing idiomatic expressions and keywords appearing in sentences, each paired with the purpose of the sentence; document purpose classification means for extracting idiomatic expressions contained in the sentences using a dictionary and selecting candidate purposes for which the document was created by referring to the document purpose/keyword and idiomatic expression relation table with the extracted idiomatic expressions and the keyword candidates; topic/document purpose correspondence table storage means indicating whether each combination of a topic name and a document purpose is appropriate; and document concept control means for checking, on the basis of the topic/document purpose correspondence table, the consistency between the topic obtained by the topic classification/analysis means and the document purpose selected by the document purpose classification means, and determining the topic/document purpose combination forming the document concept; wherein the apparatus divides an input document, attaches a topic name and a document purpose name to each of the divided sentences as classification indexes, stores them in a storage device, and retrieves sentences from the storage device on the basis of a given topic name or the like so that a document can be created.