JP2019160134A - Sentence processing device and sentence processing method

Info

Publication number: JP2019160134A
Application number: JP2018049146A
Authority: JP
Inventors: 整坂入; Hitoshi Sakairi; 学菅澤; Manabu Sugasawa; 尚宏鈴木; Naohiro Suzuki; 一樹山根; Kazuki Yamane; 昌幸親松; Masayuki Chikamatsu; 敬之若山; Noriyuki Wakayama; 尚平大野; Shohei Ono
Original assignee: Hitachi Ltd; East Japan Railway Co
Current assignee: Hitachi Ltd; East Japan Railway Co
Priority date: 2018-03-16
Filing date: 2018-03-16
Publication date: 2019-09-19

Abstract

PROBLEM TO BE SOLVED: To provide a sentence processing device and a sentence processing method capable of classifying a vast number of sentences by keywords that reflect news and trends, summarizing each sentence in accordance with its sentence structure, and appropriately obtaining information from the summarized sentences from a specific viewpoint.

SOLUTION: The sentence processing device receives, together with a sentence, a designated classification tag consisting of a classification tag name and keywords. It calculates a degree of fit indicating how valid the designated classification tag would be as a classification tag of the sentence and, when the degree of fit is equal to or greater than a prescribed threshold, assigns the designated classification tag to the sentence (Phase I). When summarizing the sentence, the device analyzes the structure and meaning of the sentence, adds the classification information from Phase I to that information, and creates a summary sentence (Phase II).

SELECTED DRAWING: Figure 3

Description

The present invention relates to a sentence processing apparatus and a sentence processing method, and more particularly to sentence processing suitable for promptly classifying a vast number of sentences according to purpose and efficiently obtaining their summaries.

With the explosive spread of the Internet, how to use Web systems to deal with customers is becoming the most important issue for companies. Under these circumstances, the important points in corporate activity are not only disseminating information through Web systems but also, in conjunction with call centers and SNS (Social Networking Service), collecting information from customer requests and from posts about products and services, identifying points for improvement, and responding to such requests immediately.

For a company to make effective use of documents collected from the Internet, a technique is required by which an information processing apparatus classifies and summarizes the documents by topic. For example, Patent Document 1 discloses a method of selecting, as candidates, question categories based on extracted person attributes and topics, and classifying a user utterance made in response as an answer to a question belonging to that question category. Patent Document 2 discloses a technique for outputting a plurality of summary sentences from input text, evaluating them on a plurality of evaluation measures, and extracting the most appropriate summary sentence.

Patent Document 1: JP 2016-206894 A
Patent Document 2: JP 2016-161967 A

For example, at a large company with diversified operations, such as a railway operator, documents collected from the Internet in this way number in the hundreds of thousands per year. Such document information should be processed appropriately and in a timely manner and put to use in corporate activities.

In particular, it is sometimes necessary to extract sentences related to keywords that are trending on the Internet, process them in a timely manner, and obtain opinions and information for a response. For example, a railway operator needs to acquire information from sentences containing keywords related to an accident on a specific line and take appropriate action.

The technique described in Patent Document 1 does not consider classification by such designated keywords, and therefore cannot classify sentences in a way that keeps up with news and trends. Patent Document 2 does not disclose a technique for creating a summary from a specific viewpoint, such as user requests. For example, it does not consider extracting summary sentences from the viewpoint of user requests, identifying points for improvement from those requests, and addressing them to improve customer satisfaction.

An object of the present invention is to provide a sentence processing apparatus and a sentence processing method that can classify a vast number of target sentences by keywords that reflect news and trends, summarize the sentences in accordance with their sentence structure, and appropriately acquire information from the summarized sentences from a specific viewpoint.

The sentence processing apparatus of the present invention is preferably a sentence processing apparatus that classifies sentences by assigning classification tags to them. It receives a sentence and a designated classification tag, calculates for that pair a degree of fit, which is an index of how valid the designated classification tag would be as a classification tag of the sentence, and assigns the designated classification tag to the sentence when the degree of fit is equal to or greater than a prescribed threshold.

In another example, the sentence processing apparatus of the present invention is preferably a sentence processing apparatus that summarizes a sentence and outputs its summary sentence. It holds summary result data, which are pairs of a sentence with a classification tag and a summary sentence; performs structural analysis on each simple sentence of the input sentence; extracts the meaning associated with the analyzed structure; extracts, for each classification tag of the summary result data, the simple sentences whose meaning is associated with the structural analysis; and, when the similarity between a simple sentence of the input sentence and a sentence of the summary result data is equal to or greater than a prescribed threshold, uses the summary sentence of the summary result data as the summary sentence of the input sentence.

The present invention is also configured as a sentence processing method performed by the above sentence processing apparatus.
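The threshold-based reuse of a stored summary described above can be illustrated with a minimal Python sketch. The patent text does not specify how similarity is computed, so the word-level Jaccard measure, the `SummaryRecord` type, the function names, and the threshold value of 0.5 are all assumptions made purely for illustration, and the structural-analysis step is reduced here to a simple filter on the classification tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummaryRecord:
    """One hypothetical row of summary result data: tagged sentence plus summary."""
    classification_tag: str
    sentence: str
    summary: str


def jaccard(a: str, b: str) -> float:
    """Assumed similarity measure: word-set overlap (not taken from the patent)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def summarize(simple_sentence: str, tag: str, records: list,
              threshold: float = 0.5) -> Optional[str]:
    """Return the stored summary of the most similar record with the same tag,
    provided the similarity is at or above the threshold; otherwise None."""
    best_score, best_summary = 0.0, None
    for record in records:
        if record.classification_tag != tag:
            continue
        score = jaccard(simple_sentence, record.sentence)
        if score >= threshold and score > best_score:
            best_score, best_summary = score, record.summary
    return best_summary


records = [
    SummaryRecord("request",
                  "please broadcast the delay information quickly",
                  "Request: prompt delay announcements"),
]
print(summarize("please broadcast the delay information", "request", records))
print(summarize("the seats were clean", "request", records))
```

In this sketch the first call reuses the stored summary because five of the six distinct words overlap, while the second call returns `None` because the similarity falls below the assumed threshold.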

According to the present invention, a vast number of target sentences can be classified by keywords that reflect news and trends, the sentences can be summarized in accordance with their sentence structure, and information can be appropriately acquired from the summarized sentences from a specific viewpoint.

FIG. 1 is a diagram showing the functional configuration of the sentence processing apparatus.
FIG. 2 is a diagram showing the hardware and software configuration of the sentence processing apparatus.
FIG. 3 is a diagram showing an outline of the features of the sentence processing of the present invention.
FIG. 4 is a diagram showing an example of classification result data.
FIG. 5 is a diagram showing an example of summary result data.
FIG. 6 is a diagram showing an example of classified/summarized data.
FIG. 7 is a diagram showing an example of sentence analysis result data.
FIG. 8 is a diagram showing an example of the report creation start screen.
FIG. 9 is a diagram showing an example of the sentence classification/summary result screen.
FIG. 10 is a diagram showing an example of the sentence classification/summary detail list screen.
FIG. 11 is a diagram showing an example of the classification change screen.
FIG. 12 is a diagram showing an example of the summary change screen.
FIG. 13 is a diagram showing an example of the report display screen.
FIG. 14 is a flowchart showing the sentence information accumulation process.
FIG. 15 is a flowchart showing the sentence analysis process.
FIG. 16 is a flowchart showing the sentence classification process.
FIG. 17 is a flowchart showing the sentence summarization process.
FIG. 18 is a flowchart showing the classification learning process.
FIG. 19 is a flowchart showing the summary learning process.
FIG. 20 is a flowchart showing the report creation process.
FIG. 21 is a flowchart showing the semi-automatic classification process.

An embodiment of the present invention will be described below with reference to FIGS. 1 to 21.
First, the configuration of the sentence processing apparatus according to the embodiment will be described with reference to FIGS. 1 and 2.

As shown by its functional configuration in FIG. 1, the sentence processing apparatus 10 according to this embodiment comprises a sentence analysis unit 110, a sentence classification unit 120, a classification change unit 130, a sentence summary unit 140, a summary change unit 150, and a report creation unit 160.

The sentence analysis unit 110 analyzes the structure of a sentence received from outside and outputs sentence analysis result data (described in detail later). The sentence classification unit 120 classifies (categorizes) a sentence on the basis of the words appearing in it and assigns a classification tag. The classification change unit 130 changes a classification made by the sentence classification unit 120 and reassigns the classification tag. The sentence summary unit 140 summarizes a sentence and creates its summary sentence. The summary change unit 150 changes a summary sentence created by the sentence summary unit 140. The report creation unit 160 outputs results to the user on the basis of the classification information and the summary information for the sentences.

The sentence analysis unit 110 consists of a word analysis unit 111, a structure analysis unit 112, and a sentence analysis result processing unit 113. The word analysis unit 111 analyzes a sentence at the word level and divides it into words. The structure analysis unit 112 analyzes a sentence at the structure level and extracts its meaning. The sentence analysis result processing unit 113 processes the results of the word analysis unit 111 and the structure analysis unit 112.

The sentence classification unit 120 consists of a classification processing unit 121, a semi-automatic classification unit 122, and a classification learning unit 123. The classification processing unit 121 performs the calculations and decisions involved in classifying sentences. The semi-automatic classification unit 122 performs semi-automatic classification, in which keywords for classification are input and sentences are classified accordingly. The classification learning unit 123 stores information that associates sentences with classification tags as learning data.

The sentence summary unit 140 consists of a summary processing unit 141 and a summary learning unit 142. The summary processing unit 141 creates a summary from a sentence. The summary learning unit 142 stores information that associates sentences with summary sentences.

As for hardware, the sentence processing apparatus is realized by a general-purpose information processing apparatus such as the personal computer shown in FIG. 2.

In the sentence processing apparatus 10, a CPU (Central Processing Unit) 202, a main storage device 204, a network I/F 206, a display I/F 208, an input/output I/F 210, and an auxiliary storage I/F 212 are connected by a bus. The sentence processing apparatus 10 is also connected to a sentence server 400 via a network 60. The sentence server 400 is a server that stores sentences and distributes them to the sentence processing apparatus 10.

The CPU 202 controls each part of the sentence processing apparatus 10, and loads the necessary programs into the main storage device 204 and executes them. The main storage device 204 usually consists of volatile memory such as RAM and stores the programs executed by the CPU 202 and the data they refer to. The network I/F 206 is an interface for connecting to the network 60. The display I/F 208 is an interface for connecting a display device 220 such as an LCD (Liquid Crystal Display). The input/output I/F 210 is an interface for connecting input/output devices. In the example of FIG. 2, a keyboard 230 and a mouse 232 serving as a pointing device are connected as input/output devices.

The auxiliary storage I/F 212 is an interface for connecting an auxiliary storage device such as an HDD (Hard Disk Drive) 250 or an SSD (Solid State Drive).

The HDD 250 has a large storage capacity and stores the programs for carrying out this embodiment. A sentence analysis program 310, a sentence classification program 320, a classification change program 330, a sentence summary program 340, a summary change program 350, and a report creation program 360 are installed in the sentence processing apparatus 10.

The sentence analysis program 310, the sentence classification program 320, the classification change program 330, the sentence summary program 340, the summary change program 350, and the report creation program 360 are programs that realize the functions of the sentence analysis unit 110, the sentence classification unit 120, the classification change unit 130, the sentence summary unit 140, the summary change unit 150, and the report creation unit 160, respectively.

The HDD 250 also stores classification result data 510, summary result data 520, classified/summarized data 530, and sentence analysis result data 540. The purpose and structure of each of these data sets are described in detail later.

Next, an outline of the features of the sentence processing of the present invention will be described with reference to FIG. 3.
Conventionally, a sentence has been classified by calculating, on the basis of past classification result data, the likelihood or confidence that the sentence fits a classification.

In the present invention, the user inputs a designated classification together with the sentences, and semi-automatic classification is performed (Phase I in FIG. 3). Here, a designated classification means a classification tag name and keywords that the user newly inputs for classification. Having the information processing apparatus classify sentences on the basis of keywords input in this way is called semi-automatic classification. This allows the classification results the user wants to be extracted semi-automatically.
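Phase I can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the degree of fit is calculated, so the keyword-hit ratio used in `fit_degree`, the `DesignatedClassification` type, and the default threshold of 0.5 are hypothetical. The tag name and keywords are the ones used in the FIG. 8 example.

```python
from dataclasses import dataclass


@dataclass
class DesignatedClassification:
    """User-supplied designated classification: tag name plus keywords."""
    tag_name: str
    keywords: list


def fit_degree(sentence: str, designated: DesignatedClassification) -> float:
    """Assumed fit measure: fraction of the keywords that occur in the sentence."""
    if not designated.keywords:
        return 0.0
    hits = sum(1 for keyword in designated.keywords if keyword in sentence)
    return hits / len(designated.keywords)


def semi_automatic_classify(sentences, designated, threshold=0.5):
    """Return (sentence, tags) pairs; the designated tag is assigned only
    when the degree of fit is at or above the threshold."""
    results = []
    for sentence in sentences:
        tagged = fit_degree(sentence, designated) >= threshold
        results.append((sentence, [designated.tag_name] if tagged else []))
    return results


designated = DesignatedClassification("device failure", ["signal", "failure"])
sentences = [
    "The train stopped because of a signal failure.",
    "The station staff were very kind.",
]
for sentence, tags in semi_automatic_classify(sentences, designated):
    print(tags, sentence)
```

A production system would presumably combine such keyword evidence with the learned classification result data; the sketch isolates only the threshold decision.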

Also, conventionally, a summary has been generated from a sentence on the basis of parameters, and the parameters have been updated on the basis of an evaluation of the result. Here, a parameter is, for example, a quantity that expresses a feature of a sentence or a set of sentences as a weight, as described in Patent Document 2.

In the present invention, when a sentence is summarized, its structure and meaning are analyzed, the classification information from Phase I in FIG. 3 is added to that information, and a summary sentence is created (Phase II in FIG. 3). This makes it possible to create a summary sentence suited to the classification.

Next, the data structures used in the sentence processing apparatus of this embodiment will be described with reference to FIGS. 4 to 7.
The classification result data 510 serves as the reference for classifying sentences and, as shown in FIG. 4, has a data format consisting of the columns classification tag 511 and sentence 512. Each row indicates that text matching the pattern stored in sentence 512 is classified under the tag shown in classification tag 511. The classification result data 510 is generated through learning in the classification learning process (described later).

The summary result data 520 serves as the reference for creating summary sentences and, as shown in FIG. 5, has a data format consisting of the columns classification tag 521, sentence 522, sentence structure 523, and summary sentence 524. Each row indicates that, for text matching the pattern in sentence 522, which is classified by classification tag 521 and has the sentence structure given in sentence structure 523, the summary sentence given in summary sentence 524 is generated. The summary result data 520 is generated through learning in the summary learning process (described later).

The classified/summarized data 530 is the data saved as the result of the sentence processing apparatus of this embodiment classifying sentences and creating their summaries. As shown in FIG. 6, it has a data format consisting of the columns registration date 531, sentence 532, classification tag 1 (533), classification tag 2 (534), and summary sentence 535.

Each row indicates that, on the date given in registration date 531, the sentence stored in sentence 532 was assigned the tags given in classification tag 1 (533) and classification tag 2 (534) as its classification tags, and the summary sentence given in summary sentence 535 was generated. Although two classification tags are used in the description of this embodiment, there may be only one, or three or more.
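The three data sets described so far can be modeled as simple record types. The following Python sketch is one possible in-memory representation; the class and field names are assumptions chosen to mirror the column names of FIGS. 4 to 6, and the sample values are invented for illustration. The variable-length tag list reflects the remark that the number of classification tags need not be two.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ClassificationResult:
    """One row of classification result data 510 (FIG. 4)."""
    classification_tag: str   # column 511
    sentence: str             # column 512


@dataclass
class SummaryResult:
    """One row of summary result data 520 (FIG. 5)."""
    classification_tag: str   # column 521
    sentence: str             # column 522
    sentence_structure: str   # column 523
    summary: str              # column 524


@dataclass
class ClassifiedSummarized:
    """One row of classified/summarized data 530 (FIG. 6)."""
    registration_date: date                                   # column 531
    sentence: str                                             # column 532
    classification_tags: list = field(default_factory=list)   # columns 533, 534, ...
    summary: str = ""                                         # column 535


# Hypothetical sample row; the values are illustrative only.
row = ClassifiedSummarized(
    registration_date=date(2018, 3, 16),
    sentence="The train stopped because of a signal failure.",
    classification_tags=["device failure", "delay"],
    summary="Train stopped due to signal failure",
)
print(row)
```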

The sentence analysis result data 540 holds the analysis results for the simple sentences contained in the text to be processed. As shown in FIG. 7, it has a data format consisting of the columns simple sentence 541, word 542, part of speech 543, sentence structure 544, and meaning 545.

Each row indicates that the sentence in simple sentence 541 is broken down into the words listed in word 542, with the parts of speech listed in part of speech 543, that it has the sentence structure listed in sentence structure 544, and that, from its words and sentence structure, it has the meaning listed in meaning 545. In the example of FIG. 7, the sentence structure "I want you to ..." (〜してほしい) indicates that the simple sentence "I want you to broadcast promptly" has the meaning of a (customer) request.
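The mapping from sentence structure to meaning in the FIG. 7 example can be sketched as a small rule table. This Python sketch is an assumption-laden illustration, not the analysis method of the embodiment: the rule table `STRUCTURE_RULES`, the English pattern standing in for 〜してほしい, and the naive word split are all hypothetical, and part-of-speech tagging is left empty because it would require a morphological analyzer.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule table: (pattern, sentence-structure label, meaning).
STRUCTURE_RULES = [
    (re.compile(r"\bI want you to\b", re.IGNORECASE), "I want you to ...", "request"),
]


@dataclass
class SentenceAnalysisRow:
    """One row of sentence analysis result data 540 (FIG. 7)."""
    simple_sentence: str                                   # column 541
    words: list                                            # column 542
    parts_of_speech: list = field(default_factory=list)    # column 543 (not filled here)
    sentence_structure: str = ""                           # column 544
    meaning: str = ""                                      # column 545


def analyze_simple_sentence(text: str) -> SentenceAnalysisRow:
    """Split the sentence into words and look up its structure and meaning."""
    # Naive word split; a real system would use a morphological analyzer,
    # which would also supply the parts of speech.
    row = SentenceAnalysisRow(simple_sentence=text, words=re.findall(r"\w+", text))
    for pattern, structure, meaning in STRUCTURE_RULES:
        if pattern.search(text):
            row.sentence_structure, row.meaning = structure, meaning
            break
    return row


row = analyze_simple_sentence("I want you to broadcast promptly")
print(row.sentence_structure, "->", row.meaning)
```

Sentences that match no rule keep an empty structure and meaning, which leaves room for further rules or a learned model.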

Next, the user interface of the sentence processing apparatus of this embodiment will be described with reference to FIGS. 8 to 13.
The report creation start screen 700 is the screen displayed first when the sentence processing apparatus 10 generates and displays a report of sentence classification and summarization. On it, the user inputs a classification tag name (hereinafter sometimes simply called a "classification tag") and classification keywords for classifying the sentences of a designated period. As shown in FIG. 8, the report creation start screen 700 has the following display elements: a sentence target range field 701, a designated classification input field 702, and a report creation start button 705.

The sentence target range field 701 is a field for inputting the period of the sentences to be handled. The registration date and time of a sentence can be, for example, the time the sentence was created or the time it was stored in the sentence server 400. The designated classification input field 702 is a field for inputting the information for a designated classification and consists of a classification tag name input field 703 and a classification keyword input field 704. The classification tag name input field 703 is a field for inputting the classification tag name used for classification. The classification keyword input field 704 is a field for inputting the keywords used for classification. In the example shown in FIG. 8, "device failure" is input as the classification tag name, and "signal" and "failure" are input as the classification keywords. The report creation start button 705 is the button selected to create and display a report after the designated classification has been input.

The sentence classification/summary result screen 710 displays the results of the sentence classification and summarization processing. As shown in FIG. 9, its display elements are repeated sets of a classification tag display field 711, a count display field 712, and a detail list display button 713.

The classification tag display field 711 displays the tag name input for the designated classification. The count display field 712 displays the number of sentences belonging to that designated classification. The detail list display button 713 is the button selected to display the sentence classification/summary detail list screen 720 (described later) for the sentences belonging to that designated classification.

The sentence classification/summary detail list screen 720 displays detailed information on the sentences that have been classified and summarized, and is used to request changes. As shown in FIG. 10, it consists of the following display elements: an original text display field 721, a summary sentence display field 722, a classification change button 723, and a summary change button 724.

The original text display field 721 displays the original sentence. The summary sentence display field 722 displays the summary sentence of the original sentence. The classification change button 723 and the summary change button 724 are the buttons selected to display the classification change screen 730 (described later) and the summary change screen 740 (described later), respectively.

The classification change screen 730 is a screen for changing the designated classification of a sentence. As shown in FIG. 11, it has the following display elements: an original text display field 731, a summary sentence display field 732, classification designation combo boxes 733, and a change reflection button 734.

The original text display field 731 displays the original text of the sentence. The summary sentence display field 732 displays the summary sentence of the sentence. The classification designation combo box 733 is an input element for replacing the classification tag of the sentence. The change reflection button 734 is the button selected to assign the selected classification tag to the designated sentence.

In this embodiment, up to three kinds of classification tags can be designated, but the limit may instead be lower, at one or two kinds, or higher, at four or more kinds.

The summary change screen 740 is a screen for editing and changing the summary sentence of a sentence. As shown in FIG. 12, it has the following display elements: an original text display field 741, a classification display field 742, a summary sentence editing area 743, and a change reflection button 744.

The original text display field 741 displays the original text of the sentence. The classification display field 742 displays the classification tags of the sentence. The summary sentence editing area 743 displays the current content of the summary sentence, which the user can change using an input device such as a keyboard. The change reflection button 744 is the display element selected to apply the summary sentence edited in the summary sentence editing area 743.

The report display screen 750 displays the report created after a designated classification has been input. As shown in FIG. 13, it has the following display elements: a sentence target range display field 751, a designated classification display field 752, a summary sentence display field 753, a print button 754, and a close button 755.

The sentence target range display field 751 displays the period of the target sentences that was input on the report creation start screen 700. The designated classification display field 752 displays the classification tag name and keywords of the designated classification that were input on the report creation start screen 700. The summary sentence display field 753 displays the retrieved summary sentences. The print button 754 is the button selected to print the information displayed on the report display screen 750. The close button 755 is the button selected to close the report display screen 750.

Next, the processing performed by the sentence processing apparatus will be described with reference to FIGS. 14 to 21.
The sentence information accumulation process shown in the flowchart of FIG. 14 accumulates sentences and the results of their classification and summarization in the sentence processing apparatus.

In the sentence information accumulation process, a sentence analysis process is performed first (S101). This process is described in detail later.
Next, a sentence classification process is performed (S102). This process is described in detail later.
Next, a sentence summarization process is performed (S103). This process is described in detail later.
Finally, the classification and summarization results of S102 and S103 are accumulated in the classified/summarized data 530 shown in FIG. 6 (S104).

Next, the sentence analysis process will be described with reference to FIG. 15.
The sentence analysis process shown in FIG. 15 analyzes a sentence at the word level and the structure level and stores the result. It corresponds to S101 in FIG. 14.
In the sentence analysis process, the sentence processing apparatus 10 first receives a sentence from the sentence server 400 via the network 60 (S201).

Next, the word analysis unit 111 of the sentence analysis unit 110 shown in FIG. 1 analyzes the received sentence and stores the result in the sentence analysis result data 540 shown in FIG. 7 (S202).
Next, the structure analysis unit 112 analyzes the relational structure of the segmented words, extracts their meaning, and stores the result in the sentence analysis result data 540 (S203).

The sentence classification process shown in FIG. 16 adds classification tags to a sentence. It corresponds to S102 in FIG. 14.
In the sentence classification process, the occurring words and sentence structure information stored in the sentence analysis result data 540 shown in FIG. 7 are first input to the sentence classification unit 120 (S301).
Next, the processing of S303 to S305 is performed for each classification tag (not shown) held by the sentence processing apparatus (S302 to S306).

The classification processing unit 121 uses the occurring words to calculate the similarity between the sentence and each classification tag (S303). This similarity can be calculated, for example, from the total number of times the keywords of the classification tag appear in the sentence. For example, the word group {device failure, device malfunction, device output} is associated as keywords with the classification tag "device failure"; the number of times any word of that group appears in the sentence is counted and divided by the total number of words in the sentence, and the result is taken as the similarity.

If the similarity is equal to or greater than a predetermined threshold (S304: YES), the process proceeds to S305. If it is less than the threshold (S304: NO), the process moves on to the evaluation of the next classification tag (S302).

When the similarity is equal to or greater than the predetermined threshold (S304: YES), the classification tag under evaluation is assigned to the sentence (S305).
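The loop of S302 to S306 can be sketched roughly as follows. This is a minimal illustration, not the implementation disclosed in the patent: it assumes the sentence has already been segmented into words by the sentence analysis process, that each keyword matches a single word exactly, and the names `tag_similarity`, `assign_tags`, and `tag_keywords` are hypothetical.

```python
from typing import Dict, List, Set


def tag_similarity(words: List[str], keywords: Set[str]) -> float:
    """S303: keyword occurrences divided by the total word count of the sentence."""
    if not words:
        return 0.0  # assumption: an empty sentence matches no tag
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)


def assign_tags(
    words: List[str], tag_keywords: Dict[str, Set[str]], threshold: float
) -> List[str]:
    """S302-S306: evaluate every held tag and keep those at or above the threshold."""
    assigned = []
    for tag, keywords in tag_keywords.items():
        if tag_similarity(words, keywords) >= threshold:  # S304
            assigned.append(tag)  # S305
    return assigned
```

A sentence can receive several tags under this scheme, because each tag is evaluated independently against the same threshold.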

The sentence summarization process shown in FIG. 17 creates a summary sentence from a target sentence. It corresponds to S103 in FIG. 14.
In the sentence summarization process, the occurring words and sentence structure information stored in the sentence analysis result data 540 shown in FIG. 7, together with the classification result (the classification tags added in the sentence classification process), are first input to the sentence summarization unit 140 (S401).

Next, the summary processing unit 141 acquires the classification tags added to the target sentence (S402) and repeats the processing of S404 to S410 for every classification tag (S403 to S411).

First, the summary processing unit 141 extracts meaningful simple sentences from the sentence analysis result data 540 (S404). A meaningful simple sentence is one stored in the simple sentence 541 field of a record whose meaning 545 field in the sentence analysis result data 540 is not blank. In the example shown in FIG. 7, the record whose meaning 545 field contains "request" holds the simple sentence "Please broadcast promptly." in its simple sentence 541 field.

Next, the summary processing unit 141 acquires the summary performance data 520 that has the target classification tag (S405).
Then, S407 to S409 are repeated for each item of the acquired summary performance data 520 (S406 to S410).

First, the summary processing unit 141 calculates the similarity between a simple sentence of the target sentence and the sentence of the summary performance data 520 (S407). This similarity can be evaluated, for example, as the number of times the words of the simple sentence appear among the words of the sentence in the summary performance data 520, divided by the total number of words in that sentence.

If the similarity is equal to or greater than the threshold (S408: YES), the process proceeds to S409. If it is less than the threshold (S408: NO), the process moves on to the next item of summary performance data 520.

When the similarity is equal to or greater than the threshold (S408: YES), the summary sentence 524 of the summary performance data 520 is adopted as the summary sentence of the target sentence (S409).
Here, a plurality of summary sentences may be extracted for one target sentence, or only the one with the highest similarity in S407 may be used as the summary sentence of the target sentence.
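As an illustration of S406 to S410, the sketch below scores one simple sentence against past sentence/summary pairs and returns the summary with the highest similarity at or above the threshold. It is one possible reading of the description, under stated assumptions: sentences are already segmented into words, each item of summary performance data is represented as a hypothetical `(words, summary)` tuple already filtered by classification tag, and the function names are invented for this example.

```python
from typing import List, Optional, Sequence, Tuple


def sentence_similarity(target_words: Sequence[str], past_words: Sequence[str]) -> float:
    """S407: occurrences of target words in the past sentence, over its word count."""
    if not past_words:
        return 0.0  # assumption: an empty past sentence never matches
    target = set(target_words)
    hits = sum(1 for word in past_words if word in target)
    return hits / len(past_words)


def pick_summary(
    target_words: Sequence[str],
    records: Sequence[Tuple[List[str], str]],
    threshold: float,
) -> Optional[str]:
    """S406-S410: return the best past summary at or above the threshold, if any."""
    best_summary: Optional[str] = None
    best_score = 0.0
    for past_words, summary in records:
        score = sentence_similarity(target_words, past_words)
        if score >= threshold and (best_summary is None or score > best_score):  # S408
            best_summary, best_score = summary, score  # S409
    return best_summary
```

Returning every summary that clears the threshold, instead of only the best one, would correspond to the alternative of extracting a plurality of summary sentences.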

The classification learning process shown in FIG. 18 adds new information to the classification result data 510, which is the learning data that associates sentences with classification tags.
In the classification learning process, a new classification tag for the target sentence is first designated on the classification change screen 730 shown in FIG. 11 (S501).
Next, the target sentence and the new classification tag are input to the classification changing unit 130 (S502).
Next, the target sentence and the new classification tag are accumulated in the classification result data 510 (S503).

The summary learning process shown in FIG. 19 adds new information to the summary performance data 520, which is the learning data that associates sentences with summary sentences.
In the summary learning process, a summary sentence for the target sentence is first entered on the summary change screen 740 shown in FIG. 12 (S601).
Next, the target sentence and the new summary sentence are input to the summary changing unit 150 (S602).
Next, the target sentence and the new summary sentence are accumulated in the summary performance data 520 (S603).

The report creation process shown in FIG. 20 creates and displays a report after a designated classification is input.
In the report creation process, the target sentence range (period), the classification tag name, and the classification keywords are first entered on the report creation start screen 700 shown in FIG. 8 (S701).
Next, the report creation unit 160 acquires, from the classified/summarized data 530, the data whose registration date falls within the input target sentence range (period) (S702).

Next, the report creation unit 160 aggregates the classified/summarized data 530 by the contents of the classification column and calculates the number of items (S703).
Next, the processing of S705 is repeated for each input designated classification tag (S704 to S706).

Within the loop, the semi-automatic classification/summarization process is performed (S705). The details of this process are described later.
Finally, the report creation unit 160 displays the result on the report display screen 750 shown in FIG. 13 (S707).
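The period filter of S702 and the per-classification count of S703 can be sketched as follows. This is only an illustration under assumptions not stated in the source: each item of classified/summarized data is reduced to a hypothetical `(registration_date, tags)` tuple, the period bounds are treated as inclusive, and the name `count_by_tag` is invented for this example.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def count_by_tag(
    records: Iterable[Tuple[date, List[str]]], start: date, end: date
) -> Counter:
    """Count items per classification tag among those registered within the period."""
    counts: Counter = Counter()
    for registered_on, tags in records:
        if start <= registered_on <= end:  # S702: keep items inside the period
            counts.update(tags)  # S703: tally each tag of the item
    return counts
```

An item carrying several classification tags contributes to the count of each of them in this sketch.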

The semi-automatic classification/summarization process shown in FIG. 21 performs classification and summarization based on the designated classification after it is input. It corresponds to S705 in FIG. 20.
In the semi-automatic classification/summarization process, the report creation unit 160 first searches the classified/summarized data 530 for items containing the keywords given in the designated classification information (S801).
Next, the classification learning unit 123 accumulates the retrieved classified/summarized data 530 in the classification result data 510 under the tag name of the designated classification tag (S802).

Next, the classified/summarized data 530 whose registration date falls within the sentence target range (period) is input to the classification processing unit 121 (S803).
Then, the processing of S805 to S808 is repeated for each such item of classified/summarized data 530 (S804 to S809).

First, the classification processing unit 121 calculates the degree of fit between the target sentence and the designated classification tag (S805). The degree of fit is an index of how valid it is to use the designated classification tag as the classification tag of the sentence. It can be defined, for example, by associating keywords with the tag name of the designated classification tag and dividing the number of times those keywords appear in the target sentence by the total number of words in the target sentence.

If the degree of fit between the target sentence data and the designated classification tag is equal to or greater than the prescribed threshold (S805: YES), the process proceeds to S807. If it is less than the prescribed threshold (S805: NO), the process returns to S804 and handles the next item of classified/summarized data 530.

When the degree of fit between the target sentence data and the designated classification tag is equal to or greater than the prescribed threshold (S805: YES), the classification processing unit 121 assigns the classification tag of the designated classification to the sentence of the classified/summarized data 530 (S807).
Next, the summary processing unit 141 acquires the summary sentence of that item of classified/summarized data 530 (S808).
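The loop of S804 to S809 can be sketched as below. This is a simplified illustration rather than the disclosed implementation: the `Record` class, its fields, and the function names are hypothetical, sentences are assumed to be already segmented into words, and each keyword is assumed to match a single word exactly.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set


@dataclass
class Record:
    """Hypothetical stand-in for one item of classified/summarized data."""
    words: List[str]
    summary: str
    tags: List[str] = field(default_factory=list)


def degree_of_fit(words: Sequence[str], keywords: Set[str]) -> float:
    """S805: keyword occurrences divided by the total word count of the sentence."""
    if not words:
        return 0.0  # assumption: an empty sentence never fits
    return sum(1 for word in words if word in keywords) / len(words)


def semi_automatic_classify(
    records: List[Record], tag_name: str, keywords: Set[str], threshold: float
) -> List[str]:
    """S804-S809: tag every fitting record and collect its existing summary."""
    summaries = []
    for record in records:
        if degree_of_fit(record.words, keywords) >= threshold:
            if tag_name not in record.tags:
                record.tags.append(tag_name)  # S807
            summaries.append(record.summary)  # S808
    return summaries
```

Because the summaries were produced when the data was first accumulated, this step only reuses them; nothing is re-summarized when a new designated classification is entered.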

As described above, the sentence processing apparatus according to the present embodiment reclassifies already classified and summarized data when a designated classification is input, thereby selecting sentences on up-to-date topics. Because it also performs summarization based on sentence structure, it can produce a report of summary sentences that is valuable to a business entity, for example by reflecting users' requests.

DESCRIPTION OF SYMBOLS
10 … Sentence processing apparatus
110 … Sentence analysis unit
111 … Word analysis unit
112 … Structure analysis unit
113 … Sentence analysis result processing unit
120 … Sentence classification unit
121 … Classification processing unit
122 … Semi-automatic classification unit
123 … Classification learning unit
130 … Classification changing unit
140 … Sentence summarization unit
141 … Summary processing unit
142 … Summary learning unit
150 … Summary changing unit
160 … Report creation unit

Claims

A sentence processing apparatus that classifies sentences by adding classification tags to the sentences, wherein the apparatus:
receives as input a sentence and a designated classification tag;
calculates, for the designated classification tag and the sentence, a degree of fit, which is an index indicating the validity of using the designated classification tag as the classification tag of the sentence; and
assigns the designated classification tag to the sentence when the degree of fit between the sentence and the inputted designated classification tag is equal to or greater than a predetermined threshold.

The sentence processing apparatus according to claim 1, wherein the designated classification tag includes a classification tag name and a keyword, and
the inputted sentence is a sentence containing the keyword of the designated classification tag.

The sentence processing apparatus according to claim 1, wherein the inputted sentence is data to which a classification tag is already added.

A sentence processing apparatus that summarizes a sentence and outputs a summary sentence of the sentence, wherein the apparatus:
holds summary performance data, each item of which is a pair of a sentence and a summary sentence and has a classification tag;
performs structural analysis on each simple sentence of an inputted sentence and extracts a meaning from the structural analysis; and
for each classification tag of the summary performance data, extracts meaningful simple sentences identified by the structural analysis and, when the similarity between the inputted sentence and the summary performance data is equal to or greater than a predetermined threshold, uses the summary sentence of that summary performance data as the summary sentence of the inputted sentence.

A sentence processing method for classifying sentences by adding classification tags to the sentences, the method comprising:
receiving as input a sentence and a designated classification tag;
calculating, for the designated classification tag and the sentence, a degree of fit, which is an index indicating the validity of using the designated classification tag as the classification tag of the sentence; and
assigning the designated classification tag to the sentence when the degree of fit between the sentence and the inputted designated classification tag is equal to or greater than a predetermined threshold.

A sentence processing method for summarizing a sentence and outputting a summary sentence of the sentence, the method comprising:
holding summary performance data, each item of which is a pair of a sentence and a summary sentence and has a classification tag;
performing structural analysis on each simple sentence of an inputted sentence and extracting a meaning from the structural analysis; and
for each classification tag of the summary performance data, extracting meaningful simple sentences identified by the structural analysis and, when the similarity between the inputted sentence and the summary performance data is equal to or greater than a predetermined threshold, using the summary sentence of that summary performance data as the summary sentence of the inputted sentence.