JP4395094B2

JP4395094B2 - Text mining system

Info

Publication number: JP4395094B2
Application number: JP2005101476A
Authority: JP
Inventors: 和智永沼; 仁司大樫
Original assignee: Mitsubishi Electric Information Systems Corp
Current assignee: Mitsubishi Electric Information Systems Corp
Priority date: 2005-03-31
Filing date: 2005-03-31
Publication date: 2010-01-06
Anticipated expiration: 2025-03-31
Also published as: JP2006285396A

Description

The present invention relates to a text mining system.

In recent years, a growing number of text mining systems have been developed for work such as product planning and quality control. These systems extract important information from large volumes of accumulated text, such as questionnaires and repair requests, and build a concept dictionary from it. They then analyze the dictionary and make the results available for improving business operations.
JP 2001-184351 A
JP 2003-141134 A
Yasuhiro Takayama and three others, "Information retrieval system InfoMAP based on associative relations between words", Fundamentals of Informatics No. 053-001, March 1999

With such conventional text mining systems, users cannot judge in advance how effective the system will be without actually using it. Under uniform pricing, as with packaged software sold outright, the price can seem high to users who are interested but cannot yet judge its effect. Conversely, heavy users cannot be charged a fee commensurate with their usage, even when they use the system extensively because it proved effective. The present invention is intended to solve these problems. Its object is to provide a text mining system capable of pay-as-you-go (metered) billing suited to text mining.

A text mining system according to the present invention comprises the following units:
- a registration receiving unit that receives a license ID, used to confirm that a license is held, and the registration of a plurality of documents to be analyzed;
- a text analysis unit that extracts the words contained in the documents and the co-occurrence information for those words;
- a concept dictionary creation unit that creates a concept dictionary by associating each word with a concept vector calculated from the co-occurrence information;
- a concept dictionary storage unit that stores a plurality of the concept dictionaries;
- an analysis receiving unit that receives the license ID, a designation of documents, and a designation of analysis conditions;
- an analysis unit that analyzes the concept dictionary associated with the designated documents under the analysis conditions and outputs the result;
- a limit storage unit that stores, for each license ID, a predetermined charging type and a current value;
- a limit confirmation unit that, according to the charging type corresponding to the license ID, adds a predetermined quantity to the current value: the number of executions of processing, the number of created items, or the amount of created items;
- a charge calculation unit that calculates a charge from the current value stored in the limit storage unit for the license ID and a unit price predetermined for the charging type.
The system is characterized by the behavior of the limit confirmation unit when the charging type corresponding to the license ID is "concept dictionary business range" and the concept dictionary creation unit creates a new concept dictionary. In that case, the limit confirmation unit adds 1 to the current value for the concept dictionary business range if the proportion of words in the new concept dictionary that overlap with an existing concept dictionary stored in the concept dictionary storage unit is smaller than a predetermined value with respect to any one of all the existing concept dictionaries.
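The overlap test in the characterizing clause can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the overlap ratio is the number of shared words divided by the size of the new dictionary's vocabulary. It also reads "any one of all the existing concept dictionaries" as requiring the ratio to be below the threshold for every existing dictionary. The threshold of 0.5 and the function names are hypothetical.

```python
from typing import Mapping, Set


def overlap_ratio(new_words: Set[str], existing_words: Set[str]) -> float:
    """Fraction of the new dictionary's words that also occur in an existing one.

    Assumption: the denominator is the size of the new dictionary's vocabulary.
    """
    if not new_words:
        return 0.0
    return len(new_words & existing_words) / len(new_words)


def update_business_range_count(
    current_value: int,
    new_words: Set[str],
    existing_dictionaries: Mapping[str, Set[str]],
    threshold: float = 0.5,  # hypothetical "predetermined value"
) -> int:
    """Return the current value for the 'concept dictionary business range' type.

    Assumption: the new dictionary counts as a new business range only if its
    overlap with every existing dictionary is below the threshold.
    """
    if all(
        overlap_ratio(new_words, words) < threshold
        for words in existing_dictionaries.values()
    ):
        return current_value + 1
    return current_value


if __name__ == "__main__":
    existing = {"Questionnaire 1 dictionary": {"gift", "price", "delivery", "wrapping"}}
    similar = {"gift", "price", "delivery", "ham"}    # 3 of 4 words shared -> 0.75
    different = {"repair", "motor", "noise", "gift"}  # 1 of 4 words shared -> 0.25
    print(update_business_range_count(2, similar, existing))    # 2
    print(update_business_range_count(2, different, existing))  # 3
```

Because `all()` over an empty sequence is true, the first dictionary a licensee creates is always counted in this sketch.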

The limit storage unit further stores a limit value for each license ID. When the limit confirmation unit determines that adding to the current value would exceed the limit value, the registration receiving unit or the analysis receiving unit interrupts processing.

The limit value can be set to a specific number or to "unlimited".
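The limit check in the two preceding paragraphs can be sketched minimally as follows. The names are hypothetical, and "unlimited" is represented as `None`. The caller commits the new current value only when no exception is raised, so an interrupted request leaves the stored value unchanged.

```python
from typing import Optional


class LimitExceeded(Exception):
    """Signals the receiving unit to interrupt processing (hypothetical name)."""


def add_to_current_value(current_value: int, increment: int,
                         limit_value: Optional[int]) -> int:
    """Return the new current value, or raise if it would exceed the limit.

    limit_value=None stands for "unlimited" (an assumed encoding).
    """
    new_value = current_value + increment
    if limit_value is not None and new_value > limit_value:
        raise LimitExceeded(
            f"current value {new_value} would exceed limit {limit_value}")
    return new_value
```

This sketch treats "exceed" as strictly greater than, so usage may reach the limit value exactly. The text does not say which comparison the original system uses.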

The limit storage unit further stores, for each license ID, a charge calculation type that specifies how the charges for multiple charging types are combined; its value is sum, high, or low. The limit confirmation unit adds to the current value of each charging type associated with the license ID. The charge calculation unit calculates a charge for each of those charging types and determines the charge for the license ID from them:
- sum: the total of all the charges;
- high: the highest charge;
- low: the lowest charge.
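The sum/high/low combination can be sketched as below. The text says only that each charge is calculated from the current value and the unit price, so computing it as current value multiplied by unit price is an assumption. The charging-type names are hypothetical.

```python
from typing import Dict


def license_charge(current_values: Dict[str, int],
                   unit_prices: Dict[str, int],
                   calculation_type: str) -> int:
    """Combine per-charging-type charges into one charge for a license ID.

    calculation_type is "sum", "high", or "low".
    Assumption: charge per type = current value x unit price.
    """
    charges = [current_values[t] * unit_prices[t] for t in current_values]
    if not charges:
        return 0
    if calculation_type == "sum":
        return sum(charges)
    if calculation_type == "high":
        return max(charges)
    if calculation_type == "low":
        return min(charges)
    raise ValueError(f"unknown charge calculation type: {calculation_type}")


if __name__ == "__main__":
    usage = {"registrations": 12, "analyses": 40}      # hypothetical current values
    prices = {"registrations": 1000, "analyses": 200}  # hypothetical unit prices
    for kind in ("sum", "high", "low"):
        print(kind, license_charge(usage, prices, kind))
```

Here the per-type charges are 12,000 and 8,000, so the three settings yield 20,000, 12,000, and 8,000 respectively.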

The limit storage unit further stores, for each license ID, a limit combination type that specifies how the limits of multiple charging types are combined; its value is all or one. In both cases, the limit confirmation unit adds to the current value of each charging type associated with the license ID.
- all: if every resulting current value exceeds its limit value, the registration receiving unit outputs an error.
- one: if any one current value exceeds its limit value, the registration receiving unit interrupts processing.
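The two limit combination types can be sketched as a single check over all charging types of a license ID. The names are hypothetical, and `None` again stands for an unlimited limit value. The function only reports whether processing should be blocked; committing the updated values is left to the caller.

```python
from typing import Dict, Optional, Tuple


def check_combined_limits(current_values: Dict[str, int],
                          increments: Dict[str, int],
                          limit_values: Dict[str, Optional[int]],
                          combination_type: str) -> Tuple[bool, Dict[str, int]]:
    """Return (blocked, updated_values) for the "all" or "one" combination type."""
    updated = {t: v + increments.get(t, 0) for t, v in current_values.items()}
    over = [limit_values[t] is not None and updated[t] > limit_values[t]
            for t in updated]
    if combination_type == "all":
        blocked = bool(over) and all(over)  # every current value is over its limit
    elif combination_type == "one":
        blocked = any(over)                 # at least one current value is over
    else:
        raise ValueError(f"unknown limit combination type: {combination_type}")
    return blocked, updated
```

Consider limits of 9 registrations and 5 dictionaries, with values rising to 10 and 3. Only one of the two limits is exceeded, so "one" blocks the request while "all" lets it through.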

The analysis receiving unit receives a user ID. It raises an error if the user ID does not correspond to a predetermined license ID.
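The user ID check can be sketched as a lookup in the license information of FIG. 5, which associates each license ID with its user IDs. The sample IDs and the function name are hypothetical.

```python
from typing import Dict, Set

# Hypothetical license information: license ID -> user IDs assigned under it.
LICENSE_USERS: Dict[str, Set[str]] = {
    "L001": {"U001", "U002"},
    "L002": {"U101"},
}


def license_for_user(user_id: str, license_users: Dict[str, Set[str]]) -> str:
    """Return the license ID for a user ID, or raise if none corresponds."""
    for license_id, user_ids in license_users.items():
        if user_id in user_ids:
            return license_id
    raise PermissionError(f"user ID {user_id!r} is not registered under any license ID")
```

The check also returns the license ID. In the embodiment, the analysis receiving unit needs that license ID to find the charging type and limits that apply.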

The system can serve multiple client terminals, with each unit working per license ID or per user ID:
- The registration receiving unit receives the license ID and the documents from a plurality of client terminals.
- The text analysis unit extracts the words and their co-occurrence information for each license ID.
- The concept dictionary creation unit creates the concept dictionary for each license ID.
- The analysis receiving unit receives, for each user ID, the user ID, the designation of documents, and the designation of analysis conditions.
- The analysis unit transmits, for each user ID, the result of analysis under the analysis conditions to the client terminal.

The present invention enables usage-based (pay-as-you-go) billing for text mining. Users who cannot yet judge the system's effectiveness therefore find it easier to decide on initial adoption. It is also easier to justify collecting higher usage fees from users whose usage has grown because the system proved effective.

Embodiment 1.
The configuration of the text mining system according to this embodiment is described below with reference to FIG. 1. The system consists of the following units:
- a registration receiving unit 101, a text analysis unit 102, a concept dictionary creation unit 103, a document index creation unit 104, and an attribute information creation unit 105, for registering documents to be analyzed;
- a document storage unit 106, a text analysis result storage unit 107, a concept dictionary storage unit 108, a document index storage unit 109, an attribute information storage unit 110, and a management information storage unit 111, for storing information related to registered documents;
- an analysis receiving unit 112 and an analysis unit 113, for analyzing registered documents;
- a registration charging type confirmation unit 114 and a registration limit confirmation unit 115, for billing related to registration;
- a registration limit storage unit 116, for storing information related to registration billing;
- an analysis charging type confirmation unit 117 and an analysis limit confirmation unit 118, for billing related to analysis;
- an analysis limit storage unit 119, for storing information related to analysis billing;
- a charge calculation unit 120, which calculates charges for a predetermined period;
- a billing information storage unit 121, which stores information related to charge calculation.

The registration receiving unit 101 receives a license ID 122, a registration condition 123, and a document 124 from a user. It controls the text analysis unit 102, the concept dictionary creation unit 103, the document index creation unit 104, and the attribute information creation unit 105, and exchanges the data involved with them. Through them, it stores the document 124 in the document storage unit 106 in accordance with the registration condition 123. It also creates from the document 124 the information that the analysis unit 113 needs for analysis. That information is stored in the text analysis result storage unit 107, the concept dictionary storage unit 108, the document index storage unit 109, the attribute information storage unit 110, and the management information storage unit 111.

The registration receiving unit 101 also controls the registration charging type confirmation unit 114 and the registration limit confirmation unit 115, and exchanges the data involved with them. Through them, it confirms the charging type for the license ID 122 stored in the registration limit storage unit 116. It then counts the value needed for metered billing according to the charging type and the usage of registration processing, and stores that value in the registration limit storage unit 116.

The license ID 122 is an ID for confirming that a license to use the text mining system is held. An example is a license ID assigned to the marketing department of a mail-order company that has concluded a usage contract for the text mining system. The registration condition 123 specifies whether an existing document or an existing concept dictionary is to be used; it is described later with reference to FIG. 2. The document 124 is the collection of documents to be analyzed. Examples include a collection of questionnaires about year-end gifts or a collection of repair requests for a specific product.

The text analysis unit 102 extracts the words contained in the document 124 and their co-occurrence information, and stores them in the text analysis result storage unit 107. Co-occurrence information means combinations of two words that appear together within a predetermined range, such as the same document, the same paragraph, or the same sentence. The concept dictionary creation unit 103 creates a concept dictionary by associating each word extracted by the text analysis unit 102 with a concept vector calculated from the co-occurrence information. It stores the dictionary in the concept dictionary storage unit 108. The document index creation unit 104 creates document index information for the document 124 from the contents of the text analysis result storage unit 107. It stores that information in the document index storage unit 109. The attribute information creation unit 105 extracts the attributes assigned to each document contained in the document 124 and creates attribute information from them. It stores the attribute information in the attribute information storage unit 110.
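A minimal sketch of what the text analysis unit 102 extracts follows. It assumes that documents have already been split into units (for example, sentences) and tokenized into words. For Japanese text, tokenization would require a morphological analyzer, which this sketch does not cover. Each unordered pair of distinct words that appear in the same unit is counted once per unit. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def extract_cooccurrence(units: Iterable[List[str]]) -> Tuple[Set[str], Counter]:
    """Return the vocabulary and co-occurrence counts of word pairs.

    Each unit is one predetermined range (document, paragraph, or sentence).
    Pairs are stored in alphabetical order so (a, b) and (b, a) are merged.
    """
    words: Set[str] = set()
    pairs: Counter = Counter()
    for tokens in units:
        unique = sorted(set(tokens))
        words.update(unique)
        pairs.update(combinations(unique, 2))
    return words, pairs


if __name__ == "__main__":
    sentences = [["gift", "price", "late"], ["gift", "late", "gift"]]
    vocab, counts = extract_cooccurrence(sentences)
    print(sorted(vocab))             # ['gift', 'late', 'price']
    print(counts[("gift", "late")])  # 2
```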

The registration receiving unit 101 also stores the license ID 122 in the management information storage unit 111. It associates the license ID with the document stored in the document storage unit 106, the concept dictionary stored in the concept dictionary storage unit 108, and the attribute information stored in the attribute information storage unit 110.

The registration charging type confirmation unit 114 acquires the charging type for the license ID 122 stored in the registration limit storage unit 116. During the registration process controlled by the registration receiving unit 101, the registration limit confirmation unit 115 counts the value needed for metered billing. It counts according to the usage of registration processing and the charging type, and stores the value in the registration limit storage unit 116.

The analysis receiving unit 112 receives a user ID 125 and an analysis condition 126 from a user. It controls the analysis unit 113 and exchanges the data involved with it. Through the analysis unit, it analyzes the documents specified in the analysis condition 126 by the method specified in the analysis condition 126, and outputs an analysis result 127.

The analysis receiving unit 112 also acquires the license ID 122 corresponding to the user ID 125 from the management information storage unit 111. It controls the analysis charging type confirmation unit 117 and the analysis limit confirmation unit 118, and exchanges the data involved with them. Through them, it confirms the charging type for the license ID 122 stored in the analysis limit storage unit 119. It then counts the value needed for metered billing according to the charging type and the usage of analysis processing, and stores the value in the analysis limit storage unit 119.

The user ID 125 is an ID identifying an analyst assigned under a license ID 122. An example is a user ID assigned to Mr. A in the marketing department of a mail-order company that has concluded a usage contract for the text mining system. The analysis condition 126 specifies information such as the documents to be analyzed, the analysis method, and the attributes or keywords that serve as the axis of analysis. The analysis result 127 contains, for example, extracted major keywords, trend analysis by attribute, or trend analysis over time.

The analysis receiving unit 112 acquires the license ID corresponding to the user ID 125 it has received and requests the analysis unit 113 to perform the analysis. The analysis unit 113 refers to the management information storage unit 111 to find the concept dictionary, document index information, and attribute information that are associated with two things: the license ID received from the analysis receiving unit 112 and the documents specified in the analysis condition 126. It retrieves them from the concept dictionary storage unit 108, the document index storage unit 109, and the attribute information storage unit 110, respectively. It then performs the analysis using them and outputs the analysis result 127 via the analysis receiving unit 112.

The analysis charging type confirmation unit 117 acquires the charging type for the license ID 122 stored in the analysis limit storage unit 119. During the analysis process controlled by the analysis receiving unit 112, the analysis limit confirmation unit 118 counts the value needed for metered billing according to the charging type. It stores the value in the analysis limit storage unit 119.

The charge calculation unit 120 calculates a charge 128 for each license ID 122. It does so either when instructed by the administrator of the text mining system or automatically on a predetermined date. The calculation uses the usage of registration and analysis processing stored in the registration limit storage unit 116 and the analysis limit storage unit 119, together with the unit prices stored in the billing information storage unit 121.

The registration condition 123 of FIG. 1 is described below with reference to FIG. 2. As explained with reference to FIG. 1, the text mining system of this embodiment performs three kinds of processing:
- registration processing, under the control of the registration receiving unit 101, which creates a concept dictionary, document index information, and attribute information from documents and stores them in association with the documents;
- analysis processing, under the control of the analysis receiving unit 112, which performs analysis using the concept dictionary, document index information, and attribute information;
- charge calculation processing, performed by the charge calculation unit 120.
Depending on the registration condition specified, however, existing data can be reused instead of creating all data anew. The registration conditions 123 include, for example:
- new document registration with new concept dictionary creation;
- new document registration with reuse of an existing concept dictionary;
- additional document registration with reuse of an existing concept dictionary.

When the registration condition 123 is new document registration with new concept dictionary creation, processing runs as follows:
- The registration receiving unit 101 stores the received document 124 in the document storage unit 106.
- The text analysis unit 102 stores the result of analyzing the document 124 in the text analysis result storage unit 107.
- The concept dictionary creation unit 103 creates a concept dictionary from the text analysis result stored in the text analysis result storage unit 107 and stores it in the concept dictionary storage unit 108.
- The document index creation unit 104 creates document index information from the same text analysis result and stores it in the document index storage unit 109.
- The attribute information creation unit 105 creates attribute information from the document 124 and stores it in the attribute information storage unit 110.
- The relationship among the new document 124, the new concept dictionary, the new document index information, and the new attribute information is stored in the management information storage unit 111.
These data are used in later analysis processing. Row 201 of FIG. 2 shows an example. Normally, a questionnaire 1 dictionary is created from the questionnaire 1 document, and later analysis processing uses that dictionary to analyze the questionnaire 1 document.

When the registration condition 123 is new document registration with reuse of an existing concept dictionary, processing runs as follows:
- The registration receiving unit 101 stores the received document 124 in the document storage unit 106.
- The text analysis unit 102 stores the result of analyzing the document 124 in the text analysis result storage unit 107.
- The concept dictionary creation unit 103 does not create a concept dictionary. Instead, the registration receiving unit 101 accepts the designation of an existing concept dictionary.
- The document index creation unit 104 creates document index information from the text analysis result stored in the text analysis result storage unit 107 and stores it in the document index storage unit 109.
- The attribute information creation unit 105 creates attribute information from the document 124 and stores it in the attribute information storage unit 110.
- The relationship among the new document 124, the existing concept dictionary, the new document index information, and the new attribute information is stored in the management information storage unit 111.
These data are used in later analysis processing. Row 202 of FIG. 2 shows an example. Suppose the questionnaire 1 document and the questionnaire 2 document are expected to have relatively similar contents. In that case no concept dictionary is created, which saves time or storage capacity in the concept dictionary storage unit 108. Later analysis processing then uses the questionnaire 1 dictionary to analyze the questionnaire 2 document.

When the registration condition 123 is additional document registration with reuse of an existing concept dictionary, processing runs as follows:
- The registration receiving unit 101 adds the newly received document 124 to the designated existing document and stores the combined document in the document storage unit 106.
- The text analysis unit 102 stores the result of analyzing the combined document in the text analysis result storage unit 107.
- The concept dictionary creation unit 103 does not create a concept dictionary. Instead, the combined document is associated with the existing concept dictionary that the existing document was using.
- The document index creation unit 104 creates document index information from the text analysis result stored in the text analysis result storage unit 107 and stores it in the document index storage unit 109.
- The attribute information creation unit 105 creates attribute information from the combined document and stores it in the attribute information storage unit 110.
- The relationship among the combined document, the existing concept dictionary, the new document index information, and the new attribute information is stored in the management information storage unit 111.
These data are used in later analysis processing. Row 202 of FIG. 2 shows an example. Suppose the questionnaire 1 document and the questionnaire 2 document are expected to have relatively similar contents. In that case no concept dictionary is created, which saves time or storage capacity in the concept dictionary storage unit 108. Later analysis processing then uses the questionnaire 1 dictionary to analyze the questionnaire 1 and questionnaire 2 documents together.
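The three registration conditions differ in two respects. The document collection is either new or extended, and a concept dictionary is either built or reused. A minimal in-memory sketch of that bookkeeping follows. The enum, class, and function names are hypothetical, and text analysis, indexing, and attribute extraction are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Condition(Enum):
    NEW_DOC_NEW_DICT = 1       # new document registration, new concept dictionary
    NEW_DOC_EXISTING_DICT = 2  # new document registration, reuse existing dictionary
    ADD_DOC_EXISTING_DICT = 3  # additional document registration, reuse existing dictionary


@dataclass
class Registration:
    document: str
    concept_dictionary: str
    dictionary_created: bool


def register(condition: Condition,
             texts: List[str],
             documents: Dict[str, List[str]],
             dictionary_of: Dict[str, str],
             new_document: Optional[str] = None,
             existing_document: Optional[str] = None,
             existing_dictionary: Optional[str] = None) -> Registration:
    """Store texts and record which concept dictionary the document will use."""
    if condition is Condition.ADD_DOC_EXISTING_DICT:
        # Append to an existing document and keep the dictionary it already uses.
        documents[existing_document].extend(texts)
        return Registration(existing_document, dictionary_of[existing_document], False)
    documents[new_document] = list(texts)
    if condition is Condition.NEW_DOC_NEW_DICT:
        name = f"{new_document} dictionary"  # building the dictionary itself is omitted
        dictionary_of[new_document] = name
        return Registration(new_document, name, True)
    # NEW_DOC_EXISTING_DICT: use the dictionary the user designated.
    dictionary_of[new_document] = existing_dictionary
    return Registration(new_document, existing_dictionary, False)
```

In this sketch only the first condition creates a dictionary. Only that condition could therefore trigger the concept dictionary business range count described in the claims.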

Next, the contents of the data stored in each storage unit are described with reference to FIGS. 3 to 8. The document storage unit 106 stores a plurality of documents. Normally, the documents received in one registration process are stored as a single group. Depending on the registration condition 123, however, a new document may be added to an existing document and stored, as explained with reference to FIG. 2.

The text analysis result storage unit 107 stores the words contained in the document to be registered and their co-occurrence information. As described above, that document may be an existing document to which a new document has been added. Co-occurrence information means combinations of two words that appear together within a predetermined range, such as the same document, the same paragraph, or the same sentence. The text analysis result storage unit 107 is a temporary storage area, and its contents are deleted after each registration process completes.

The concept dictionary storage unit 108 stores a plurality of concept dictionaries. FIG. 3 shows an example of one concept dictionary. Column 301 holds the words contained in the document to be registered. Columns 302 onward hold the concept vector 306, obtained by compressing the co-occurrence information described above with SVD (singular value decomposition).
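The sketch below uses NumPy to show one way the co-occurrence information might be compressed into concept vectors with SVD. The text specifies neither the matrix layout nor the scaling, so both are assumptions here. The sketch uses a symmetric word-by-word co-occurrence count matrix and takes the first k columns of U scaled by the singular values, as in latent semantic analysis. The choice k=2 is arbitrary, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Mapping, Tuple

import numpy as np


def concept_vectors(words: Iterable[str],
                    pair_counts: Mapping[Tuple[str, str], int],
                    k: int = 2) -> Dict[str, np.ndarray]:
    """Map each word to a k-dimensional concept vector via truncated SVD."""
    vocab = sorted(words)
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)))
    for (a, b), count in pair_counts.items():
        # Co-occurrence is symmetric, so fill both cells.
        matrix[index[a], index[b]] += count
        matrix[index[b], index[a]] += count
    u, s, _vt = np.linalg.svd(matrix)
    k = min(k, len(s))
    vectors = u[:, :k] * s[:k]  # rows = words, columns = concept dimensions
    return {w: vectors[index[w]] for w in vocab}
```

The truncated vectors equal the co-occurrence rows projected onto the top k right singular vectors. Words with identical co-occurrence patterns therefore receive identical concept vectors. For example, "a" and "b" are identical if each co-occurs only with "c".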

The document index storage unit 109 stores a plurality of pieces of document index information. One piece of document index information is created for each document to be registered. It associates index information with each of the finer-grained documents contained in that registration-target document 124.

The attribute information storage unit 110 stores a plurality of pieces of attribute information. One piece of attribute information is created for each document to be registered. It collects the attribute information assigned to each of the finer-grained documents contained in that registration-target document.

As described above, the management information storage unit 111 stores information relating the license ID 122 to the items registered for it: the registration-target document, concept dictionary, document index information, attribute information, and so on. FIG. 4 shows an example of this relation information. It holds a license ID 401, a document name 402, a concept dictionary name 403, a document index information name 404, and an attribute information name 405.
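The relation information of FIG. 4 can be modeled as one record per registered document. The sketch below, with hypothetical names and sample values, shows how the analysis unit 113 might look up the items for a license ID and a designated document. It returns the concept dictionary, document index information, and attribute information. Because the lookup is keyed on both fields, a document name registered under another license ID is not found.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RegisteredItem:
    license_id: str          # 401
    document: str            # 402
    concept_dictionary: str  # 403
    document_index: str      # 404
    attribute_info: str      # 405


def find_registered_item(records: Iterable[RegisteredItem],
                         license_id: str, document: str) -> RegisteredItem:
    """Return the record matching both the license ID and the document name."""
    for record in records:
        if record.license_id == license_id and record.document == document:
            return record
    raise LookupError(f"{document!r} is not registered under license {license_id!r}")


RECORDS = [  # hypothetical sample rows
    RegisteredItem("L001", "Questionnaire 1", "Questionnaire 1 dictionary",
                   "Questionnaire 1 index", "Questionnaire 1 attributes"),
    RegisteredItem("L001", "Questionnaire 2", "Questionnaire 1 dictionary",
                   "Questionnaire 2 index", "Questionnaire 2 attributes"),
]
```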

The management information storage unit 111 also stores license information. FIG. 5 shows an example, in which a license ID 501 is held in association with a user ID 502. The license information may also hold other attributes. For each license ID these can include the licensee's name and password; for each user ID, the analyst's name and password.

The registration restriction storage unit 116 stores a current value and a limit value. The current value is the metered quantity used for pay-as-you-go billing, and the limit value is used to restrict usage. FIG. 6 shows example data held in the registration restriction storage unit 116: a license ID 601, a charge type 602, a current value 603, and a limit value 604. In this embodiment, one charge type 602, one current value 603, and one limit value 604 are defined for each license ID 601. The license ID 601 and the charge type 602 are set in advance. During registration processing, the registration restriction confirmation unit 115 continuously updates the current value 603 with a count that depends on the charge type 602 and on how the registration process is used. The limit value 604 is also set in advance. When the current value 603 exceeds it, further use of the registration process is restricted. The limit value 604 can also be deliberately set to "unlimited", in which case use of the registration process is never restricted.
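The current-value/limit-value check described above can be sketched as follows. This is a minimal illustration, not the patented implementation. The class name `RestrictionRecord`, its field names, and the use of `None` to mean "unlimited" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestrictionRecord:
    """One row of a restriction table (hypothetical field names)."""
    license_id: str
    charge_type: str
    current_value: int = 0
    limit_value: Optional[int] = None  # assumption: None stands for "unlimited"

    def try_add(self, amount: int) -> bool:
        """Add `amount`; if the result exceeds the limit, restore the old value and return False."""
        self.current_value += amount
        if self.limit_value is not None and self.current_value > self.limit_value:
            self.current_value -= amount  # restore the previous current value
            return False
        return True


rec = RestrictionRecord("L001", "document_registrations", current_value=9, limit_value=10)
print(rec.try_add(1), rec.current_value)  # True 10
print(rec.try_add(1), rec.current_value)  # False 10
```

A record whose limit is `None` accepts any amount, which corresponds to the "unlimited" setting.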

The analysis restriction storage unit 119 likewise stores a current value and a limit value. The current value is the metered quantity for pay-as-you-go billing, and the limit value is used to restrict usage. FIG. 7 shows example data held in the analysis restriction storage unit 119. Like the registration restriction storage unit 116, it holds a license ID 601, a charge type 602, a current value 603, and a limit value 604, but the charge type 602 is set to a type suited to analysis processing. In this embodiment, one charge type 602, one current value 603, and one limit value 604 are defined for each license ID 601. The license ID 601 and the charge type 602 are set in advance. During analysis processing, the analysis restriction confirmation unit 118 continuously updates the current value 603 with a count that depends on the charge type 602 and on how the analysis process is used. The limit value 604 is set in advance. When the current value 603 exceeds it, further use of the analysis process is restricted.

The billing information storage unit 121 stores the unit prices used to calculate fees. FIG. 8 shows example data held in the billing information storage unit 121: a billed process 801, a charge type 602, and a unit price 803. The billed process 801 and unit price 803 are set in advance for each charge type 602. The billed process 801 indicates whether the charge type 602 relates to usage of the registration process or of the analysis process. For that charge type 602, the unit price 803 gives the fee per count of the current value 603, whether that value is held in the registration restriction storage unit 116 or in the analysis restriction storage unit 119.
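The sketch below shows how a unit-price table like FIG. 8 might be applied to a current value. The charge-type keys and prices are invented for illustration only, because the source gives no concrete values.

```python
# Hypothetical billing-information table (cf. FIG. 8):
# charge type -> (billed process, unit price per count of the current value)
BILLING_INFO = {
    "document_registrations": ("registration", 10_000),
    "word_count": ("registration", 1),
    "analysis_executions": ("analysis", 500),
}


def fee_for(charge_type: str, current_value: int) -> tuple[str, int]:
    """Return the billed process and the fee (current value x unit price)."""
    process, unit_price = BILLING_INFO[charge_type]
    return process, current_value * unit_price


print(fee_for("analysis_executions", 3))  # ('analysis', 1500)
```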

The charge type 602 defines what is metered for pay-as-you-go billing. Candidates include the number of times a predetermined process is executed, the number of items created, and the amount of material created. More concretely, the charge types include the number of document registrations, the number of document additions, the number of concept dictionary creations, the document amount, the word count, the concept dictionary capacity, the concept dictionary business scope, and the number of analysis executions.

The number of document registrations is the number of times a new document 124 is registered. The number of document additions is the number of times a new document 124 is appended to an existing document. The number of concept dictionary creations is the number of times a new concept dictionary is created. The document amount is the number of characters in the new document 124. The word count is the number of words in the new document 124. The concept dictionary capacity is the size of the new concept dictionary, expressed in a predetermined unit such as MB. The concept dictionary business scope is a judgment of whether the new concept dictionary covers a different business area from the existing concept dictionaries associated with the same license ID 122. It is described in detail later. The number of analysis executions is the number of times an analysis is run.

The hardware configuration of the text mining system in this embodiment is described below with reference to FIG. 9. The text mining system consists of an external storage device 901, a CPU (central processing unit) 902, a main memory (main storage device) 903, an input device 904, a display device 905, and a communication device 906, all connected by a bus. The system's storage units are implemented on the external storage device 901, such as a hard disk, floppy (registered trademark) disk, MO, CD, DVD, or magnetic tape. These are the document storage unit 106, concept dictionary storage unit 108, document index storage unit 109, attribute information storage unit 110, management information storage unit 111, registration restriction storage unit 116, analysis restriction storage unit 119, and billing information storage unit 121. The text analysis result storage unit 107 and the temporary storage needed to run the programs are implemented on the main memory 903, such as RAM (Random Access Memory), or on the external storage device 901.
The following processing units are implemented by the external storage device 901, the main memory 903, and the CPU 902: the registration accepting unit 101, text analysis unit 102, concept dictionary creation unit 103, document index creation unit 104, attribute information creation unit 105, analysis accepting unit 112, analysis unit 113, registration charge type confirmation unit 114, registration restriction confirmation unit 115, analysis charge type confirmation unit 117, analysis restriction confirmation unit 118, and fee calculation unit 120. The data-processing programs are loaded from the external storage device 901 into the main memory 903, and the CPU 902 reads and executes them in sequence. Each processing unit can accept input and produce output through the input device 904 (for example, a mouse or keyboard), the communication device 906 connected to a LAN, WAN, or similar network, and the display device 905 (for example, a display).

The flow of registration processing in this embodiment is described below with reference to FIGS. 10 to 12. The registration accepting unit 101 accepts a license ID 122 (step S1001). It then refers to the management information storage unit 111 and confirms that the license ID 122 exists (step S1002). If the license ID does not exist, the registration process is interrupted and an error is output. The registration charge type confirmation unit 114 refers to the registration restriction storage unit 116 and obtains the charge type 602 for registration under that license ID 122 (step S1003). The registration accepting unit 101 accepts the registration conditions 123 and the document 124 from the user (step S1004).

The registration restriction confirmation unit 115 then updates the current value 603 for the license ID 122 in the registration restriction storage unit 116 according to the charge type 602 (step S1005). If the charge type 602 is the number of document registrations and the registration conditions 123 specify registration of a new document, it adds 1 to the current value 603. If the charge type 602 is the number of document additions and the registration conditions 123 specify registration of an additional document, it adds 1. If the charge type 602 is the number of concept dictionary creations and the registration conditions 123 specify creation of a new concept dictionary, it likewise adds 1. If the charge type 602 is the document amount, it counts the characters in the document 124 and adds that number to the current value 603. If the current value 603 after the addition exceeds the limit value 604, the current value 603 is restored to its previous value, the registration process is interrupted, and an error is output.
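Step S1005 could be sketched as below. The charge-type and registration-condition strings, the exception class, and the function names are hypothetical. Registration conditions are modeled as a set, because one registration may both add a new document and create a new dictionary. Counting with `len()` counts Unicode code points, which is one reasonable reading of "number of characters".

```python
from typing import Optional


class RegistrationLimitError(Exception):
    """Raised when the limit value would be exceeded (registration is interrupted)."""


def s1005_increment(charge_type: str, conditions: set[str], document_text: str) -> int:
    """Amount to add to the current value at step S1005 (0 if this step does not count)."""
    if charge_type == "document_registrations" and "new_document" in conditions:
        return 1
    if charge_type == "document_additions" and "add_document" in conditions:
        return 1
    if charge_type == "dictionary_creations" and "new_dictionary" in conditions:
        return 1
    if charge_type == "document_amount":
        return len(document_text)  # number of characters in the document
    return 0  # other charge types are counted at later steps


def apply_increment(current: int, limit: Optional[int], amount: int) -> int:
    """Return the new current value; raise instead if it would exceed the limit."""
    new_value = current + amount
    if limit is not None and new_value > limit:
        # The caller never stores new_value, so the old current value is kept.
        raise RegistrationLimitError(f"limit {limit} exceeded ({new_value})")
    return new_value
```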

The registration accepting unit 101 stores the document 124 in the document storage unit 106 (step S1006). If the registration conditions 123 specify registration of an additional document, the document 124 is appended to the specified existing document.

The text analysis unit 102 analyzes the text in the document 124, splits it into words, and builds co-occurrence information (step S1101).

If the charge type 602 is the word count, the registration restriction confirmation unit 115 obtains the number of words counted by the text analysis unit 102. It adds that number to the current value 603 for the license ID 122 in the registration restriction storage unit 116 (step S1102). If the current value 603 after the addition exceeds the limit value 604, the current value 603 is restored and the document stored in the document storage unit 106 is also rolled back. The registration process is then interrupted and an error is output.
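Step S1102 differs from S1005 because the document already stored at step S1006 must also be rolled back. The sketch below is minimal and rests on two assumptions: an in-memory dict stands in for the document storage unit 106, and the document was newly registered. For an added document, only the appended part would need to be removed. All names are hypothetical.

```python
from typing import Optional


def s1102_check_word_count(
    document_store: dict[str, str],
    doc_name: str,
    counter: dict[str, int],
    limit: Optional[int],
    word_count: int,
) -> bool:
    """Add the word count to counter["current"]; on overflow, undo the stored document."""
    new_value = counter["current"] + word_count
    if limit is not None and new_value > limit:
        # The current value is never written, so it keeps its old value.
        document_store.pop(doc_name, None)  # roll back the document stored at S1006
        return False  # the caller interrupts registration and reports an error
    counter["current"] = new_value
    return True


store = {"repair_reports_2005": "..."}
words = {"current": 90}
print(s1102_check_word_count(store, "repair_reports_2005", words, 100, 5))   # True
print(s1102_check_word_count(store, "repair_reports_2005", words, 100, 10))  # False
print(words["current"], store)  # 95 {}
```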

Suppose the charge type 602 is the concept dictionary capacity and the registration conditions 123 specify creation of a new concept dictionary. The registration restriction confirmation unit 115 then obtains the word count from the text analysis unit 102 and multiplies it by a predetermined factor to estimate the capacity of the concept dictionary. It checks whether the current value 603 for the license ID 122 in the registration restriction storage unit 116, plus this estimate, exceeds the limit value 604. If so, it outputs a notice to that effect (step S1103). At this point the user is asked whether to continue or interrupt registration. If the user chooses to interrupt, the document stored in the document storage unit 106 is rolled back, the registration process is interrupted, and an error is output.
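The capacity pre-check at step S1103 can be illustrated as follows. The multiplication factor (2 KB per word) and the use of kilobytes as the unit are assumptions. The source says only that the factor is predetermined and that the unit may be, for example, MB.

```python
from typing import Optional

KB_PER_WORD = 2  # hypothetical predetermined factor


def s1103_precheck(word_count: int, current_kb: int, limit_kb: Optional[int],
                   kb_per_word: int = KB_PER_WORD) -> tuple[int, bool]:
    """Estimate the new dictionary's size and report whether it would push the total over the limit.

    The current value itself is not updated here; the actual size is added at step S1106.
    """
    estimate_kb = word_count * kb_per_word
    would_exceed = limit_kb is not None and current_kb + estimate_kb > limit_kb
    return estimate_kb, would_exceed


estimate, warn = s1103_precheck(word_count=1_000, current_kb=500, limit_kb=2_000)
if warn:
    print(f"Estimated {estimate} KB would exceed the limit; continue or interrupt?")
```

Because the estimate only drives a warning, the user can still choose to continue, and the exact size is metered later once the dictionary exists.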

The text analysis unit 102 stores the analysis result in the text analysis result storage unit 107 (step S1104).

The concept dictionary creation unit 103 computes a concept vector for each word from the text analysis result and creates a concept dictionary (step S1105). If the registration conditions 123 specify reuse of an existing concept dictionary, no concept dictionary is created and steps S1105, S1106, and S1107 are skipped.

Suppose the charge type 602 is the concept dictionary capacity and the registration conditions 123 specify creation of a new concept dictionary. The registration restriction confirmation unit 115 then obtains the size of the concept dictionary created by the concept dictionary creation unit 103. It adds that size to the current value 603 for the license ID 122 in the registration restriction storage unit 116. Now suppose instead that the charge type 602 is the concept dictionary business scope and a new concept dictionary is being created. The registration restriction confirmation unit 115 then determines the business scope of the new dictionary from the proportion of its registered words that also appear in existing concept dictionaries. If that scope is judged to differ from every existing concept dictionary, it adds 1 to the current value 603 (step S1106). If the current value 603 after the addition exceeds the limit value 604, the current value 603 is restored and the document stored in the document storage unit 106 is also rolled back. The registration process is then interrupted and an error is output. The method for determining the concept dictionary business scope is described later with reference to FIG. 15.

The concept dictionary creation unit 103 stores the created concept dictionary in the concept dictionary storage unit 108 (step S1107).

The document index creation unit 104 creates document index information from the text analysis result and stores it in the document index storage unit 109 (step S1201). The attribute information creation unit 105 creates attribute information for the document 124 and stores it in the attribute information storage unit 110 (step S1202). The registration accepting unit 101 associates the license ID 122 with the names of the document, concept dictionary, document index, and attribute information stored in the respective storage units. It saves this association in the management information storage unit 111, as illustrated in FIG. 4 (step S1203). Finally, the registration accepting unit 101 deletes the text analysis result from the text analysis result storage unit 107 (step S1204).

The flow of analysis processing in this embodiment is described below with reference to FIG. 13. The analysis accepting unit 112 accepts a user ID 125 (step S1301). It then refers to the management information storage unit 111 and confirms that the user ID 125 and its associated license ID 122 exist (step S1302). If the user ID 125 does not exist, the analysis process is interrupted. The analysis charge type confirmation unit 117 refers to the analysis restriction storage unit 119 and obtains the charge type 602 for analysis under that license ID 122 (step S1303).
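Steps S1301 to S1303 amount to a reverse lookup from user ID to license ID, followed by a charge-type lookup. In this sketch, plain dicts stand in for the license information of FIG. 5 and the analysis restriction table of FIG. 7. The IDs and charge-type strings are made up.

```python
from typing import Optional

# Hypothetical license information (FIG. 5): license ID -> user IDs
LICENSE_INFO = {"L001": {"u01", "u02", "u03"}, "L002": {"u10"}}
# Hypothetical analysis restriction table (FIG. 7): license ID -> charge type
ANALYSIS_CHARGE_TYPE = {"L001": "analysis_executions"}


def s1301_to_s1303(user_id: str) -> tuple[str, Optional[str]]:
    """Return (license ID, analysis charge type) for a user; raise if the user is unknown."""
    for license_id, users in LICENSE_INFO.items():
        if user_id in users:
            # None means no analysis charge type is configured for this license.
            return license_id, ANALYSIS_CHARGE_TYPE.get(license_id)
    raise PermissionError(f"unknown user ID {user_id!r}; analysis interrupted")


print(s1301_to_s1303("u02"))  # ('L001', 'analysis_executions')
```

Several users map to one license here, which matches the arrangement in which multiple analysts share one license ID.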

If the charge type 602 is the number of analysis executions, the analysis restriction confirmation unit 118 adds 1 to the current value 603 for the license ID 122 in the analysis restriction storage unit 119 (step S1304). If the current value 603 after the addition exceeds the limit value 604, the current value 603 is restored, the analysis process is interrupted, and an error is output.

The analysis accepting unit 112 accepts from the user the name of the document to analyze and the analysis conditions 126 (keywords, attributes, and so on). It then asks the analysis unit 113 to analyze that document 124 under those analysis conditions 126 (step S1305).

The analysis unit 113 retrieves the concept dictionary, document index information, and attribute information for the target document 124. It performs the analysis under the analysis conditions 126 and outputs an analysis result 127 (step S1306).

The flow of fee calculation processing in this embodiment is described below with reference to FIG. 14. Steps S1401 through S1406 are repeated for every license ID (step S1401). The fee calculation unit 120 obtains the current values 603 for the license ID 122 from the registration restriction storage unit 116 and the analysis restriction storage unit 119 (step S1402). It then refers to the billing information storage unit 121, calculates the fees for the registration process and the analysis process, and sums them. The total becomes the fee 128 for that license ID 122 (step S1403).

The fee calculation unit 120 outputs the fee 128 (step S1404). It then resets the current values 603 for that license ID 122 in the registration restriction storage unit 116 and the analysis restriction storage unit 119 to 0 (step S1405). When fees have been calculated for all license IDs 122, fee calculation is complete. Otherwise, processing returns to step S1401.
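The fee calculation loop (steps S1401 to S1405) can be sketched as below. As in this embodiment, the sketch assumes one charge type per license ID in each restriction table. Table layouts, keys, and prices are hypothetical.

```python
def calculate_fees(registration: dict[str, list], analysis: dict[str, list],
                   unit_prices: dict[str, int]) -> dict[str, int]:
    """Compute each license's fee from both restriction tables, then reset the current values.

    Each table maps license ID -> [charge type, current value] (a list, so it can be reset).
    """
    fees = {}
    for license_id in sorted(set(registration) | set(analysis)):  # S1401: every license ID
        total = 0
        for table in (registration, analysis):  # S1402: fetch the current values
            if license_id in table:
                charge_type, current = table[license_id]
                total += current * unit_prices[charge_type]  # S1403: price x count, summed
        fees[license_id] = total  # S1404: the fee to output
        for table in (registration, analysis):  # S1405: reset current values to 0
            if license_id in table:
                table[license_id][1] = 0
    return fees


prices = {"document_registrations": 10_000, "word_count": 1, "analysis_executions": 500}
reg = {"L001": ["document_registrations", 2], "L002": ["word_count", 1_000]}
ana = {"L001": ["analysis_executions", 3]}
print(calculate_fees(reg, ana, prices))  # {'L001': 21500, 'L002': 1000}
```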

The flow of the concept dictionary business scope determination in this embodiment is described below with reference to FIG. 15. The registration restriction confirmation unit 115 performs this determination at step S1106 of FIG. 11 when the charge type 602 is the concept dictionary business scope.

Steps S1501 through S1505 are repeated for every existing concept dictionary (step S1501). The number of overlapping words, meaning words stored in both the new concept dictionary and the existing concept dictionary, is determined (step S1502). The overlap ratio is then obtained by dividing the number of overlapping words by the number of words registered in the new concept dictionary (step S1503).

If the overlap ratio is not smaller than a predetermined value, processing proceeds to step S1507. Otherwise, it proceeds to step S1505 (step S1504).
If all existing concept dictionaries have been checked, processing proceeds to step S1506. Otherwise, it returns to step S1501 (step S1505).

Reaching step S1506 means it has been confirmed that the word overlap ratio between the new concept dictionary and every existing concept dictionary is smaller than the predetermined value. In other words, the new concept dictionary uses many terms that differ from those in each existing concept dictionary, so it is judged to cover a business scope different from all of them (step S1506).
Reaching step S1507 means it has been confirmed that the word overlap ratio between the new concept dictionary and at least one existing concept dictionary is not smaller than the predetermined value. In other words, the new concept dictionary shares many terms with at least one existing concept dictionary, so it is judged to cover the same business scope as that dictionary (step S1507).
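The determination in steps S1501 to S1507 can be expressed compactly. This sketch treats each concept dictionary as a set of words, ignoring the concept vectors. It assumes the predetermined threshold is passed in as a fraction. The function name is hypothetical.

```python
def is_new_business_scope(new_words: set[str], existing: list[set[str]], threshold: float) -> bool:
    """True if the new dictionary's word overlap with every existing dictionary is below threshold."""
    if not new_words:
        raise ValueError("the new concept dictionary contains no words")
    for existing_words in existing:  # S1501: loop over existing dictionaries
        overlap = len(new_words & existing_words)  # S1502: overlapping words
        ratio = overlap / len(new_words)  # S1503: overlap ratio
        if ratio >= threshold:  # S1504: not smaller than the threshold
            return False  # S1507: same business scope as this dictionary
    return True  # S1506: differs from all existing dictionaries


new = {"engine", "brake", "tire", "battery"}
existing = [{"invoice", "tax"}, {"engine", "brake", "oil"}]
print(is_new_business_scope(new, existing, threshold=0.6))  # True (max ratio 0.5)
print(is_new_business_scope(new, existing, threshold=0.5))  # False
```

Dividing by the size of the new dictionary, as step S1503 specifies, means a small new dictionary that is mostly contained in a large existing one counts as the same scope.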

Because the concept dictionary business scope can be specified as a charge type, billing can increase when the departments or applications using the text mining system expand at the customer's site.

Specifying a limit value prevents the user from spending more than expected.

Because the limit value can be set to unlimited, usage can be restricted for some license IDs and left unrestricted for others.

Multiple user IDs can be set for each license ID. One expected way of operating the text mining system is for a client-side administrator to perform registration as a routine task. Analysis, by contrast, is performed irregularly by several people with different expertise through repeated trial and error. Allowing multiple user IDs per license supports this mode of operation.

Embodiment 2.
Next, a second embodiment is described. In this embodiment, multiple charge types 602 can be set for a single license ID 601. The configuration of the text mining system is the same as in FIG. 1 described in the first embodiment. The only difference is the data stored in the registration restriction storage unit 116.

FIG. 16 shows example data held in the registration restriction storage unit 116 in this embodiment. In addition to the license ID 601, charge type 602, current value 603, and limit value 604, it holds a fee combination type 1601 and a limit combination type 1602. Multiple charge types 602, current values 603, and limit values 604 are defined for each license ID 601. A single fee combination type 1601 and a single limit combination type 1602 are defined for each license ID 601.

The license ID 601 and charge types 602 are set in advance. During registration processing, the registration restriction confirmation unit 115 continuously updates each current value 603 with a count that depends on its charge type 602 and on how the registration process is used. This update is performed for every charge type 602. The limit values 604 are set in advance. When a current value 603 exceeds its limit value, use of the registration process is restricted according to the limit combination type 1602 described below. The fee combination type 1601 is set in advance and specifies how the fees for multiple charge types 602 are combined: sum, highest, or lowest. The limit combination type 1602 is set in advance and specifies how the limits for multiple charge types 602 are combined: all or one.

In the first embodiment, the current value 603 is incremented according to the charge type 602 at steps S1005, S1102, and S1106 of the registration process shown in FIGS. 10 to 12. The same holds in this embodiment, except that the addition is performed for each of the multiple charge types 602.

In the first embodiment, if the current value 603 after addition exceeds the limit value 604 at step S1005, S1102, or S1106, the current value 603 is restored, the registration process is interrupted, and an error is output. In this embodiment, processing instead depends on the limit combination type 1602. Consider the charge types 602 associated with the license ID 601 of the licensee performing the registration. When the limit combination type 1602 is "all", the current values 603 are restored, the registration process is interrupted, and an error is output only if the current value 603 exceeds the limit value 604 for every one of those charge types. When the limit combination type 1602 is "one", the same happens as soon as the current value 603 exceeds the limit value 604 for any one of those charge types.
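The two limit combination types reduce to `all()` and `any()` over the per-charge-type checks. This is a minimal sketch in which the strings "all" and "one" and the dict layout are assumptions. The guard for an empty input is needed because Python's `all()` returns True for an empty sequence.

```python
from typing import Optional


def limit_exceeded(current: dict[str, int], limits: dict[str, Optional[int]], combination: str) -> bool:
    """Decide whether registration must be interrupted under the limit combination type."""
    if not current:
        raise ValueError("at least one charge type is required")
    # None as a limit means "unlimited", so that charge type is never over.
    over = [limits[t] is not None and current[t] > limits[t] for t in current]
    if combination == "all":
        return all(over)  # interrupt only when every charge type is over its limit
    if combination == "one":
        return any(over)  # interrupt as soon as any charge type is over its limit
    raise ValueError(f"unknown limit combination type: {combination!r}")


values = {"word_count": 12_000, "document_registrations": 3}
limits = {"word_count": 10_000, "document_registrations": 5}
print(limit_exceeded(values, limits, "all"))  # False
print(limit_exceeded(values, limits, "one"))  # True
```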

In the first embodiment, at step S1304 of the analysis process shown in FIG. 13, the current value 603 is incremented according to the charge type 602. The analysis process is interrupted if the current value 603 exceeds the limit value 604. In this embodiment, only one charge type 602 is provided for analysis processing, so analysis behaves as in the first embodiment. If analysis processing also had multiple charge types 602, the addition would be performed for every charge type 602. The limit would then be applied according to the limit combination type 1602, as in registration processing.

In the first embodiment, at step S1403 of the fee calculation process shown in FIG. 14, the fee calculation unit 120 refers to the billing information storage unit 121. It calculates the fees for the registration and analysis processes and sums them to obtain the fee for the license ID. In this embodiment, processing instead depends on the fee combination type 1601, as follows.

The fee calculation unit 120 refers to the billing information storage unit 121 and calculates a fee for each charge type 602 associated with the license ID 601 being billed. The next step depends on the fee combination type 1601. If it is "sum", the unit adds up the per-charge-type fees. If it is "highest", it selects the highest of those fees. If it is "lowest", it selects the lowest. The fee calculated or selected in this way becomes the fee for the license ID.
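The fee combination step reduces to `sum()`, `max()`, or `min()` over the per-charge-type fees. The strings "sum", "highest", and "lowest" used here to encode the fee combination type 1601 are assumptions, as are the example fees.

```python
def combine_fees(fees_by_type: dict[str, int], combination: str) -> int:
    """Combine per-charge-type fees according to the fee combination type."""
    if not fees_by_type:
        raise ValueError("no charge types to combine")
    fees = fees_by_type.values()
    if combination == "sum":
        return sum(fees)
    if combination == "highest":
        return max(fees)
    if combination == "lowest":
        return min(fees)
    raise ValueError(f"unknown fee combination type: {combination!r}")


per_type = {"word_count": 12_000, "document_registrations": 30_000, "dictionary_capacity": 8_000}
print(combine_fees(per_type, "sum"), combine_fees(per_type, "highest"), combine_fees(per_type, "lowest"))
# 50000 30000 8000
```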

According to the present embodiment, text mining can be offered with flexible metered billing that uses a plurality of charge types to match each client's needs.

In addition, the plurality of charge types makes it possible to impose flexible usage limits that match each client's needs.

In Embodiment 1 or Embodiment 2, the text mining system configuration described with reference to FIG. 1 need not include the document index creation unit 104, the attribute information creation unit 105, the document index storage unit 109, or the attribute information storage unit 110.
In this case, the management information storage unit 111 described with reference to FIG. 4 holds the license ID 401, the document name 402, and the concept dictionary name 403, but not the document index information name 404 or the attribute information name 405. In the registration process flow described with reference to FIG. 12, steps S1201 and S1202 become unnecessary. In step S1306 of the analysis process flow described with reference to FIG. 13, the analysis unit 113 obtains the concept dictionary corresponding to the analysis target document 124, analyzes it according to the analysis condition 126, and outputs the analysis result 127.

The text mining system described in Embodiment 1 or Embodiment 2 may also exchange input and output with a plurality of client terminals through the communication device 906 connected to a LAN, a WAN, or the like.

With this arrangement, the text mining system can be provided to many clients over a LAN or a WAN. Clients find the system easy to adopt initially, and the administrator operating the text mining system can easily manage a large number of clients.

Overall, the invention also offers the following effects. Users who have not yet been able to judge the value of a text mining system find it easier to decide on initial adoption. Conversely, users whose usage has grown because the system proved effective are more likely to accept correspondingly higher usage fees. When multiple users share a text mining device on a server in a corporate intranet, metered billing allows usage fees to be collected in proportion to the scale of use. Likewise, for a service that runs a text mining device on an Internet server and offers the public analysis of pre-registered technical documents, patent information, and the like, metered billing allows usage fees to be collected in proportion to the scale of use.

FIG. 1 is a block diagram of the text mining system in Embodiment 1 of the invention.
FIG. 2 is a schematic diagram explaining the registration conditions in Embodiment 1.
FIG. 3 shows example data in the concept dictionary storage unit in Embodiment 1.
FIG. 4 shows example data in the management information storage unit in Embodiment 1.
FIG. 5 shows example data in the management information storage unit in Embodiment 1.
FIG. 6 shows example data in the registration limit storage unit in Embodiment 1.
FIG. 7 shows example data in the analysis limit storage unit in Embodiment 1.
FIG. 8 shows example data in the billing information storage unit in Embodiment 1.
FIG. 9 is a hardware configuration diagram of the text mining system in Embodiment 1.
FIG. 10 shows the flow of the registration process in Embodiment 1.
FIG. 11 shows the flow of the registration process in Embodiment 1.
FIG. 12 shows the flow of the registration process in Embodiment 1.
FIG. 13 shows the flow of the analysis process in Embodiment 1.
FIG. 14 shows the flow of the charge calculation process in Embodiment 1.
FIG. 15 shows the flow of the concept dictionary work range determination process in Embodiment 1.
FIG. 16 shows example data in the registration limit storage unit in Embodiment 2.

Explanation of symbols

101 registration reception unit, 102 text analysis unit, 103 concept dictionary creation unit, 104 document index creation unit, 105 attribute information creation unit, 106 document storage unit, 107 text analysis result storage unit, 108 concept dictionary storage unit, 109 document index storage unit, 110 attribute information storage unit, 111 management information storage unit, 112 analysis reception unit, 113 analysis unit, 114 registration charge type confirmation unit, 115 registration limit confirmation unit (limit confirmation unit), 116 registration limit storage unit (limit storage unit), 117 analysis charge type confirmation unit, 118 analysis limit confirmation unit (limit confirmation unit), 119 analysis limit storage unit (limit storage unit), 120 charge calculation unit, 121 billing information storage unit.

Claims

1. A text mining system comprising:
a registration reception unit that receives a license ID for confirming that a license is held, and registration of a plurality of documents to be analyzed;
a text analysis unit that extracts the words contained in the documents and co-occurrence information of those words;
a concept dictionary creation unit that creates a concept dictionary by associating each word with a concept vector calculated from the co-occurrence information;
a concept dictionary storage unit that stores a plurality of the concept dictionaries;
an analysis reception unit that receives the license ID, a designation of the document, and a designation of analysis conditions;
an analysis unit that outputs the result of analyzing the concept dictionary corresponding to the designated document according to the analysis conditions;
a limit storage unit that stores, for each license ID, a predetermined charge type and a current value;
a limit confirmation unit that, according to the charge type corresponding to the license ID, adds a predetermined number of processing executions, number of items created, or amount created to the current value; and
a charge calculation unit that calculates a charge from the current value stored in the limit storage unit for the license ID and a charge unit price predetermined for the charge type,
wherein, when the charge type corresponding to the license ID is the concept dictionary work range and the concept dictionary creation unit creates a new concept dictionary, the limit confirmation unit adds 1 to the current value corresponding to the concept dictionary work range if, for every existing concept dictionary stored in the concept dictionary storage unit, the ratio of words in the new concept dictionary that overlap with that existing concept dictionary is smaller than a predetermined value.

2. The text mining system according to claim 1, wherein the limit storage unit further stores a limit value for each license ID, and
when the limit confirmation unit determines that adding to the current value would exceed the limit value,
the registration reception unit or the analysis reception unit interrupts the processing.

3. The text mining system according to claim 2, wherein an "unlimited" setting can be designated as the limit value in addition to ordinary values.

4. The text mining system according to any one of claims 1 to 3, wherein the limit storage unit further stores, for each license ID, a charge combination type whose value is sum, high, or low and which specifies how charges based on a plurality of charge types are combined,
the limit confirmation unit adds to the current value corresponding to each of the plurality of charge types associated with the license ID,
the charge calculation unit calculates a charge for each of the plurality of charge types associated with the license ID, and
the charge for the license ID is the sum of those charges if the charge combination type is sum, the highest of them if it is high, or the lowest of them if it is low.

5. The text mining system according to claim 2 or 3, wherein the limit storage unit further stores, for each license ID, a limit combination type whose value is all or one and which specifies how limits based on a plurality of charge types are combined, and
the limit confirmation unit,
when the limit combination type is all, adds to the current value corresponding to each of the plurality of charge types associated with the license ID, and the registration reception unit outputs an error if the values after the addition exceed the limit values for all of the current values, and
when the limit combination type is one, adds to the current value corresponding to each of the plurality of charge types associated with the license ID, and the registration reception unit interrupts the processing if the limit value is exceeded for any one of the current values.

6. The text mining system according to any one of claims 1 to 5, wherein the analysis reception unit further receives a user ID and returns an error unless the user ID is one predetermined to correspond to the license ID.

7. The text mining system according to claim 6, wherein the registration reception unit receives the documents and the license ID from a plurality of client terminals,
the text analysis unit extracts the co-occurrence information of the words for each license ID,
the concept dictionary creation unit creates the concept dictionary for each license ID,
the analysis reception unit receives, for each user ID, the user ID, the designation of the document, and the designation of the analysis conditions, and
the analysis unit transmits the result of the analysis according to the analysis conditions to the client terminal associated with each user ID.
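The concept dictionary work range rule in claim 1 can be sketched as follows. This is an illustration under stated assumptions only: the overlap ratio is taken to be the share of the new dictionary's words that also appear in an existing dictionary, the threshold of 0.3 is hypothetical, dictionaries are reduced to their word sets (concept vectors are ignored), and all function names are invented for the example.

```python
from typing import Iterable, Set


def overlap_ratio(new_words: Set[str], existing_words: Set[str]) -> float:
    """Assumed definition: share of the new dictionary's words that also
    appear in one existing dictionary."""
    if not new_words:
        return 0.0
    return len(new_words & existing_words) / len(new_words)


def is_new_work_range(
    new_words: Set[str], existing_dicts: Iterable[Set[str]], threshold: float = 0.3
) -> bool:
    """True when the new dictionary overlaps every existing one by less than the threshold."""
    return all(overlap_ratio(new_words, d) < threshold for d in existing_dicts)


def update_work_range_count(
    current_value: int,
    new_words: Set[str],
    existing_dicts: Iterable[Set[str]],
    threshold: float = 0.3,
) -> int:
    """Add 1 to the 'concept dictionary work range' current value for a new domain."""
    if is_new_work_range(new_words, existing_dicts, threshold):
        return current_value + 1
    return current_value


existing = [{"engine", "brake", "tire", "oil"}]
print(update_work_range_count(1, {"engine", "brake", "screen", "battery"}, existing))    # 1 (overlap 0.5)
print(update_work_range_count(1, {"invoice", "refund", "delivery", "engine"}, existing))  # 2 (overlap 0.25)
```

With no existing dictionaries, `all()` over an empty sequence is true, so the first dictionary created under a license counts as one work range in this sketch.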