JP2012208754A

JP2012208754A - Information processing device, update method for updating database, and program for updating database

Info

Publication number: JP2012208754A
Application number: JP2011074229A
Authority: JP
Inventors: Keizo Uchiyama (内山 恵三); Asako Ono (小野 朝子)
Original assignee: Tokyo Electric Power Co Inc
Current assignee: Tokyo Electric Power Company Holdings Inc
Priority date: 2011-03-30
Filing date: 2011-03-30
Publication date: 2012-10-25
Anticipated expiration: 2031-03-30
Also published as: JP5673292B2

Abstract

PROBLEM TO BE SOLVED: To reduce the load on a user by improving the efficiency of work related to database optimization. SOLUTION: An information processing device connected to a database that stores analysis data including analysis keys for analyzing document data comprises: a unit analysis key extraction unit 25 that extracts, from the database, one or more analysis keys serving as a reference for recognizing the structure of analysis keys, as unit analysis keys; a structure recognition unit 26 that recognizes the structure of the analysis keys included in the database using the unit analysis keys; and a database update unit 29 that, according to the structure recognized by the structure recognition unit 26, updates information associated with the analysis keys included in the database using information associated with the unit analysis keys.

Description

The present invention relates to a technique for updating analysis keys used when analyzing document data.

A conventional classification dictionary update method stores a received update proposal for a hierarchical classification dictionary in a proposal history storage unit, searches the past update proposals stored in the proposal history storage unit to extract proposals similar to the received one, and presents the extracted similar proposals (see Patent Document 1).

There is also a dictionary creation device comprising: a keyword extraction unit that extracts keywords from input text information; a keyword statistics unit that obtains statistics on keyword occurrence; a keyword evaluation value calculation unit that calculates an evaluation value for each extracted keyword based on those statistics; a determination unit that determines, based on the calculated evaluation value, whether to register or delete the keyword; a dictionary registration/deletion unit that registers the keyword in, or deletes it from, a dictionary database according to the result of that determination; and the dictionary database itself (see Patent Document 2).

Furthermore, in the field of natural language analysis, there is a word sense assignment device that assigns, to each word of a natural language sentence, a semantic tag indicating the meaning that fits the context, a semantic class indicating the concept corresponding to that meaning, and the like (see Patent Document 3). Various other techniques for constructing or updating dictionaries have also been proposed (see Patent Documents 4 and 5).

Patent Document 1: JP 2006-309446 A
Patent Document 2: Domestic re-publication of PCT international publication No. 2005/066837
Patent Document 3: JP 2009-181408 A
Patent Document 4: JP 2008-234429 A
Patent Document 5: JP 2005-174116 A

Conventionally, there are techniques for analyzing target data such as document data using dictionary data. Obtaining highly accurate analysis results with such techniques requires maintenance of the database (dictionary) that contains the dictionary data. However, to organize and optimize the database, the user must be aware of the database structure and of each piece of data to be organized, so optimizing dictionary data has been cumbersome and a heavy burden on the user.

In view of the above problem, an object of the present invention is to make work related to database optimization more efficient and to reduce the burden on the user.

The present invention solves the above problem with the following configuration. That is, the present invention is an information processing apparatus connected to a database that stores analysis data including analysis keys for analyzing document data, the apparatus comprising: unit analysis key extraction means for extracting, from the database, one or more analysis keys serving as a reference for recognizing the structure of analysis keys, as unit analysis keys; structure recognition means for recognizing the structure of the analysis keys included in the database using the unit analysis keys; and database update means for updating, according to the structure recognized by the structure recognition means, information associated with the analysis keys included in the database using information associated with the unit analysis keys.

Here, an analysis key is information that serves as a key for analyzing document data, for example information used as a search key when searching character strings in the document data. The analysis data includes an analysis key and may further include information related to that analysis key (such as its attribute information and semantic information).

An analysis key may contain other analysis keys. The information processing apparatus according to the present invention therefore extracts, as unit analysis keys, the analysis keys that serve as a reference for recognizing the structure of analysis keys, and uses these unit analysis keys to recognize the structure of the analysis keys that contain them. By updating the database according to the structure recognized in this way, the present invention makes work related to database optimization more efficient and reduces the burden on the user.

The unit analysis key extraction means may extract from the database, as the unit analysis key, an analysis key that is found only when that key itself is used as the search key.

In other words, the unit analysis key extraction means can extract from the database, as the unit analysis key, an analysis key that is not found when any analysis key other than itself is used as the search key. Because such a unit analysis key does not contain any other analysis key, it can be used as the minimum unit for recognizing the structure of analysis keys.
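The extraction rule described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: it assumes that searching the database with a key means testing whether that key occurs as a literal substring of each stored key (keys written as regular expressions would need a different containment test), and the function name `extract_unit_keys` and the sample keys are hypothetical.

```python
def extract_unit_keys(analysis_keys):
    """Return the keys that are found only when they themselves are the search key.

    Assumption: key A "finds" key B when A occurs as a substring of B.
    """
    unit_keys = []
    for candidate in analysis_keys:
        # The candidate is disqualified if any *other* key occurs inside it.
        found_by_other = any(
            other != candidate and other in candidate
            for other in analysis_keys
        )
        if not found_by_other:
            unit_keys.append(candidate)
    return unit_keys


# Hypothetical keys: "fee", "want to know", "want to know the fee".
keys = ["料金", "知りたい", "料金が知りたい"]
print(extract_unit_keys(keys))  # ['料金', '知りたい']
```

The third key contains both of the others, so it is treated as a composite key rather than a unit analysis key.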

The information processing apparatus may further comprise database search means for searching the database using the analysis keys stored in that database, and the unit analysis key extraction means may extract, as the unit analysis key, an analysis key that, as a result of the search by the database search means, is found only when that key itself is used as the search key.

The information processing apparatus may further comprise additional unit analysis key extraction means for further extracting, as an additional unit analysis key, a character string in the structure recognized by the structure recognition means that does not correspond to any of the unit analysis keys.

Providing such additional unit analysis key extraction means makes it possible to extract unit analysis keys that were not extracted by search or similar methods. Character strings that carry no meaning by themselves (for example, connective words such as "が" (ga) and "は" (wa)) may be excluded from extraction as additional unit analysis keys.
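As a rough illustration of this step, the sketch below splits a composite analysis key at every occurrence of a known unit analysis key and keeps the leftover fragments as candidates for additional unit analysis keys. It assumes plain-string keys; the function name, the `excluded` parameter, and the sample strings are hypothetical and are not taken from the patent.

```python
import re


def extract_additional_unit_keys(analysis_key, unit_keys, excluded=("が", "は")):
    """Return fragments of analysis_key that no unit key accounts for."""
    if not unit_keys:
        fragments = [analysis_key]
    else:
        # Longest keys first so that a longer unit key wins over its own prefix.
        ordered = sorted(unit_keys, key=len, reverse=True)
        pattern = "|".join(re.escape(k) for k in ordered)
        fragments = re.split(pattern, analysis_key)
    # Drop empty pieces and strings that carry no meaning by themselves.
    return [f for f in fragments if f and f not in excluded]


# Hypothetical composite key: "want to know the electricity fee".
print(extract_additional_unit_keys("電気料金が知りたい", ["料金", "知りたい"]))  # ['電気']
```

Here "電気" (electricity) is left over and becomes a candidate, while the connective "が" is discarded.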

The information processing apparatus may further comprise additional unit analysis key setting means for setting, as the semantic information of the additional unit analysis key, semantic information generated from the character string that does not correspond to any of the unit analysis keys.

The database update means may update the semantic information associated with an analysis key included in the database using the semantic information associated with the unit analysis keys.

The analysis key may be defined using a regular expression. When analysis keys defined by regular expressions are used to analyze target data such as document data, the necessary characteristic portions can be found without being affected by notation variations in the target data. To find such characteristic portions more accurately, the analysis key is preferably a regular expression of a feature that is unaffected by notation variations such as colloquial style or omission of the subject.

Furthermore, the present invention can also be understood as a method executed by a computer or as a program to be executed by a computer. The present invention may also be such a program recorded on a recording medium readable by a computer or by another device or machine. Here, a recording medium readable by a computer or the like is a medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and from which a computer or the like can read that information.

According to the present invention, work related to database optimization can be made more efficient and the burden on the user can be reduced.

A diagram showing the hardware configuration of the document data analysis apparatus according to the embodiment.
A diagram showing the conceptual configuration of the document data analysis apparatus according to the embodiment.
A diagram showing an outline of the functional configuration of the document data analysis apparatus according to the embodiment.
A diagram showing the structure of the dictionary data table according to the embodiment.
A diagram showing the structure of the unit analysis key table according to the embodiment.
A flowchart showing the flow of the document data analysis processing according to the embodiment.
A diagram showing the result of search processing using the dictionary data table in the embodiment.
Flowchart A showing the flow of the database update processing according to the embodiment.
Flowchart B showing the flow of the database update processing according to the embodiment.
Flowchart C showing the flow of the database update processing according to the embodiment.
A table showing the relationship between the determination results of the conformity determination unit according to the embodiment and the corresponding processing.
A diagram showing how each table in the dictionary database is updated with the update data contained in the update data table when the database update processing according to the embodiment is executed.
A flowchart showing the flow of the database optimization processing according to the embodiment.

An embodiment in which the information processing apparatus according to the present invention is implemented as a document data analysis apparatus 1 is described below with reference to the drawings. The document data analysis apparatus 1 according to this embodiment can be used, for example, to analyze reception logs entered by operators and accumulated at a call center. At a call center, telephone calls such as customer inquiries are received, and the operator enters a record of the customer interaction into a computer. Part of each reception log entered and accumulated in this way is written as free text. The document data analysis apparatus 1 can, however, also be used to analyze various other texts (for example, questionnaire results).

However, the information processing apparatus according to the present invention is not limited to the document data analysis apparatus 1. It may be any information processing apparatus connected to a database that stores analysis data including analysis keys for analyzing document data.

<System configuration>
FIG. 1 is a diagram showing the hardware configuration of the document data analysis apparatus 1 according to this embodiment. The document data analysis apparatus 1 is a computer (information processing apparatus) comprising a CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 13 as a main storage device, a ROM (Read Only Memory) 12, an auxiliary storage device 14 such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), a display as a display device 15, and a keyboard, mouse, and the like as an input device 16. The document data analysis apparatus 1 is connected to a dictionary database.

FIG. 2 is a diagram showing the conceptual configuration of the document data analysis apparatus 1 according to this embodiment. With this apparatus, expressions that conventional knowledge acquisition methods based on text analysis could not extract, such as forms that take into account the context between sentences in a multi-sentence document and intuitive human linguistic expressions, can be converted into regular expressions and registered in the language analysis dictionary. This broadens the range of analysis and makes it possible to handle analysis, classification, and knowledge acquisition for documents with many abbreviated forms, such as questionnaires. In addition, because the apparatus has a document analysis function based on regular-expression feature patterns, it extracts relationships between actual notation patterns more accurately than conventional pattern analysis of dependency relations extracted after finely segmenting text into morphemes, and it makes intuitive human expression patterns easy to extract. This embodiment describes in detail the processing performed when database update processing and database optimization processing are applied to such a document data analysis apparatus 1.

FIG. 3 is a diagram showing an outline of the functional configuration of the document data analysis apparatus 1 according to this embodiment. The computer having the configuration shown in FIG. 1 reads the program recorded in the auxiliary storage device 14 into the RAM 13 and executes it on the CPU 11, and thereby functions as the document data analysis apparatus 1 comprising an update analysis key acquisition unit 21, a database search unit 22, a conformity determination unit 23, an update processing content determination unit 24, a unit analysis key extraction unit 25, a structure recognition unit 26, an additional unit analysis key extraction unit 27, an additional unit analysis key setting unit 28, a database update unit 29, and a document data analysis unit 30.

The dictionary database holds various tables (tables in which analysis keys are grouped by attribute, a unit analysis key table, and so on) containing multiple analysis keys predefined as plain character strings or regular expressions, the attribute information corresponding to each analysis key, and semantic information indicating the meaning of the analysis key itself. Preferably, a different dictionary is prepared for each field, and the dictionary for the field to which the document data belongs is used preferentially.

FIG. 4 is a diagram showing the structure of the dictionary data table according to this embodiment. The dictionary data table is dictionary data that stores patterns, each consisting of an analysis key ID for identifying an analysis key, one analysis key defined by a plain character string or a regular expression, attribute information corresponding to the analysis key, and semantic information indicating the meaning of the analysis key itself. A different table is preferably prepared for each field of the document data to be analyzed. In principle, analysis keys are defined as regular expressions, and the attribute information of the sentences that match each regular expression is associated with the corresponding analysis key. For example, a sentence matching the regular expression ".*知り(たい|たかった)" is classified under "response" and "opinion", and its meaning is "want to know" (知りたい). The analysis key ".*知り(たい|たかった)" is therefore associated with attribute information 1 "response" (応対), attribute information 2 "opinion" (意見), and semantic information "want to know" (知りたい).
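One row of such a dictionary data table, and the way its regular expression is matched against a sentence, might be modeled as in the following sketch. The class name `DictionaryEntry`, its field names, and the key ID value are hypothetical; only the regular expression, the two attribute values, and the semantic information come from the example in the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    key_id: int          # analysis key ID (the value used below is made up)
    analysis_key: str    # plain string or regular expression
    attributes: tuple    # attribute information 1, 2, ...
    meaning: str         # semantic information


# "応対" = response, "意見" = opinion, "知りたい" = want to know.
entry = DictionaryEntry(1, r".*知り(たい|たかった)", ("応対", "意見"), "知りたい")


def matches(entry, sentence):
    """True if the sentence contains text matched by the entry's analysis key."""
    return re.search(entry.analysis_key, sentence) is not None


print(matches(entry, "料金を知りたかったのですが"))  # True
print(matches(entry, "料金を支払った"))              # False
```

Both the present form "知りたい" and the past form "知りたかった" are covered by the single key, which is the kind of notation variation the regular expression is meant to absorb.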

The update analysis key acquisition unit 21 acquires update data containing an update analysis key, which is an analysis key used to update the dictionary database. In this embodiment, analysis keys are defined using regular expressions. See FIG. 10 for a specific example of update data.

The database search unit 22 searches the dictionary database using a given search key. Search keys used here include, for example, update analysis keys as well as analysis keys already stored in the same dictionary database.

The conformity determination unit 23 determines the degree of conformity between the information contained in the analysis data and the information associated with the update analysis key. More specifically, it determines the degree of conformity when the database search unit 22 finds no analysis data containing the update analysis key in the dictionary database. The determination compares at least one of the analysis key contained in the analysis data, its attribute information, and its semantic information with at least one of the analysis key associated with the update analysis key, its attribute information, and its semantic information. The result is held as information indicating "complete match", "partial match", "mismatch", "determination unnecessary", and so on. In the table shown in FIG. 9, a complete match is shown as "○", a partial match as "△", a mismatch as "×", and determination unnecessary as "？".
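The four result categories could be represented as in the sketch below. The text names the categories but does not define, in this section, how each one is decided, so the criteria used here are purely assumptions for illustration: equal strings count as a complete match, substring containment as a partial match, and a missing value as "determination unnecessary". The names `Fit` and `judge` are hypothetical.

```python
from enum import Enum


class Fit(Enum):
    COMPLETE = "complete match"
    PARTIAL = "partial match"
    MISMATCH = "mismatch"
    NOT_NEEDED = "determination unnecessary"


def judge(existing, update):
    """Compare one field of stored analysis data with the same field of update data.

    Assumed criteria (not taken from the patent text): see the lead-in above.
    """
    if existing is None or update is None:
        return Fit.NOT_NEEDED
    if existing == update:
        return Fit.COMPLETE
    if existing in update or update in existing:
        return Fit.PARTIAL
    return Fit.MISMATCH


print(judge("応対", "応対"))      # Fit.COMPLETE
print(judge("意見", "意見要望"))  # Fit.PARTIAL
```

A caller would apply `judge` separately to the analysis key, each piece of attribute information, and the semantic information, and pass the combined results on to whatever decides the update processing.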

The update processing content determination unit 24 determines, according to the result from the conformity determination unit 23, the content of the update processing to perform when the dictionary database is updated with the update analysis key. The content determined here includes, for example, the position at which the update analysis key is added to the dictionary database. See FIG. 9 for the determination results of the conformity determination unit 23 and the corresponding processing. FIG. 9 illustrates update processing for analysis keys that have two pieces of attribute information, but the number of pieces of attribute information is not limited to two.

The unit analysis key extraction unit 25 extracts from the dictionary database, as unit analysis keys, one or more analysis keys that serve as a reference for recognizing the structure of analysis keys. In this embodiment, the unit analysis key extraction unit 25 extracts, as a unit analysis key, an analysis key that, as a result of the search by the database search unit 22, is found only when that key itself is used as the search key. In other words, such an analysis key is one that is not found when any analysis key other than itself is used as the search key.

FIG. 5 is a diagram showing the structure of the unit analysis key table according to this embodiment. The unit analysis key extraction unit 25 extracts, from the dictionary shown in FIG. 4, the analysis keys that serve as a reference for recognizing the structure of analysis keys. The extracted analysis keys are then registered as unit analysis keys in the unit analysis key table shown in FIG. 5.

The structure recognition unit 26 recognizes the structure of the analysis keys included in the dictionary database using the unit analysis keys.

The additional unit analysis key extraction unit 27 further extracts, as an additional unit analysis key, a character string in the structure recognized by the structure recognition unit 26 that does not correspond to any of the unit analysis keys.

The additional unit analysis key setting unit 28 generates and sets the semantic information of the additional unit analysis key based on the character string, in the structure recognized by the structure recognition unit 26, that does not correspond to any of the unit analysis keys.

The database update unit 29 updates the dictionary database with the update analysis key and the information associated with it, according to the content of the update processing determined by the update processing content determination unit 24. For example, the database update unit 29 updates the dictionary database by adding analysis data containing the update analysis key and its associated information at the position determined by the update processing content determination unit 24.

The database update unit 29 also updates the information associated with the analysis keys included in the dictionary database using the information associated with the unit analysis keys, according to the structure recognized by the structure recognition unit 26. For example, it updates the semantic information associated with an analysis key in the dictionary database using the semantic information associated with the unit analysis keys.
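One way to picture this update is to rebuild the semantic information of a composite analysis key from the semantic information of the unit analysis keys it contains. The sketch below does so by joining the unit meanings in their order of appearance. The joining rule, the separator, the function name `compose_meaning`, and the sample data are all assumptions made for illustration, since this section does not state how the unit information is combined.

```python
def compose_meaning(analysis_key, unit_meanings, separator=" + "):
    """Build semantic information for analysis_key from its unit keys.

    unit_meanings maps each unit analysis key to its semantic information.
    Only the first occurrence of each unit key is considered.
    """
    hits = []
    for unit_key, meaning in unit_meanings.items():
        position = analysis_key.find(unit_key)
        if position >= 0:
            hits.append((position, meaning))
    hits.sort()  # order the meanings by where the unit key appears
    return separator.join(meaning for _, meaning in hits)


unit_meanings = {"知りたい": "want to know", "料金": "fee"}
print(compose_meaning("料金が知りたい", unit_meanings))  # fee + want to know
```

With such a rule, correcting the semantic information of one unit analysis key would propagate to every composite key that contains it, without the user editing each composite entry by hand.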

The document data analysis unit 30 analyzes document data using the analysis keys stored in the dictionary database and obtains attribute information, semantic information, and the like related to the document data.

<Process flow>
Next, the flow of processing executed by the document data analysis apparatus 1 according to this embodiment is described. The processing order and specific processing content shown in this embodiment are one example of how the present invention can be carried out; any of the various processing orders and processing contents that a person skilled in the art could adopt to carry out the invention may be used in practice.

FIG. 6 is a flowchart showing the flow of the document data analysis processing according to this embodiment. The processing shown in this flowchart starts when an instruction from the user to execute document data analysis processing is received. It may, however, also be executed according to a preset schedule or periodically.

In step S101, input of document data is accepted. The document data analysis unit 30 accepts document data via a network such as a LAN, or via a portable recording medium such as a USB memory or CD-ROM. The document data processed in this embodiment is, for example, a call center reception log, which contains the text entered by operators as character-coded data, organized per call or per series of related matters. Hereinafter, the per-call or per-matter data contained in the reception log is referred to as a "case".

Each case includes a case ID for identifying the case and the attribute information of the case entered by the operator. Attribute information indicates the attribute of a case, for example "complaint", "opinion/request", or "praise", and is set by the operator who handled the call so that the meaning of the call or matter can be determined. Attribute information is not necessarily set for every case, however; the reception log may contain cases with no attribute information because, for example, the operator forgot to set it. When the input document data has been recorded in the RAM and input acceptance is complete, the processing proceeds to step S102.

In step S102, the notation used in the documents is unified. The document data analysis unit 30 unifies the notation by executing search-and-replace processing with a notation control dictionary (not shown). The notation control dictionary contains analysis keys that define spelling variants and frequent errors, written either as plain character strings or as regular expressions, together with the replacement string for each analysis key, that is, the string after notation control. Applying search-and-replace processing to the document data with this dictionary unifies the notation in the documents and yields document data free of (or with fewer) spelling variants and typographical errors.

Here, "control" means standardizing the expressions used in the text under fixed criteria. Specifically, it includes correcting errors; unifying words that have several spellings (for example, replacing every occurrence of "ファックス" or "ファクシミリ" with "FAX"); unifying synonyms (for example, replacing "手早く", "素早く", and "迅速に" with "すぐに"); deleting prefixes; correcting roundabout phrasing; unifying sentences into the plain "である" style; simplification; and deleting or correcting sentence endings that carry no meaning. When notation control is finished, the process proceeds to step S103.
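The search-and-replace step of S102 can be illustrated with a minimal Python sketch. The rule list, its layout as (pattern, replacement) pairs, and the function name `normalize` are assumptions made for illustration; the patent does not disclose the actual structure of the notation control dictionary.

```python
import re

# Hypothetical notation control dictionary: each entry pairs an analysis key
# (written as a regular expression) with its replacement string.
NOTATION_RULES = [
    (r"ファックス|ファクシミリ", "FAX"),      # unify spelling variants
    (r"手早く|素早く|迅速に", "すぐに"),        # unify synonyms
]

def normalize(text, rules=NOTATION_RULES):
    """Apply every search-and-replace rule in order and return the unified text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text
```

For example, `normalize("ファクシミリを素早く送る")` rewrites both the spelling variant and the synonym in one pass over the rules.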

In step S103, a search using the dictionary data table is performed. The document data analysis unit 30 searches the notation-controlled document data generated in step S102 with the analysis keys contained in the dictionary data table. For each hit, it associates the matched character string (hereinafter, "retrieved string"), the case ID of the case containing the retrieved string, the position of the retrieved string in the document, the analysis key that produced the hit, and the attribute information and semantic information of the retrieved string, and holds them in the RAM as the analysis result. The process shown in this flowchart then ends.

FIG. 7 is a diagram showing the result of the search process using the dictionary data table in the present embodiment. In a search with a regular expression, the delimiter specified in the system is treated as the boundary between data units, and the search finds character strings that match the regular expression. The delimiter can be specified freely; for example, "。" or a line break can be specified as the delimiter. The position of a retrieved string can be specified by the number of characters, the number of bytes, or the number of logical lines from the beginning of the document.
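The delimiter-based search of step S103 can be sketched as follows. This is only an illustration under stated assumptions: the function name `search_cases`, the dictionary layout (a list of dicts with `key`, `attribute`, and `meaning`), and the choice of reporting the position as a character offset are hypothetical, and only the first hit per segment and key is recorded.

```python
import re

def search_cases(cases, dictionary, delimiters="。\n"):
    """Search each case with every analysis key, one delimited segment at a time.

    cases:      {case_id: text}
    dictionary: list of dicts with "key" (regex), "attribute", "meaning"
    """
    splitter = re.compile("[" + re.escape(delimiters) + "]")
    results = []
    for case_id, text in cases.items():
        offset = 0  # character offset of the current segment in the document
        for segment in splitter.split(text):
            for entry in dictionary:
                m = re.search(entry["key"], segment)
                if m:
                    results.append({
                        "case_id": case_id,
                        "found": m.group(0),
                        "position": offset + m.start(),
                        "key": entry["key"],
                        "attribute": entry["attribute"],
                        "meaning": entry["meaning"],
                    })
            offset += len(segment) + 1  # skip the one-character delimiter
    return results
```

Splitting before matching keeps a key such as ".*知り(たい|たかった)" from matching across a sentence boundary, which is the effect the delimiter is described as having.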

The database update process and the database optimization process, which are executed to maintain the dictionary data table used in the document data analysis process described with reference to FIGS. 6 and 7, are described below with reference to FIGS. 8A to 11.

FIGS. 8A to 8C are flowcharts showing the flow of the database update process according to the present embodiment. The process starts when an update data table, created by the user for integration into the dictionary database, is input to the document data analysis apparatus 1 and the user's instruction to integrate the update data table into the dictionary database is accepted. However, the process may also be executed according to a preset schedule or periodically. The process shown in these flowcharts is executed for each item of update data contained in the update data table.

Before the processing from step S201 onward is executed, the update analysis key acquisition unit 21 acquires an update data table containing update analysis keys via a network such as a LAN, or via a portable recording medium such as a USB memory or a CD-ROM. The update data table is a set of update analysis data created in advance by the user. According to the present embodiment, the appropriate update operation is determined in the database update process described below, so the user can create update data without having to consider the structure or current contents of the dictionary database.

In steps S201 and S202, the update data is cleaned up. Here, update data is data used to add data to the dictionary database or to overwrite information in the dictionary database. The update analysis key acquisition unit 21 cleans up the update data by deleting data whose analysis key is NULL (step S201) and, among data whose analysis keys are exactly duplicated, deleting all but the one added last (step S202).
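The clean-up of steps S201 and S202 can be sketched in a few lines of Python. The row layout (dicts with a `key` field, `None` standing in for NULL) and the function name `clean_update_rows` are assumptions for illustration.

```python
def clean_update_rows(rows):
    """Drop rows whose analysis key is None (S201) and, for rows with an
    identical key, keep only the one added last (S202)."""
    last_index = {}
    for i, row in enumerate(rows):
        if row["key"] is None:
            continue                    # S201: discard NULL keys
        last_index[row["key"]] = i      # later rows overwrite earlier ones
    keep = set(last_index.values())
    # Preserve the original relative order of the surviving rows.
    return [row for i, row in enumerate(rows) if i in keep]
```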

At this point, to improve the accuracy of the searches described later, a control process may be performed that organizes and standardizes the format and expressions of the update data, the order in which the data is held, and so on, according to predetermined rules. For example, analysis keys are defined with regular expressions in the present embodiment; normalizing the way an analysis key is written as a regular expression according to predetermined rules yields analysis keys that give more accurate search results. More specifically, putting the character strings joined by an OR condition into an order that follows a predetermined rule makes it clear that ".*知り(たい|たかった)" and ".*知り(たかった|たい)" are analysis keys that should match each other, so that later searches return the correct results.
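One possible ordering rule for OR-joined strings is plain lexicographic sorting, sketched below. The rule itself and the function name `canonicalize_key` are assumptions; the patent only says that "a predetermined rule" is used. The sketch handles only non-nested parenthesized groups and ignores escaped parentheses. Note also that reordering alternatives can change which alternative a regex engine tries first, so this form is intended for comparing keys with each other.

```python
import re

# A parenthesized group without nested parentheses that contains at least one "|".
_GROUP = re.compile(r"\(([^()]*\|[^()]*)\)")

def canonicalize_key(key):
    """Sort the alternatives inside each simple (a|b|...) group."""
    def sort_group(m):
        return "(" + "|".join(sorted(m.group(1).split("|"))) + ")"
    return _GROUP.sub(sort_group, key)
```

With this rule, the two keys from the text reduce to the same string, so a later exact-match comparison treats them as identical.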

In steps S203 and S204, the degree of conformity is determined and the update operation is decided using the analysis key contained in the update data (hereinafter, "update analysis key"). Using the update analysis key, the conformity degree determination unit 23 causes the database search unit 22 to search the analysis keys contained in all data stored in the dictionary database, and determines the degree of conformity between the information contained in the analysis data and the information contained in the update data (step S203). Here, the search uses the character string of the update analysis key (for example, ".*知り(たい|たかった)"), and the database search unit 22 retrieves only analysis keys that match exactly. "Exact match" means that the items being compared match with nothing missing and nothing extra.

Once the conformity degree determination unit 23 has determined the degree of conformity, the update process content determination unit 24 decides the update operation for the dictionary database according to the result (step S204). If an exactly matching analysis key was retrieved, the process proceeds to step S205. If no analysis key matches the update analysis key exactly (a mismatch), the process proceeds to step S209. For example, if the update analysis key is ".*知り(たい|たかった)", only the analysis key ".*知り(たい|たかった)" is retrieved as an exact match. Other analysis keys (for example, ".*説明(が|は)?((ない|なし)" or ".*知り(たい|たかった).*(が|のに).*説明(が|は)?((ない|なし)") are not regarded as exact matches.

In steps S205 and S206, attribute information and semantic information are compared. The conformity degree determination unit 23 determines whether the attribute information and semantic information of the data whose analysis key exactly matches the update analysis key match the attribute information and semantic information of the update data (step S205). Once the degree of conformity has been determined, the update process content determination unit 24 decides the update operation for the dictionary database according to the result (step S206). If the attribute information and semantic information are determined to match those of the update data, the process proceeds to step S207. If they do not match (that is, at least one of the attribute information and the semantic information differs from that contained in the update data), the process proceeds to step S208.

In step S207, the addition of the update data to the dictionary database is cancelled. The database update unit 29 discards or leaves untouched, without adding it to the dictionary database, update data whose attribute information and semantic information were determined in step S205 to match those of the registered data. In other words, when an exactly matching analysis key was retrieved from the dictionary database in step S204 and the attribute information and semantic information were determined to match in step S205, identical data is already registered in the dictionary database, so the update data is not added. The process shown in this flowchart then ends.

FIG. 9 is a table showing the relationship between the determination results of the conformity degree determination unit 23 according to the present embodiment and the corresponding operations. In FIG. 9, the operation executed in step S207 in response to the determination result (the update data is "not registered", that is, its addition to the dictionary database is cancelled) is shown in row No. 0 of the table. Although FIG. 9 shows only two items of attribute information, "attribute 1" and "attribute 2", the number of items of attribute information is not limited to two. The same applies to FIG. 10.

FIG. 10 shows how the tables in the dictionary database are updated with the update data contained in the update data table when the database update process according to the present embodiment is executed. FIG. 10 shows that the data with ID 1, which is the update data corresponding to No. 0 in the table of FIG. 9, is not registered because it matches data already present in the dictionary database completely.

In step S208, data already registered in the dictionary database is updated with the contents of the update data. The database update unit 29 overwrites the attribute information and semantic information of the registered data retrieved in step S204 with the attribute information and semantic information of the update data that were determined in step S205 not to match. However, when the attribute information and semantic information of the update data are NULL, they are not written over the attribute information and semantic information of the registered data. The process shown in this flowchart then ends.
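The exact-match branch (steps S203 to S208) can be summarized in a short Python sketch. The row layout, the returned labels, and the function name `apply_exact_match` are hypothetical. Treating NULL field by field (a `None` attribute or meaning leaves only that field untouched) is also an assumption, since the text does not say whether the rule applies per field.

```python
def apply_exact_match(dictionary, update):
    """Handle one update row against rows whose analysis key matches exactly.

    dictionary: list of dicts with "key", "attribute", "meaning" (mutated in place)
    Returns "skip" (S207), "overwrite" (S208), or "no_exact_match" (go to S209).
    """
    for entry in dictionary:
        if entry["key"] != update["key"]:
            continue                                   # S203: exact match only
        if (entry["attribute"] == update["attribute"]
                and entry["meaning"] == update["meaning"]):
            return "skip"                              # S207: already registered
        for field in ("attribute", "meaning"):
            if update[field] is not None:              # NULL never overwrites
                entry[field] = update[field]
        return "overwrite"                             # S208
    return "no_exact_match"
```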

In FIG. 9, the operation executed in step S208 in response to the determination result of the conformity degree determination unit 23 (overwriting the registered data) is shown in row No. 1 of the table. FIG. 10 shows that the data with IDs 2 and 3, which are the update data corresponding to No. 1 in the table of FIG. 9, are written over the data already registered in the dictionary database.

In steps S209 and S210, the dictionary database is searched using the attribute information contained in the update data (hereinafter, "update attribute information"). Using the update attribute information, the conformity degree determination unit 23 causes the database search unit 22 to search the attribute information contained in the data in the dictionary database (excluding data that matched exactly in step S204), and determines the degree of conformity between the information contained in the analysis data and the information contained in the update data (step S209). Here, the search uses the character string of the update attribute information (for example, "意見" (opinion)), and the database search unit 22 retrieves data having at least one matching item of attribute information.

Once the conformity degree determination unit 23 has determined the degree of conformity, the update process content determination unit 24 decides the update operation for the dictionary database according to the result (step S210). If data having at least one matching item of attribute information was retrieved, the process proceeds to step S211. If no data with attribute information matching the update attribute information was retrieved, the process proceeds to step S219. For example, if update attribute information 1 is "応対" (response) and update attribute information 2 is "意見" (opinion), data in which at least one of attribute information 1 and attribute information 2 is "応対" or "意見" is retrieved. The process then proceeds to step S211.

In steps S211 and S212, the degree of conformity is determined and the update operation is decided using the semantic information of the update data. The conformity degree determination unit 23 causes the database search unit 22 to retrieve, from the group of data retrieved in step S209 as having at least one exactly matching item of attribute information, the data whose semantic information gives the longest match, as a character string, with the semantic information of the update data, and thereby determines the degree of conformity between the information contained in the analysis data and the information contained in the update data (step S211).

Once the conformity degree determination unit 23 has determined the degree of conformity, the update process content determination unit 24 decides the update operation for the dictionary database according to the result (step S212). If the semantic information of the retrieved data matches the semantic information of the update data exactly, the process proceeds to step S213. If it matches partially, the process proceeds to step S214. If no data containing semantic information that matches the semantic information of the update data was retrieved (a mismatch), the process proceeds to step S215.

Here, a partial match covers the case where, between the semantic information of the retrieved data and that of the update data, not all of the multiple items of semantic information match but one or more do, as well as the case where the strings do not match exactly but match over a number of characters equal to or greater than a predetermined lower limit length (threshold). A mismatch covers the case where no item of semantic information matches between the retrieved data and the update data, as well as the case where only character strings shorter than the predetermined lower limit length (threshold) match.
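For a single pair of semantic strings, the three-way classification above can be sketched with the longest common substring as the measure of "matching characters". Both that measure and the default threshold `min_len=2` are assumptions; the patent does not specify how the matching length is computed or what the lower limit is.

```python
from difflib import SequenceMatcher

def match_degree(existing, update, min_len=2):
    """Classify two semantic strings as "exact", "partial", or "none"."""
    if existing == update:
        return "exact"
    matcher = SequenceMatcher(None, existing, update, autojunk=False)
    longest = matcher.find_longest_match(0, len(existing), 0, len(update))
    # Partial only if the shared run reaches the lower limit length.
    return "partial" if longest.size >= min_len else "none"
```

The case of several semantic items per row (some matching, some not) would need an additional loop over the items and is left out of this sketch.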

In step S213, the update data is added to the dictionary database. When the semantic information of the retrieved data matches the semantic information of the update data exactly, the database update unit 29 adds the update data to the end of the group of data, retrieved in step S211, whose semantic information matches exactly. The process shown in this flowchart then ends.
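Adding a row "to the end of a group" can be pictured as inserting it right after the last row that belongs to that group. The sketch below assumes a flat list of rows and a caller-supplied predicate `in_group`; both, like the function name `insert_after_group`, are hypothetical, since the patent does not describe how groups are laid out inside the tables of the dictionary database.

```python
def insert_after_group(rows, update, in_group):
    """Insert `update` directly after the last row for which in_group(row) is true.

    If no row belongs to the group, the update is appended to the end.
    Returns the index at which the update was inserted.
    """
    last = -1
    for i, row in enumerate(rows):
        if in_group(row):
            last = i
    position = last + 1 if last >= 0 else len(rows)
    rows.insert(position, update)
    return position
```

The same helper would serve the later steps that add data to the end of a partially matching group or of a group sharing the same attribute information, with a different predicate in each case.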

In FIG. 9, the operation executed in step S213 in response to the determination result of the conformity degree determination unit 23 (the update data is added to the end of the group of data whose semantic information matches exactly) is shown in rows No. 2, 3, 8, 9, 14, and 15 of the table. FIG. 10 shows that the data with IDs 4, 5, 10, 11, 16, and 17, which are the update data corresponding to No. 2, 3, 8, 9, 14, and 15 in the table of FIG. 9, are added to the end of the group of data whose semantic information matches exactly.

In step S214, the update data is added to the dictionary database. When the semantic information of the retrieved data matches the semantic information of the update data partially, the database update unit 29 adds the update data to the end of the group of data, retrieved in step S211, whose semantic information matches partially. The process shown in this flowchart then ends.

In FIG. 9, the operation executed in step S214 in response to the determination result of the conformity degree determination unit 23 (the update data is added to the end of the group of data whose semantic information matches partially) is shown in rows No. 4, 5, 10, 11, 16, and 17 of the table. FIG. 10 shows that the data with IDs 6, 7, 12, 13, 18, and 19, which are the update data corresponding to No. 4, 5, 10, 11, 16, and 17 in the table of FIG. 9, are added to the end of the group of data whose semantic information matches partially.

In steps S215 and S216, analysis keys are compared within the same attribute information. The conformity degree determination unit 23 compares the update analysis key with the analysis keys contained in the data retrieved in step S209 as having at least one exactly matching item of attribute information, and thereby determines the degree of conformity between the information contained in the analysis data and the information contained in the update data (step S215).

Once the conformity degree determination unit 23 has determined the degree of conformity, the update process content determination unit 24 decides the update operation for the dictionary database according to the result (step S216). If the comparison shows that the data retrieved in step S209 contains an analysis key that partially matches the update analysis key, the process proceeds to step S217. If the data retrieved in step S209 contains no analysis key that matches the update analysis key, the process proceeds to step S218.

In step S217, the update data is added to the dictionary database. When the data retrieved in step S209 contains an analysis key that partially matches the update analysis key, the database update unit 29 adds the update data to the end of the group of data involved in the partial match (step S217). The process shown in this flowchart then ends.

In FIG. 9, the operation executed in step S217 in response to the determination result of the conformity degree determination unit 23 (the update data is added to the end of the group of data involved in the partial match) is shown in rows No. 6, 12, and 18 of the table. FIG. 10 shows that the data with IDs 8, 14, and 20, which are the update data corresponding to No. 6, 12, and 18 in the table of FIG. 9, are added to the end of the group of data in the dictionary database whose analysis keys partially match the update analysis key.

In step S218, the update data is added to the dictionary database. When the data retrieved in step S209 contains no analysis key that matches the update analysis key, the database update unit 29 adds the update data to the end of the group of data having the same attribute information (step S218). The process shown in this flowchart then ends.

In FIG. 9, the operation executed in step S218 in response to the determination result of the conformity degree determination unit 23 (the update data is added to the end of the group of data having the same attribute information) is shown in rows No. 7, 13, and 19 of the table. FIG. 10 shows that the data with IDs 9, 15, and 21, which are the update data corresponding to No. 7, 13, and 19 in the table of FIG. 9, are added to the end of the group of data in the dictionary database having the same attribute information.

In steps S219 and S220, the degree of conformity is determined and the update operation is decided using the semantic information contained in the update data (hereinafter, "update semantic information"). Using the update semantic information, the conformity degree determination unit 23 causes the database search unit 22 to search the semantic information contained in the data in the dictionary database (excluding data that matched exactly in step S204), and determines the degree of conformity between the information contained in the analysis data and the information contained in the update data (step S219). Here, the search uses the character string of the update semantic information (for example, "知りたい" (want to know)).

Once the conformity degree determination unit 23 has determined the degree of conformity, the update process content determination unit 24 decides the update operation for the dictionary database according to the result (step S220). If the semantic information of the retrieved data matches the semantic information of the update data exactly, the process proceeds to step S221. If it matches partially, the process proceeds to step S222. If no data containing semantic information that matches the semantic information of the update data was retrieved (a mismatch), the process proceeds to step S223.

In steps S221 to S223, the update data is added to the dictionary database. The processing in steps S221 and S222 is substantially the same as in steps S213 and S214, so its description is omitted.

In FIG. 9, the operation executed in step S221 in response to the determination result of the conformity degree determination unit 23 (the update data is added to the end of the group of data whose semantic information matches exactly) is shown in rows No. 20 and 21 of the table, and the operation executed in step S222 (the update data is added to the end of the group of data whose semantic information matches partially) is shown in rows No. 22 and 23. FIG. 10 shows that the data with IDs 22 and 23, which are the update data corresponding to No. 20 and 21 in the table of FIG. 9, are added to the end of the group of data whose semantic information matches exactly, and that the data with IDs 24 and 25, which are the update data corresponding to No. 22 and 23, are added to the end of the group of data whose semantic information matches partially.

In step S223, the update data is added to the dictionary database. When the semantic information of the retrieved data does not match the semantic information of the update data, the database update unit 29 adds the update data to the end of the last table in the dictionary database (step S223). That is, in this embodiment, the last table in the dictionary database serves as the table that accumulates data containing analysis keys that cannot be classified.

In FIG. 9, rows No. 24 and 25 of the table show the process executed in step S223 in response to the determination result of the conformity degree determination unit 23 (the update data is added to the end of the last table). Referring to FIG. 10, the data of IDs 26 and 27, which are the update data corresponding to No. 24 and 25 in the table of FIG. 9, are added to the end of the last table in the dictionary database.
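The branching of steps S220 to S223 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the `Entry` class, the function names, and in particular the rule that "partial match" means substring containment of the semantic information are assumptions made only for illustration, and the "data group" is simplified to "the last entry that matches".

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str      # analysis key (a regular expression)
    meaning: str  # semantic information


def match_degree(existing: str, new: str) -> str:
    """Classify the agreement of two pieces of semantic information.

    Assumption: a partial match is substring containment in either direction.
    """
    if existing == new:
        return "complete"
    if existing in new or new in existing:
        return "partial"
    return "none"


def add_update_data(tables: list[list[Entry]], new: Entry) -> str:
    """Insert `new` after the last completely (S221) or partially (S222)
    matching entry; otherwise append it to the last table (S223)."""
    for degree in ("complete", "partial"):
        hits = [
            (t, i)
            for t, table in enumerate(tables)
            for i, entry in enumerate(table)
            if match_degree(entry.meaning, new.meaning) == degree
        ]
        if hits:
            t, i = hits[-1]
            tables[t].insert(i + 1, new)
            return degree
    tables[-1].append(new)  # last table collects unclassifiable keys
    return "none"


tables = [
    [Entry("k1", "want to know"), Entry("k2", "no explanation")],
    [Entry("k3", "unclassified")],
]
first = add_update_data(tables, Entry("x", "no explanation"))
second = add_update_data(tables, Entry("y", "want to know more"))
third = add_update_data(tables, Entry("z", "made a phone call"))
```

Trying the complete match before the partial match mirrors the order of the branches in step S220; the fallback to `tables[-1]` corresponds to using the last table for data that cannot be classified.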

According to the document data analysis apparatus 1 of this embodiment, the position at which dictionary data is to be added or updated can be identified from the pattern match between the information contained in the update data and the content of the dictionary database to be updated, and the dictionary database can be updated accordingly. In this embodiment, the content of the update process is determined by executing the processes shown in the flowcharts of FIGS. 8A to 8C. Instead of this method, a method may be adopted in which the degree of conformity is determined for the attribute information, the semantic information, and the analysis key, and the content of the update process is determined by referring to a table such as the one shown in FIG. 9.

FIG. 11 is a flowchart showing the flow of the database optimization process according to this embodiment. The process shown in this flowchart starts when an instruction from the user to execute the database optimization process is accepted. However, the process may also be executed according to a preset schedule or periodically.

In step S301, the dictionary database is searched using the analysis keys contained in the dictionary database. The database search unit 22 searches the dictionary database using an analysis key stored in it, and thereby extracts the analysis keys that contain the analysis key used for the search (hereinafter also referred to as the "search analysis key"). The database search unit 22 repeats the selection of a search analysis key and the search for every analysis key contained in the dictionary database. As a result of step S301, the database search unit 22 therefore obtains a search result of the dictionary database for each analysis key contained in the dictionary database. The process then proceeds to step S302.
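As a rough illustration of step S301, the following Python sketch uses every analysis key in turn as the search analysis key. It assumes that "an analysis key containing the search analysis key" means plain substring containment of the regular-expression source text and that the dictionary is a simple list of unique strings; `search_dictionary` is a hypothetical name.

```python
def search_dictionary(analysis_keys: list[str]) -> dict[str, list[str]]:
    """For each search analysis key, list the analysis keys that contain it."""
    return {
        search_key: [key for key in analysis_keys if search_key in key]
        for search_key in analysis_keys
    }


know = ".*知り(たい|たかった)"
no_explanation = ".*説明(が|は)?(ない|なし)"
combined = know + no_explanation

results = search_dictionary([know, no_explanation, combined])
```

Every key retrieves at least itself. Here `combined` is additionally retrieved by the two shorter keys, which is the information the next step relies on.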

In step S302, unit analysis keys are extracted. From the results of the search in step S301, the unit analysis key extraction unit 25 extracts, as unit analysis keys, the one or more analysis keys that were retrieved only when they themselves were used as the search key. In other words, an analysis key that is retrieved only when it is itself used as the search key is an analysis key that is not retrieved when any other analysis key is used as the search key. A unit analysis key is an analysis key that serves as a reference for grasping the configuration of analysis keys. Because it does not contain any other analysis key, it can be used as the minimum unit for grasping the configuration of analysis keys. The extracted unit analysis keys are collected into a table, generating a unit analysis key table (unit analysis key dictionary, minimum unit dictionary).
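The selection rule of step S302 can be sketched as follows, again assuming that retrieval means substring containment and that the analysis keys are unique strings. The function name and the identifiers `U1`, `U2`, ... used for the unit analysis key table are illustrative only.

```python
def extract_unit_keys(analysis_keys: list[str]) -> dict[str, str]:
    """Build a unit analysis key table from keys that no other key retrieves."""
    unit_keys = [
        key
        for key in analysis_keys
        # A key is a unit key when no *other* key occurs inside it.
        if not any(other != key and other in key for other in analysis_keys)
    ]
    return {f"U{n}": key for n, key in enumerate(unit_keys, start=1)}


know = ".*知り(たい|たかった)"
no_explanation = ".*説明(が|は)?(ない|なし)"
combined = know + no_explanation

unit_table = extract_unit_keys([know, combined, no_explanation])
```

The compound key is excluded because the two shorter keys retrieve it, so only the two minimal keys enter the table.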

In step S303, the configuration of each analysis key is grasped on the basis of the unit analysis keys. The configuration grasping unit 26 searches the dictionary database using the unit analysis keys and grasps the configuration of the analysis keys contained in the dictionary database. In other words, for each analysis key contained in the dictionary database, the configuration grasping unit 26 grasps the usage pattern of the unit analysis keys. Here, the usage pattern is information indicating how the unit analysis keys match within the analysis key (complete match, partial match, and the location of a partial match). For example, when the analysis key is ".*知り(たい|たかった).*(が|のに).*説明(が|は)?(ない|なし)", the configuration grasping unit 26 grasps, through the search using the unit analysis keys, that this analysis key is composed of three unit analysis keys: ".*知り(たい|たかった)", ".*(が|のに)", and ".*説明(が|は)?(ない|なし)". The grasped configuration can be managed as a combination of the identification information of the unit analysis keys. The process then proceeds to step S304.
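The decomposition in step S303 can be pictured with the sketch below. It assumes the unit analysis key table is a mapping from a hypothetical identifier to the key text and that matching is literal substring search; the tuple layout `(unit_id, match_type, position)` is one possible way to record a usage pattern, not the format used by the embodiment.

```python
def grasp_configuration(
    key: str, unit_table: dict[str, str]
) -> list[tuple[str, str, int]]:
    """Return (unit_id, match_type, position) for each unit key found in `key`."""
    pattern = []
    for unit_id, unit_key in unit_table.items():
        pos = key.find(unit_key)
        while pos != -1:
            match_type = "complete" if unit_key == key else "partial"
            pattern.append((unit_id, match_type, pos))
            pos = key.find(unit_key, pos + len(unit_key))
    # Order the matches by where they occur in the analysis key.
    return sorted(pattern, key=lambda item: item[2])


unit_table = {
    "U1": ".*知り(たい|たかった)",
    "U2": ".*(が|のに)",
    "U3": ".*説明(が|は)?(ない|なし)",
}
analysis_key = ".*知り(たい|たかった).*(が|のに).*説明(が|は)?(ない|なし)"

configuration = grasp_configuration(analysis_key, unit_table)
unit_ids = [unit_id for unit_id, _, _ in configuration]
```

The list `unit_ids` is the "combination of identification information" by which the configuration of the analysis key can be managed.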

In step S304, additional unit analysis keys are extracted and set. The additional unit analysis key extraction unit 27 extracts, as an additional unit analysis key, a character string that does not correspond to any of the unit analysis keys in the unit analysis key table generated in step S302. To extract such a string, the additional unit analysis key extraction unit 27 refers to the analysis key configuration grasped in step S303. For example, suppose that the analysis key is ".*知り(たい|たかった).*(電話した).*説明(が|は)?(ない|なし)", and that the unit analysis keys ".*知り(たい|たかった)" and ".*説明(が|は)?(ない|なし)" exist in the unit analysis key table but no unit analysis key ".*(電話した)" exists. In that case, the additional unit analysis key extraction unit 27 newly extracts ".*(電話した)" as an additional unit analysis key.
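One way to find the strings not covered by any unit analysis key is to cut the analysis key at every occurrence of a known unit key and keep what remains. The sketch below does this with `re.split` over the escaped unit keys; the function name is hypothetical and literal matching of the key text is an assumption.

```python
import re


def residual_strings(key: str, unit_keys: list[str]) -> list[str]:
    """Return the parts of `key` that are not covered by any unit analysis key."""
    if not unit_keys:
        return [key]
    # Escape the unit keys so they are matched literally, longest first.
    longest_first = sorted(unit_keys, key=len, reverse=True)
    splitter = "|".join(re.escape(unit_key) for unit_key in longest_first)
    return [part for part in re.split(splitter, key) if part]


unit_keys = [".*知り(たい|たかった)", ".*説明(が|は)?(ない|なし)"]
analysis_key = ".*知り(たい|たかった).*(電話した).*説明(が|は)?(ない|なし)"

candidates = residual_strings(analysis_key, unit_keys)
```

For the example from the text, the only remaining part is ".*(電話した)", which becomes the candidate additional unit analysis key.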

The additional unit analysis key setting unit 28 then generates semantic information from the character string in the analysis key that does not correspond to any of the unit analysis keys, and sets it as the semantic information of the newly extracted additional unit analysis key. For example, when the character string that does not correspond to any of the unit analysis keys is ".*(電話した)", the string "電話した" ("made a phone call"), obtained by removing the regular-expression notation, is generated and set as the semantic information of the additional unit analysis key.

However, the character string in the analysis key that does not correspond to any of the unit analysis keys may be a string that has no meaning by itself (no direct meaning), such as the conjunctions "が" or "は". In this embodiment, the additional unit analysis key extraction unit 27 does not extract such a string as an additional unit analysis key, so that the unit analysis key remains the unit (minimum unit) of information description.
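The two rules above (strip the regular-expression notation to obtain semantic information, and skip strings that carry no meaning of their own) might be combined as in the following sketch. The set `FUNCTION_WORDS`, the handling of several alternatives by joining them with "|", and the function name are assumptions; the text does not specify how meaningless strings are recognized.

```python
import re

# Hypothetical list of strings regarded as having no meaning by themselves.
FUNCTION_WORDS = {"が", "は", "のに"}


def semantic_info(fragment: str):
    """Strip regular-expression notation from `fragment`.

    Returns None when nothing meaningful remains, in which case the
    fragment is not extracted as an additional unit analysis key.
    """
    words = [w for w in re.split(r"[.*?()|]+", fragment) if w]
    if not words or all(w in FUNCTION_WORDS for w in words):
        return None
    return "|".join(words)


called = semantic_info(".*(電話した)")
particle = semantic_info(".*(が|のに)")
```

Only the metacharacters that occur in the example keys are stripped here; a fuller treatment would need a real regular-expression parser.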

When the usage frequency of the unit analysis keys contained in an analysis key is lower than a predetermined threshold, the entire analysis key may instead be extracted as an additional unit analysis key. In this case, the additional unit analysis key extraction unit 27 extracts, as an additional unit analysis key, an analysis key that contains a character string not corresponding to any of the unit analysis keys in the unit analysis key table generated in step S302. For example, suppose that the analysis key is ".*知り(たい|たかった).*(が|のに).*説明(が|は)?(ない|なし)", and that the unit analysis keys ".*知り(たい|たかった)" and ".*説明(が|は)?(ない|なし)" exist in the unit analysis key table but no unit analysis key ".*(が|のに)" exists. The string ".*(が|のに)" has no meaning by itself. Nevertheless, if the usage frequency of the unit analysis keys ".*知り(たい|たかった)" and ".*説明(が|は)?(ない|なし)" is lower than the predetermined threshold, the additional unit analysis key extraction unit 27 may newly extract the whole analysis key ".*知り(たい|たかった).*(が|のに).*説明(が|は)?(ない|なし)" as an additional unit analysis key.

When an analysis key containing a character string that does not correspond to any of the unit analysis keys is extracted as an additional unit analysis key, the additional unit analysis key setting unit 28 sets, as the semantic information of the newly extracted additional unit analysis key, the semantic information of the other unit analysis keys contained in it. For example, when the additional unit analysis key is ".*知り(たい|たかった).*(が|のに).*説明(が|は)?(ない|なし)" and contains the unit analysis keys ".*知り(たい|たかった)" and ".*説明(が|は)?(ない|なし)", the semantic information of these unit analysis keys, "知りたい" ("want to know") and "説明がない" ("no explanation"), is set as the semantic information of the additional unit analysis key.
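The frequency-based alternative can be sketched as follows. How "usage frequency" is counted is not defined in this passage, so the sketch assumes it is the number of analysis keys in the dictionary that contain the unit analysis key, and that the whole key is promoted only when every contained unit key falls below the threshold. All names are illustrative.

```python
def usage_frequency(unit_key: str, analysis_keys: list[str]) -> int:
    """Assumed definition: number of analysis keys that contain `unit_key`."""
    return sum(1 for key in analysis_keys if unit_key in key)


def promote_whole_key(analysis_key, unit_meanings, analysis_keys, threshold):
    """Return (additional unit key, inherited meanings), or None if not promoted."""
    contained = [
        u for u in unit_meanings if u in analysis_key and u != analysis_key
    ]
    if not contained:
        return None
    if any(usage_frequency(u, analysis_keys) >= threshold for u in contained):
        return None
    contained.sort(key=analysis_key.find)  # keep the order of appearance
    return analysis_key, [unit_meanings[u] for u in contained]


unit_meanings = {
    ".*知り(たい|たかった)": "知りたい",
    ".*説明(が|は)?(ない|なし)": "説明がない",
}
whole = ".*知り(たい|たかった).*(が|のに).*説明(が|は)?(ない|なし)"
dictionary_keys = list(unit_meanings) + [whole]

promoted = promote_whole_key(whole, unit_meanings, dictionary_keys, threshold=3)
kept = promote_whole_key(whole, unit_meanings, dictionary_keys, threshold=2)
```

With a threshold of 3, both contained unit keys (each used twice) count as rarely used, so the whole key is promoted and inherits their semantic information; with a threshold of 2 it is left as it is.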

In step S305, the dictionary database is updated. The database update unit 29 updates the semantic information of the analysis keys contained in the dictionary database using the semantic information associated with the unit analysis keys included in the configuration grasped in step S303. For example, when the analysis key is ".*知り(たい|たかった).*説明(が|は)?(ない|なし)" and contains the unit analysis keys ".*知り(たい|たかった)" and ".*説明(が|は)?(ない|なし)", the semantic information of these unit analysis keys, "知りたい" ("want to know") and "説明がない" ("no explanation"), is added to or overwrites the semantic information of the analysis key. The process shown in this flowchart then ends.
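A minimal sketch of the update in step S305 is shown below. The dictionary database is modeled as a plain mapping from analysis key to a list of semantic information strings, and the `overwrite` flag is a hypothetical switch between the "add" and "overwrite" behaviors mentioned in the text.

```python
def update_semantic_info(dictionary, unit_meanings, overwrite=True):
    """Set each analysis key's semantic information from its unit keys."""
    for analysis_key, meanings in dictionary.items():
        found = sorted(
            (analysis_key.find(unit_key), meaning)
            for unit_key, meaning in unit_meanings.items()
            if unit_key in analysis_key
        )
        inherited = [meaning for _, meaning in found]
        if not inherited:
            continue  # no unit key applies; leave the entry untouched
        if overwrite:
            dictionary[analysis_key] = inherited
        else:
            extra = [m for m in inherited if m not in meanings]
            dictionary[analysis_key] = meanings + extra
    return dictionary


unit_meanings = {
    ".*知り(たい|たかった)": "知りたい",
    ".*説明(が|は)?(ない|なし)": "説明がない",
}
compound = ".*知り(たい|たかった).*説明(が|は)?(ない|なし)"

overwritten = update_semantic_info({compound: ["旧情報"]}, unit_meanings)
added = update_semantic_info(
    {compound: ["知りたい"]}, unit_meanings, overwrite=False
)
```

Sorting by position keeps the inherited semantic information in the order in which the unit analysis keys appear in the analysis key.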

According to the document data analysis apparatus 1 of this embodiment, unit analysis keys, which are the units (minimum units) of information description, and their semantic information are acquired, and each analysis key is expressed and structured in terms of the semantic information of its unit analysis keys, so that the dictionary is optimized on the basis of semantic information. The document data analysis apparatus 1 therefore makes it possible to build a regular expression dictionary that is optimized from the standpoint of semantic structure, and to make dictionary maintenance work more efficient. In addition, a dictionary administrator can easily understand the meaning of the dictionary content when adding, correcting, or updating entries registered in the regular expression dictionary.

DESCRIPTION OF SYMBOLS
1 Document data analysis apparatus
23 Conformity degree determination unit
24 Update process content determination unit
25 Unit analysis key extraction unit
26 Configuration grasping unit
29 Database update unit
30 Document data analysis unit

Claims

An information processing apparatus connected to a database in which analysis data including an analysis key for analyzing document data is stored, the apparatus comprising:
unit analysis key extraction means for extracting, from the database, one or more analysis keys serving as a reference for grasping the configuration of analysis keys, as unit analysis keys;
configuration grasping means for grasping, using the unit analysis keys, the configuration of the analysis keys included in the database; and
database updating means for updating information associated with an analysis key included in the database, using information associated with the unit analysis keys, according to the configuration grasped by the configuration grasping means.

The information processing apparatus according to claim 1, wherein the unit analysis key extraction means extracts from the database, as the unit analysis key, an analysis key that is retrieved only when it is itself used as the search key.

The information processing apparatus according to claim 2, further comprising database search means for searching the database using the analysis keys stored in the database,
wherein the unit analysis key extraction means extracts, as the unit analysis key, an analysis key that, as a result of the search by the database search means, was retrieved only when it was itself used as the search key.

The information processing apparatus according to any one of claims 1 to 3, further comprising additional unit analysis key extraction means for further extracting, as an additional unit analysis key, a character string in the configuration grasped by the configuration grasping means that does not correspond to any of the unit analysis keys.

The information processing apparatus according to claim 4, further comprising additional unit analysis key setting means for setting semantic information generated on the basis of the character string that does not correspond to any of the unit analysis keys as the semantic information of the additional unit analysis key.

The information processing apparatus according to any one of claims 1 to 5, wherein the database updating means updates the semantic information associated with the analysis key included in the database, using the semantic information associated with the unit analysis key.

The information processing apparatus according to claim 1, wherein the analysis key is defined using a regular expression.

An update method for updating a database, in which a computer connected to a database in which analysis data including an analysis key for analyzing document data is stored executes:
a unit analysis key extraction step of extracting, from the database, one or more analysis keys serving as a reference for grasping the configuration of analysis keys, as unit analysis keys;
a configuration grasping step of grasping, using the unit analysis keys, the configuration of the analysis keys included in the database; and
a database update step of updating information associated with an analysis key included in the database, using information associated with the unit analysis keys, according to the configuration grasped in the configuration grasping step.

A database update program for causing a computer connected to a database in which analysis data including an analysis key for analyzing document data is stored to execute:
a unit analysis key extraction step of extracting, from the database, one or more analysis keys serving as a reference for grasping the configuration of analysis keys, as unit analysis keys;
a configuration grasping step of grasping, using the unit analysis keys, the configuration of the analysis keys included in the database; and
a database update step of updating information associated with an analysis key included in the database, using information associated with the unit analysis keys, according to the configuration grasped in the configuration grasping step.