JP2010198141A5


Info

Publication number: JP2010198141A5
Application number: JP2009039999A
Authority: JP
Filing date: 2009-02-23
Publication date: 2012-04-05
Anticipated expiration: 2029-02-23

Description

The invention according to claim 2 is the database creation device according to claim 1, wherein the category setting means sets a target category for classifying phrases and a non-target category that falls outside the purpose of the classification.

The invention according to claim 3 is the database creation device according to claim 1 or claim 2, further comprising input means for receiving input of the reference phrase from outside. In this case, the user can easily perform category classification simply by entering reference phrases.

The invention according to claim 4 is the database creation device according to any one of claims 1 to 3, further comprising weighting coefficient calculation means for calculating a weighting coefficient that indicates the strength of association between the reference phrase and the co-occurrence phrase, wherein the assignment score is calculated based on the weighting coefficient.
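
As a minimal sketch of a weighting coefficient and an assignment score built on it: the text does not give a formula, so the definitions here are assumptions made for illustration only. The weighting coefficient of a co-occurrence phrase is taken to be its share of all co-occurrences observed with the category's reference phrases, and the assignment score of a candidate is the sum of the coefficients of the co-occurrence phrases it appears with. The names `weighting_coefficients` and `assignment_score` and the sample data are hypothetical.

```python
from collections import Counter


def weighting_coefficients(pairs, reference_phrases):
    """Assumed definition: relative frequency of each co-occurrence phrase
    among all co-occurrences observed with the category's reference phrases."""
    counts = Counter(co for phrase, co in pairs if phrase in reference_phrases)
    total = sum(counts.values())
    return {co: n / total for co, n in counts.items()} if total else {}


def assignment_score(candidate, pairs, weights):
    """Assumed definition: sum of the weighting coefficients of every
    co-occurrence phrase observed together with the candidate."""
    return sum(weights.get(co, 0.0) for phrase, co in pairs if phrase == candidate)


# (phrase, co-occurrence phrase) observations; illustrative data only.
pairs = [
    ("apple", "eat"), ("apple", "eat"), ("apple", "red"),
    ("banana", "eat"), ("car", "drive"),
]
weights = weighting_coefficients(pairs, {"apple"})
banana_score = assignment_score("banana", pairs, weights)
car_score = assignment_score("car", pairs, weights)
```

Under these assumptions, "banana" shares the co-occurrence phrase "eat" with the reference phrase "apple" and scores higher than "car", which shares none.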

The invention according to claim 5 is the database creation device according to claim 4, further comprising weighting coefficient update means for updating the weighting coefficient when the assignment candidate phrase is added to the reference phrases of the category.

The invention according to claim 6 is the database creation device according to claim 5, wherein the value of the weighting coefficient is decreased when the co-occurrence phrase is a co-occurrence phrase for reference phrases of a plurality of the categories.
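
One way to picture this reduction, as a sketch only: a co-occurrence phrase that appears with reference phrases of several categories discriminates poorly between them, so its coefficient is lowered. The text does not say by how much; dividing by the number of categories that share the phrase is an assumption chosen for simplicity, and `penalize_shared` is a hypothetical name.

```python
def penalize_shared(weights_by_category):
    """Lower the coefficient of co-occurrence phrases shared across categories.

    weights_by_category: {category: {co_occurrence_phrase: coefficient}}
    Assumption: the coefficient is divided by the number of categories
    in which the phrase occurs as a co-occurrence phrase.
    """
    spread = {}
    for weights in weights_by_category.values():
        for co in weights:
            spread[co] = spread.get(co, 0) + 1
    return {
        category: {co: w / spread[co] for co, w in weights.items()}
        for category, weights in weights_by_category.items()
    }


# Illustrative coefficients; "eat" co-occurs with both categories.
weights = {
    "fruit": {"eat": 0.6, "peel": 0.4},
    "vehicle": {"drive": 0.8, "eat": 0.2},
}
adjusted = penalize_shared(weights)
```

Here "eat" is halved in both categories, while "peel" and "drive", which each belong to a single category, keep their original values.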

The invention according to claim 7 is the database creation device according to any one of claims 1 to 6, wherein, for the assignment candidate phrase, the co-occurrence relevance with the co-occurrence phrase is calculated based on co-occurrence frequency. In this case, the co-occurrence relevance is obtained statistically, which further improves classification accuracy.
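
The text states only that the relevance is calculated from co-occurrence frequency. As a hedged illustration, the sketch below uses the Dice coefficient, a common frequency-based association measure that is swapped in here as an assumption and is not named in the source. Each sentence is modeled as a set of phrases, and `dice_relevance` is a hypothetical name.

```python
def dice_relevance(sentences, candidate, co_phrase):
    """Dice coefficient over sentence-level co-occurrence counts:
    2 * n(candidate, co_phrase) / (n(candidate) + n(co_phrase))."""
    n_candidate = sum(candidate in s for s in sentences)
    n_co = sum(co_phrase in s for s in sentences)
    n_both = sum(candidate in s and co_phrase in s for s in sentences)
    denominator = n_candidate + n_co
    return 2 * n_both / denominator if denominator else 0.0


# Each sentence is represented as the set of phrases it contains.
sentences = [
    {"apple", "eat"},
    {"apple", "red"},
    {"banana", "eat"},
    {"car", "drive"},
]
apple_eat = dice_relevance(sentences, "apple", "eat")
car_eat = dice_relevance(sentences, "car", "eat")
```

With this data "apple" and "eat" share one of their four total occurrences, giving 0.5, while "car" and "eat" never appear together and give 0.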

The invention according to claim 8 is the database creation device according to any one of claims 1 to 7, wherein the co-occurrence phrase is a phrase that has a dependency relationship with the reference phrase.

The invention according to claim 9 is the database creation device according to any one of claims 1 to 8, further comprising compound phrase creation means for creating, when phrases are extracted from the document, a compound phrase from a plurality of the phrases that are adjacent in the document, based on a combination pattern of the parts of speech of the phrases.
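
A rough sketch of compound phrase creation from part-of-speech patterns follows. It assumes the document has already been tokenized and tagged by some morphological analyzer; the tag names, the allowed patterns (noun followed by noun, adjective followed by noun), and the function name `build_compounds` are all assumptions for illustration, since the text does not list the patterns it uses.

```python
def build_compounds(tagged, allowed_pairs=frozenset({("NOUN", "NOUN"), ("ADJ", "NOUN")})):
    """Merge adjacent tokens into one compound phrase whenever the
    part-of-speech pair (previous, current) is an allowed pattern."""
    phrases = []
    current = []  # (word, pos) tokens of the phrase being built
    for word, pos in tagged:
        if current and (current[-1][1], pos) in allowed_pairs:
            current.append((word, pos))
        else:
            if current:
                phrases.append(" ".join(w for w, _ in current))
            current = [(word, pos)]
    if current:
        phrases.append(" ".join(w for w, _ in current))
    return phrases


# Pre-tagged tokens; in practice these would come from a tagger.
tagged = [
    ("database", "NOUN"), ("creation", "NOUN"), ("device", "NOUN"),
    ("stores", "VERB"), ("phrases", "NOUN"),
]
compounds = build_compounds(tagged)
```

The three adjacent nouns are merged into the single phrase "database creation device", while "stores" and "phrases" stay separate because a verb followed by a noun is not an allowed pattern in this sketch.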

The invention according to claim 10 is a database creation method, executed by a computer, for creating a database, the method comprising: a category setting step of setting a category for classifying phrases; a reference phrase setting step of accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases; a co-occurrence phrase extraction step of extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases; a first storage step of storing the initial reference phrases and the co-occurrence phrases in a database; a phrase extraction step of extracting, from the document, phrases that are candidates for assignment to the category; an assignment score calculation step of calculating, for each assignment candidate phrase, an assignment score for the category based on its co-occurrence relevance with the co-occurrence phrases; an assignment determination step of determining assignment of the assignment candidate phrase to the category based on the assignment score; and a second storage step of storing, in the database, the assignment candidate phrase assigned to the category in the assignment determination step, in association with the category.
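
The steps of this method can be sketched end to end as follows. This is an illustrative simplification under stated assumptions, not the claimed implementation: each sentence is a set of phrases, "co-occurrence" means appearing in the same sentence, the assignment score is a plain count of sentences in which the candidate appears with a stored co-occurrence phrase, and a candidate is assigned to its best-scoring category when the score reaches a threshold. The function name `create_database`, the dictionary layout, and the threshold value are hypothetical.

```python
def create_database(sentences, seeds, threshold=1):
    """sentences: list of sets of phrases; seeds: {category: reference phrases}."""
    # Category setting and reference phrase setting steps.
    db = {
        category: {"reference": set(phrases), "cooccur": set(), "assigned": set()}
        for category, phrases in seeds.items()
    }
    # Co-occurrence phrase extraction and first storage steps.
    for sentence in sentences:
        for entry in db.values():
            if entry["reference"] & sentence:
                entry["cooccur"] |= sentence - entry["reference"]
    # Phrase extraction step: every phrase that is not already a reference phrase.
    all_references = set().union(*(e["reference"] for e in db.values()))
    candidates = set().union(*sentences) - all_references
    # Assignment score calculation, assignment determination, second storage.
    for word in sorted(candidates):
        scores = {
            category: sum(
                1 for s in sentences if word in s and (s - {word}) & entry["cooccur"]
            )
            for category, entry in db.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            db[best]["assigned"].add(word)
    return db


sentences = [{"apple", "eat"}, {"banana", "eat"}, {"car", "drive"}, {"truck", "drive"}]
seeds = {"fruit": ["apple"], "vehicle": ["car"]}
db = create_database(sentences, seeds)
```

In this toy run "banana" is assigned to "fruit" because it shares the co-occurrence phrase "eat" with "apple", and "truck" is assigned to "vehicle" through "drive". The co-occurrence phrases themselves score zero and are left unassigned.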

The invention according to claim 11 is characterized by causing a computer to function as: category setting means for setting a category for classifying phrases; reference phrase setting means for accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases; co-occurrence phrase extraction means for extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases; first storage means for storing the initial reference phrases and the co-occurrence phrases in a database; phrase extraction means for extracting, from the document, phrases that are candidates for assignment to the category; assignment score calculation means for calculating, for each assignment candidate phrase, an assignment score for the category based on its co-occurrence relevance with the co-occurrence phrases; assignment determination means for determining assignment of the reference phrase candidate or the co-occurrence phrase candidate to the category based on the assignment score; and second storage means for storing, in the database, the assignment candidate phrase assigned to the category by the assignment determination means, in association with the category.

Claims

A database creation device comprising:
category setting means for setting a category for classifying phrases;
reference phrase setting means for accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases;
co-occurrence phrase extraction means for extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases;
first storage means for storing the initial reference phrases and the co-occurrence phrases in a database;
phrase extraction means for extracting, from the document, phrases that are candidates for assignment to the category;
assignment score calculation means for calculating, for each assignment candidate phrase, an assignment score for the category based on its co-occurrence relevance with the co-occurrence phrases;
assignment determination means for determining assignment of the assignment candidate phrase to the category based on the assignment score; and
second storage means for storing, in the database, the assignment candidate phrase assigned to the category by the assignment determination means, in association with the category.

The database creation device according to claim 1,
wherein the category setting means sets a target category for classifying phrases and a non-target category that falls outside the purpose of the classification.

The database creation device according to claim 1 or 2,
further comprising input means for receiving input of the reference phrase from outside.

The database creation device according to any one of claims 1 to 3,
further comprising weighting coefficient calculation means for calculating a weighting coefficient that indicates the strength of association between the reference phrase and the co-occurrence phrase,
wherein the assignment score is calculated based on the weighting coefficient.

The database creation device according to claim 4,
further comprising weighting coefficient update means for updating the weighting coefficient when the assignment candidate phrase is added to the reference phrases of the category.

The database creation device according to claim 5,
wherein the value of the weighting coefficient is decreased when the co-occurrence phrase is a co-occurrence phrase for reference phrases of a plurality of the categories.

The database creation device according to any one of claims 1 to 6,
wherein, for the assignment candidate phrase, the co-occurrence relevance with the co-occurrence phrase is calculated based on co-occurrence frequency.

The database creation device according to any one of claims 1 to 7,
wherein the co-occurrence phrase is a phrase that has a dependency relationship with the reference phrase.

The database creation device according to any one of claims 1 to 8,
further comprising compound phrase creation means for creating, when phrases are extracted from the document, a compound phrase from a plurality of the phrases that are adjacent in the document, based on a combination pattern of the parts of speech of the phrases.

A database creation method, executed by a computer, for creating a database, the method comprising:
a category setting step of setting a category for classifying phrases;
a reference phrase setting step of accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases;
a co-occurrence phrase extraction step of extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases;
a first storage step of storing the initial reference phrases and the co-occurrence phrases in a database;
a phrase extraction step of extracting, from the document, phrases that are candidates for assignment to the category;
an assignment score calculation step of calculating, for each assignment candidate phrase, an assignment score for the category based on its co-occurrence relevance with the co-occurrence phrases;
an assignment determination step of determining assignment of the assignment candidate phrase to the category based on the assignment score; and
a second storage step of storing, in the database, the assignment candidate phrase assigned to the category in the assignment determination step, in association with the category.

A database creation program that causes a computer to function as:
category setting means for setting a category for classifying phrases;
reference phrase setting means for accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases;
co-occurrence phrase extraction means for extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases;
first storage means for storing the initial reference phrases and the co-occurrence phrases in a database;
phrase extraction means for extracting, from the document, phrases that are candidates for assignment to the category;
assignment score calculation means for calculating, for each assignment candidate phrase, an assignment score for the category based on its co-occurrence relevance with the co-occurrence phrases;
assignment determination means for determining assignment of the assignment candidate phrase to the category based on the assignment score; and
second storage means for storing, in the database, the assignment candidate phrase assigned to the category by the assignment determination means, in association with the category.