JP2010198141A5 - Google Patents

Info

Publication number
JP2010198141A5
Authority
JP
Japan
Prior art keywords
phrase
category
assignment
database
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2009039999A
Other languages
Japanese (ja)
Other versions
JP2010198141A (en)
JP5295818B2 (en)
Filing date
Publication date
Application filed
Priority to JP2009039999A (patent JP5295818B2)
Priority claimed from JP2009039999A (patent JP5295818B2)
Publication of JP2010198141A
Publication of JP2010198141A5
Application granted
Publication of JP5295818B2
Legal status: Active
Anticipated expiration

Description

The invention according to claim 2 is the database creation apparatus according to claim 1, wherein the category setting means sets target categories for classifying phrases and non-target categories that fall outside the purpose of the classification.
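
One way to read the target/non-target distinction is sketched below. This excerpt does not say how non-target categories are used, so the behavior shown (a phrase whose best score falls in a non-target category is left unassigned) is an assumption, and the function name `decide_assignment` is hypothetical.

```python
def decide_assignment(scores, target_categories):
    """Return the best-scoring category if it is a target category, else None.

    scores: {category name: assignment score}
    target_categories: set of category names the classification is meant for.
    Assumption: non-target categories only absorb irrelevant phrases.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if best in target_categories else None
```

Under this reading, a phrase that scores highest for a hypothetical non-target category such as "other" is simply not stored under any target category.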

The invention according to claim 3 is the database creation apparatus according to claim 1 or claim 2, further comprising input means for receiving input of the reference phrases from outside. In this case, phrases can be categorized easily, because the user only has to input the reference phrases.

The invention according to claim 4 is the database creation apparatus according to any one of claims 1 to 3, further comprising weighting coefficient calculation means for calculating a weighting coefficient that indicates the strength of the association between a reference phrase and a co-occurrence phrase, wherein the assignment score is calculated based on the weighting coefficient.
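
The weighting idea can be illustrated with a minimal sketch. The excerpt gives no formula, so both definitions here are assumptions: the weight of a co-occurrence phrase is taken to be its relative frequency among the phrases seen with a category's reference phrases, and the assignment score is the sum of the weights of the co-occurrence phrases a candidate shares with the category. The function names are hypothetical.

```python
def cooccurrence_weights(cooc_counts):
    """Turn raw co-occurrence counts into weights that sum to 1 (assumed definition)."""
    total = sum(cooc_counts.values())
    if total == 0:
        return {}
    return {phrase: count / total for phrase, count in cooc_counts.items()}


def assignment_score(candidate_cooc, category_weights):
    """Sum the weights of the category's co-occurrence phrases that the candidate also appears with."""
    return sum(category_weights.get(phrase, 0.0) for phrase in candidate_cooc)
```

For example, with counts `{"eat": 3, "ripe": 1}` the weights are 0.75 and 0.25, and a candidate seen with "eat" and "drive" scores 0.75 for that category.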

The invention according to claim 5 is the database creation apparatus according to claim 4, further comprising weighting coefficient update means for updating the weighting coefficient when an assignment candidate phrase is added to the reference phrases of the category.
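
The update step can be sketched as a small bootstrapping loop body. This is an illustration under stated assumptions, not the patent's procedure: sentences are given as token lists, the weight is assumed to be a relative co-occurrence frequency, and `promote_and_update` is a hypothetical name.

```python
from collections import Counter


def promote_and_update(reference, cooc_counts, candidate, sentences):
    """Add `candidate` to the reference phrases and recompute the weights.

    reference: iterable of current reference phrases of one category
    cooc_counts: Counter of co-occurrence phrases seen so far
    sentences: list of token lists
    Returns the new reference set, the new counts, and the new weights.
    """
    new_reference = set(reference) | {candidate}
    counts = Counter(cooc_counts)
    for tokens in sentences:
        toks = set(tokens)
        if candidate in toks:
            # Phrases appearing with the promoted candidate become co-occurrence evidence.
            counts.update(toks - new_reference)
    total = sum(counts.values())
    weights = {p: c / total for p, c in counts.items()} if total else {}
    return new_reference, counts, weights
```

Promoting a phrase in this way lets later candidates be scored against a wider set of co-occurrence phrases than the initial reference phrases alone provide.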

The invention according to claim 6 is the database creation apparatus according to claim 5, wherein the value of the weighting coefficient is decreased when a co-occurrence phrase is a co-occurrence phrase for the reference phrases of more than one category.
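
A minimal sketch of this penalty follows. The excerpt only says the weight is decreased; dividing by the number of categories that share the phrase is an assumption chosen for simplicity, and `penalize_shared` is a hypothetical name.

```python
def penalize_shared(weights_by_category):
    """Reduce the weight of co-occurrence phrases that occur in several categories.

    weights_by_category: {category: {co-occurrence phrase: weight}}
    Assumption: each weight is divided by the number of categories containing the phrase.
    """
    n_categories = {}
    for weights in weights_by_category.values():
        for phrase in weights:
            n_categories[phrase] = n_categories.get(phrase, 0) + 1
    return {
        category: {p: w / n_categories[p] for p, w in weights.items()}
        for category, weights in weights_by_category.items()
    }
```

A phrase such as "buy" that appears with the reference phrases of two categories says little about either one, so its weight is halved here, while category-specific phrases keep their full weight.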

The invention according to claim 7 is the database creation apparatus according to any one of claims 1 to 6, wherein the co-occurrence relevance between an assignment candidate phrase and the co-occurrence phrases is calculated based on co-occurrence frequency. In this case, the co-occurrence relevance is obtained statistically, which further improves classification accuracy.
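
A frequency-based relevance can be sketched as follows. The exact statistic is not given in this excerpt; the conditional frequency used here (the share of the candidate's sentences that also contain the co-occurrence phrase) is an assumption, as is the name `cooccurrence_relevance`.

```python
def cooccurrence_relevance(sentences, candidate, cooc_phrase):
    """Fraction of the sentences containing `candidate` that also contain `cooc_phrase`.

    sentences: list of token lists. Returns 0.0 if the candidate never appears.
    """
    with_candidate = [s for s in sentences if candidate in s]
    if not with_candidate:
        return 0.0
    both = sum(1 for s in with_candidate if cooc_phrase in s)
    return both / len(with_candidate)
```

Other statistics, such as pointwise mutual information, would fit the same interface; the choice here is only for readability.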

The invention according to claim 8 is the database creation apparatus according to any one of claims 1 to 7, wherein the co-occurrence phrase is a phrase that has a dependency relationship with the reference phrase.

The invention according to claim 9 is the database creation apparatus according to any one of claims 1 to 8, further comprising compound phrase creation means for creating, when phrases are extracted from the document, a compound phrase from a plurality of phrases that are adjacent in the document, based on combination patterns of the parts of speech of the phrases.
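
Compound phrase creation from part-of-speech patterns can be sketched as a greedy left-to-right merge. The tag names, the default patterns, and the space used to join tokens are all assumptions for illustration (Japanese text would normally be joined without a space), and the input is assumed to be already tokenized and tagged by some morphological analyzer.

```python
def build_compounds(tagged, patterns=(("NOUN", "NOUN"), ("ADJ", "NOUN")), joiner=" "):
    """Merge adjacent tokens whose POS tags match one of `patterns`.

    tagged: list of (token, pos_tag) pairs in document order.
    Returns a list of phrases; unmatched tokens are passed through unchanged.
    """
    phrases = []
    i = 0
    while i < len(tagged):
        for pattern in patterns:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                phrases.append(joiner.join(tok for tok, _ in window))
                i += len(pattern)
                break
        else:
            # No pattern matched at this position: keep the single token.
            phrases.append(tagged[i][0])
            i += 1
    return phrases
```

With the default patterns, the tagged sequence `database/NOUN creation/NOUN is/VERB fast/ADJ` yields the phrases "database creation", "is", and "fast".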

The invention according to claim 10 is a database creation method, executed by a computer, for creating a database, the method comprising: a category setting step of setting categories for classifying phrases; a reference phrase setting step of accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases; a co-occurrence phrase extraction step of extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases; a first storage step of storing the initial reference phrases and the co-occurrence phrases in a database; a phrase extraction step of extracting, from the document, phrases that are candidates for assignment to the categories; an assignment score calculation step of calculating, for each assignment candidate phrase, an assignment score for a category based on its co-occurrence relevance with the co-occurrence phrases; an assignment determination step of determining, based on the assignment score, the assignment of the assignment candidate phrase to a category; and a second storage step of storing, in the database, the assignment candidate phrase assigned to the category in the assignment determination step in association with that category.
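
The sequence of steps in the method can be sketched end to end. This is a toy illustration under several assumptions that are not in the source: the "document" is a list of tokenized sentences, the "database" is a plain dictionary, co-occurrence means appearing in the same sentence, the score is the share of a candidate's sentences that contain a stored co-occurrence phrase, and a candidate is assigned to its best category only if the score reaches a hypothetical `threshold`.

```python
from collections import Counter


def build_database(categories, sentences, threshold=0.5):
    """categories: {name: iterable of seed phrases}; sentences: list of token lists."""
    # Category setting and reference phrase setting.
    db = {name: {"reference": set(seeds), "cooc": Counter(), "assigned": set()}
          for name, seeds in categories.items()}
    if not db:
        return db
    all_seeds = set().union(*(e["reference"] for e in db.values()))

    # Co-occurrence phrase extraction and first storage step.
    for tokens in sentences:
        toks = set(tokens)
        for entry in db.values():
            if toks & entry["reference"]:
                entry["cooc"].update(toks - all_seeds)
    known_cooc = set().union(*(set(e["cooc"]) for e in db.values()))

    # Candidate extraction (assumption: tokens that are neither seeds
    # nor stored co-occurrence phrases), keeping each candidate's sentences.
    contexts = {}
    for tokens in sentences:
        toks = set(tokens)
        for cand in toks - all_seeds - known_cooc:
            contexts.setdefault(cand, []).append(toks)

    # Assignment score, assignment determination, second storage step.
    for cand, ctxs in contexts.items():
        scores = {name: sum(1 for c in ctxs if c & set(e["cooc"])) / len(ctxs)
                  for name, e in db.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            db[best]["assigned"].add(cand)
    return db
```

With seeds `{"fruit": {"apple"}, "vehicle": {"car"}}` and the sentences `apple eat`, `car drive`, `banana eat`, `bus drive`, the sketch stores "banana" under fruit and "bus" under vehicle, because each shares a co-occurrence phrase with the corresponding seed.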

The invention according to claim 11 is a program that causes a computer to function as: category setting means for setting categories for classifying phrases; reference phrase setting means for accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases; co-occurrence phrase extraction means for extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases; first storage means for storing the initial reference phrases and the co-occurrence phrases in a database; phrase extraction means for extracting, from the document, phrases that are candidates for assignment to the categories; assignment score calculation means for calculating, for each assignment candidate phrase, an assignment score for a category based on its co-occurrence relevance with the co-occurrence phrases; assignment determination means for determining, based on the assignment score, the assignment of the reference phrase candidate or the co-occurrence phrase candidate to a category; and second storage means for storing, in the database, the assignment candidate phrase assigned to the category by the assignment determination means in association with that category.

Claims (11)

1. A database creation apparatus comprising:
category setting means for setting categories for classifying phrases;
reference phrase setting means for accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases;
co-occurrence phrase extraction means for extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases;
first storage means for storing the initial reference phrases and the co-occurrence phrases in a database;
phrase extraction means for extracting, from the document, phrases that are candidates for assignment to the categories;
assignment score calculation means for calculating, for each assignment candidate phrase, an assignment score for a category based on its co-occurrence relevance with the co-occurrence phrases;
assignment determination means for determining, based on the assignment score, the assignment of the assignment candidate phrase to a category; and
second storage means for storing, in the database, the assignment candidate phrase assigned to the category by the assignment determination means in association with that category.
2. The database creation apparatus according to claim 1,
wherein the category setting means sets target categories for classifying phrases and non-target categories that fall outside the purpose of the classification.
3. The database creation apparatus according to claim 1 or claim 2,
further comprising input means for receiving input of the reference phrases from outside.
4. The database creation apparatus according to any one of claims 1 to 3,
further comprising weighting coefficient calculation means for calculating a weighting coefficient that indicates the strength of the association between a reference phrase and a co-occurrence phrase,
wherein the assignment score is calculated based on the weighting coefficient.
5. The database creation apparatus according to claim 4,
further comprising weighting coefficient update means for updating the weighting coefficient when an assignment candidate phrase is added to the reference phrases of the category.
6. The database creation apparatus according to claim 5,
wherein the value of the weighting coefficient is decreased when a co-occurrence phrase is a co-occurrence phrase for the reference phrases of more than one category.
7. The database creation apparatus according to any one of claims 1 to 6,
wherein the co-occurrence relevance between an assignment candidate phrase and the co-occurrence phrases is calculated based on co-occurrence frequency.
8. The database creation apparatus according to any one of claims 1 to 7,
wherein the co-occurrence phrase is a phrase that has a dependency relationship with the reference phrase.
9. The database creation apparatus according to any one of claims 1 to 8,
further comprising compound phrase creation means for creating, when phrases are extracted from the document, a compound phrase from a plurality of phrases that are adjacent in the document, based on combination patterns of the parts of speech of the phrases.
10. A database creation method, executed by a computer, for creating a database, the method comprising:
a category setting step of setting categories for classifying phrases;
a reference phrase setting step of accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases;
a co-occurrence phrase extraction step of extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases;
a first storage step of storing the initial reference phrases and the co-occurrence phrases in a database;
a phrase extraction step of extracting, from the document, phrases that are candidates for assignment to the categories;
an assignment score calculation step of calculating, for each assignment candidate phrase, an assignment score for a category based on its co-occurrence relevance with the co-occurrence phrases;
an assignment determination step of determining, based on the assignment score, the assignment of the assignment candidate phrase to a category; and
a second storage step of storing, in the database, the assignment candidate phrase assigned to the category in the assignment determination step in association with that category.
11. A database creation program that causes a computer to function as:
category setting means for setting categories for classifying phrases;
reference phrase setting means for accepting input of one or more reference phrases for each category and setting those reference phrases as initial reference phrases;
co-occurrence phrase extraction means for extracting, from a document, co-occurrence phrases that appear together with the initial reference phrases;
first storage means for storing the initial reference phrases and the co-occurrence phrases in a database;
phrase extraction means for extracting, from the document, phrases that are candidates for assignment to the categories;
assignment score calculation means for calculating, for each assignment candidate phrase, an assignment score for a category based on its co-occurrence relevance with the co-occurrence phrases;
assignment determination means for determining, based on the assignment score, the assignment of the assignment candidate phrase to a category; and
second storage means for storing, in the database, the assignment candidate phrase assigned to the category by the assignment determination means in association with that category.
JP2009039999A 2009-02-23 2009-02-23 Database creation apparatus, database creation method, and database creation program in which words included in document are assigned by category Active JP5295818B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009039999A JP5295818B2 (en) 2009-02-23 2009-02-23 Database creation apparatus, database creation method, and database creation program in which words included in document are assigned by category

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2009039999A JP5295818B2 (en) 2009-02-23 2009-02-23 Database creation apparatus, database creation method, and database creation program in which words included in document are assigned by category

Publications (3)

Publication Number Publication Date
JP2010198141A JP2010198141A (en) 2010-09-09
JP2010198141A5 JP2010198141A5 (en) 2012-04-05
JP5295818B2 JP5295818B2 (en) 2013-09-18

Family

ID=42822835

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2009039999A Active JP5295818B2 (en) 2009-02-23 2009-02-23 Database creation apparatus, database creation method, and database creation program in which words included in document are assigned by category

Country Status (1)

Country Link
JP (1) JP5295818B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6188222B2 (en) * 2013-12-26 2017-08-30 日本放送協会 Topic extraction apparatus and program
CN110413956B (en) * 2018-04-28 2023-08-01 南京云问网络技术有限公司 Text similarity calculation method based on bootstrapping

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3847273B2 (en) * 2003-05-12 2006-11-22 沖電気工業株式会社 Word classification device, word classification method, and word classification program
JP2006065366A (en) * 2004-08-24 2006-03-09 Nec Corp Keyword classification device, its method, terminal device, and program
JP2007264985A (en) * 2006-03-28 2007-10-11 Oki Electric Ind Co Ltd Information classification method, information classification device and information classification program

Similar Documents

Publication Publication Date Title
US9785630B2 (en) Text prediction using combined word N-gram and unigram language models
TWI664540B (en) Search word error correction method and device, and weighted edit distance calculation method and device
US20180052823A1 (en) Hybrid Classifier for Assigning Natural Language Processing (NLP) Inputs to Domains in Real-Time
JP5587493B2 (en) Method and system for assigning actionable attributes to data representing personal identification
CN107180084B (en) Word bank updating method and device
CN103049470A (en) Opinion retrieval method based on emotional relevancy
CN102246169A (en) Assigning an indexing weight to a search term
KR101541306B1 (en) Computer enabled method of important keyword extraction, server performing the same and storage media storing the same
WO2015170963A1 (en) System and method for automatically generating a knowledge base
CN106547732A (en) Near synonym recognition methods and near synonym identifying system
KR20130022075A (en) Method for building emotional lexical information and apparatus for the same
JP2008165401A (en) Literature retrieval program, literature retrieval device and literature retrieval method
JP2010198141A5 (en)
JP6275569B2 (en) Dialog apparatus, method and program
JP5546565B2 (en) Word addition device, word addition method, and program
El-Beltagy Niletmrg at semeval-2016 task 7: Deriving prior polarities for arabic sentiment terms
JP6092141B2 (en) Data analysis apparatus, method, and program
JP2007157048A (en) Method for evaluating experience based information, system, program, and computer readable recording medium
JP2010267017A (en) Device, method and program for classifying document
JP2015028739A5 (en)
JP5977199B2 (en) Local association word extraction device, regional association word extraction method, and regional association word extraction program
JP2013182580A (en) Identity vector construction device, identity vector construction method, predicate similarity calculation device, predicate similarity calculation method and predicate similarity calculation program
JP5164876B2 (en) Representative word extraction method and apparatus, program, and computer-readable recording medium
JP6502807B2 (en) Information extraction apparatus, information extraction method and information extraction program
WO2014049998A1 (en) Information search system, information search method, and program