JP2002183175A5


Info

Publication number: JP2002183175A5
Application number: JP2000379770A
Authority: JP
Filing date: 2000-12-08
Publication date: 2005-07-21

Claims

A program storage medium storing a program for text mining, the program causing a CPU to extract characteristic information from at least two document sets by executing:
a step of extracting sets of co-occurring words from two or more document sets stored in a file storage device; and
a step of extracting, for each document set, sets of characteristic words from the extracted word sets.
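As an illustration only, the following Python sketch shows one possible reading of the two steps above. The claim does not say how co-occurrence or "characteristic" is measured. Here co-occurrence is assumed to mean two words appearing in the same document, and the ratio-based selection rule and the names `characteristic_pairs`, `min_ratio` and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(doc_set):
    """Count, for each unordered word pair, the documents in which it co-occurs."""
    counts = Counter()
    for doc in doc_set:
        # sorted(set(...)) gives each pair one canonical (a, b) ordering
        counts.update(combinations(sorted(set(doc)), 2))
    return counts


def characteristic_pairs(doc_sets, min_ratio=2.0, min_count=2):
    """Hypothetical rule: a pair is characteristic of a set when it occurs in at
    least `min_count` documents there and its per-document rate there is at
    least `min_ratio` times its rate in all other sets combined."""
    counts = {name: cooccurring_pairs(docs) for name, docs in doc_sets.items()}
    result = {}
    for name, pair_counts in counts.items():
        n_here = len(doc_sets[name])
        n_other = sum(len(docs) for k, docs in doc_sets.items() if k != name)
        selected = set()
        for pair, c in pair_counts.items():
            if c < min_count:
                continue
            # Counter returns 0 for pairs absent from another set
            other = sum(counts[k][pair] for k in counts if k != name)
            rate_other = other / n_other if n_other else 0.0
            if c / n_here >= min_ratio * rate_other:
                selected.add(pair)
        result[name] = selected
    return result


doc_sets = {
    "A": [["engine", "noise", "brake"], ["engine", "noise"], ["engine", "noise", "price"]],
    "B": [["price", "cheap"], ["price", "cheap", "engine"], ["engine", "noise"]],
}
result = characteristic_pairs(doc_sets)
print(result)  # {'A': {('engine', 'noise')}, 'B': {('cheap', 'price')}}
```

A contrastive rate comparison is only one plausible choice; any score that favours pairs frequent in one set and rare in the others would fit the wording of the claim equally well.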

A program storage medium storing a program for text mining, the program causing a CPU to extract characteristic information from a document set to which at least one attribute is assigned, with attention to that attribute, by executing:
a step of dividing the document set stored in a file storage device into at least two partial document sets based on the attribute;
a step of extracting sets of co-occurring words from the two or more partial document sets; and
a step of extracting, for each partial document set, sets of characteristic words from the extracted word sets.
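The dividing step is what distinguishes this claim, so the Python sketch below illustrates it together with a per-partition co-occurrence count and a selection step. The document layout (an `attributes` dict and a `words` list), the skipping of documents that lack the attribute, and the "frequent here, absent elsewhere" selection rule are all assumptions; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def partition_by_attribute(documents, attribute):
    """Divide documents into partial document sets keyed by one attribute's value.
    Documents without that attribute are skipped (an assumption)."""
    partial_sets = defaultdict(list)
    for doc in documents:
        value = doc["attributes"].get(attribute)
        if value is not None:
            partial_sets[value].append(doc["words"])
    return dict(partial_sets)


def pair_counts(doc_set):
    """Number of documents in which each unordered word pair co-occurs."""
    counts = Counter()
    for words in doc_set:
        counts.update(combinations(sorted(set(words)), 2))
    return counts


def exclusive_frequent_pairs(counts_by_set, min_count=2):
    """Hypothetical 'characteristic' rule: frequent in this partial set and
    absent from every other partial set."""
    return {
        key: {
            pair
            for pair, c in counts.items()
            if c >= min_count
            and all(other[pair] == 0 for k, other in counts_by_set.items() if k != key)
        }
        for key, counts in counts_by_set.items()
    }


documents = [
    {"attributes": {"product": "X"}, "words": ["battery", "short", "life"]},
    {"attributes": {"product": "X"}, "words": ["battery", "short"]},
    {"attributes": {"product": "Y"}, "words": ["screen", "bright"]},
    {"attributes": {"product": "Y"}, "words": ["screen", "bright", "battery"]},
    {"attributes": {}, "words": ["misc"]},
]
partial_sets = partition_by_attribute(documents, "product")
counts_by_set = {value: pair_counts(docs) for value, docs in partial_sets.items()}
characteristic = exclusive_frequent_pairs(counts_by_set)
print(characteristic)  # {'X': {('battery', 'short')}, 'Y': {('bright', 'screen')}}
```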

The program storage medium according to claim 1 or claim 2, wherein each extracted set of co-occurring words is a set of words that appear within a certain distance of one another.
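The claim leaves the unit of "distance" open. The Python sketch below assumes distance is counted in tokens and uses a sliding window; the function name `windowed_pairs` and the `window` parameter are hypothetical.

```python
from collections import Counter


def windowed_pairs(tokens, window):
    """Count unordered pairs of distinct words whose token positions differ by
    at most `window` (distance measured in tokens, an assumption)."""
    counts = Counter()
    for i, w in enumerate(tokens):
        # Look ahead only, so each position pair is counted once
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                counts[tuple(sorted((w, v)))] += 1
    return counts


tokens = ["engine", "makes", "noise", "when", "brake", "is", "pressed"]
pairs = windowed_pairs(tokens, window=2)
print(pairs[("engine", "noise")])  # 1 (distance 2)
print(pairs[("brake", "engine")])  # 0 (distance 4, outside the window)
```

Measuring distance in characters, sentences or dependency-tree hops would be equally consistent with the claim; only the inner loop bound would change.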

The program storage medium according to claim 1, wherein the program further causes the CPU to execute a step of calculating a quantity indicating the strength of association between the two words constituting each extracted set of characteristic words, and a step of clustering the extracted sets of characteristic words according to the calculated quantity and outputting the result.
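Neither the association measure nor the clustering method is specified in the claim. The Python sketch below assumes the Dice coefficient over documents as the strength and single-link grouping (connected components over pairs above a threshold) as the clustering; `dice_strength`, `cluster_words` and `threshold` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def dice_strength(docs, pairs):
    """Assumed association measure: Dice coefficient over documents,
    2 * df(a, b) / (df(a) + df(b))."""
    df = Counter()
    pair_df = Counter()
    for doc in docs:
        words = set(doc)
        df.update(words)
        pair_df.update(combinations(sorted(words), 2))
    strengths = {}
    for a, b in pairs:
        denom = df[a] + df[b]
        strengths[(a, b)] = 2 * pair_df[tuple(sorted((a, b)))] / denom if denom else 0.0
    return strengths


def cluster_words(strengths, threshold):
    """Single-link clustering: words joined by a pair whose strength reaches
    `threshold` share a cluster; pairs below it are ignored."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in strengths.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for w in list(parent):
        groups.setdefault(find(w), set()).add(w)
    return sorted(sorted(g) for g in groups.values())


docs = [["engine", "noise"], ["engine", "noise", "brake"], ["brake", "noise"],
        ["price", "cheap"], ["price", "cheap"], ["price", "engine"]]
pairs = [("engine", "noise"), ("brake", "noise"), ("cheap", "price"), ("engine", "price")]
strengths = dice_strength(docs, pairs)
clusters = cluster_words(strengths, threshold=0.5)
print(clusters)  # [['brake', 'engine', 'noise'], ['cheap', 'price']]
```

Here the weak pair ("engine", "price") with strength 1/3 falls below the threshold, so the two topic groups stay separate.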

The program storage medium according to claim 1 or claim 2, wherein the program further causes the CPU to execute a step of receiving a word and information designating a document set, a step of obtaining words associated with the input word from the sets of words extracted from the document set designated by the input information, and a step of outputting the obtained words as related words of the input word.
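A minimal Python sketch of the lookup described above, assuming the extracted word sets are stored as word pairs keyed by a document-set identifier. The function name `related_words` and the `extracted` mapping with its contents are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def related_words(word: str, set_id: str, extracted: Dict[str, Set[Pair]]) -> List[str]:
    """Return, in sorted order, the partner words of `word` among the pairs
    extracted for the document set named `set_id` (empty if none)."""
    related = set()
    for a, b in extracted.get(set_id, set()):
        if a == word:
            related.add(b)
        elif b == word:
            related.add(a)
    return sorted(related)


# Hypothetical output of an earlier extraction step, keyed by document set
extracted = {
    "complaints": {("engine", "noise"), ("brake", "noise"), ("cheap", "price")},
    "praise": {("engine", "quiet"), ("design", "sleek")},
}
print(related_words("noise", "complaints", extracted))  # ['brake', 'engine']
print(related_words("engine", "praise", extracted))     # ['quiet']
```

Because the lookup is scoped to the designated document set, the same input word can yield different related words for different sets, as "engine" does here.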

A text mining method for extracting characteristic information from a document set to which at least one attribute is assigned, with attention to that attribute, wherein the document set is divided into at least two partial document sets based on the attribute, sets of co-occurring words are extracted from the two or more partial document sets, and, for each partial document set, sets of characteristic words are extracted from the extracted word sets.