JP2002183175A5 - Google Patents

Publication number
JP2002183175A5
Authority
JP
Japan
Prior art keywords
words
word
storage medium
program
program storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2000379770A
Other languages
Japanese (ja)
Other versions
JP2002183175A (en)
Filing date
2000-12-08
Publication date
2005-07-21
Application filed
Priority to JP2000379770A
Priority claimed from JP2000379770A
Publication of JP2002183175A
Publication of JP2002183175A5
Current legal status: Pending

Claims (6)

A program storage medium storing a program for text mining, wherein, for text mining that extracts characteristic information from at least two document sets, the program causes a CPU to execute:
a step of extracting sets of co-occurring words from two or more document sets stored in a file storage device; and
a step of extracting, for each of said partial document sets, characteristic word sets from among the extracted word sets.
A program storage medium storing a program for text mining, wherein, for text mining that extracts characteristic information from a set of documents to which at least one attribute is assigned, with attention to that attribute, the program causes a CPU to execute:
a step of dividing a document set stored in a file storage device into at least two partial document sets based on the attribute;
a step of extracting sets of co-occurring words from the two or more partial document sets; and
a step of extracting, for each partial document set, characteristic word sets from among the extracted word sets.
The program storage medium according to claim 1 or claim 2, wherein the extracted sets of co-occurring words are sets of words that appear within a fixed distance of one another.
The program storage medium according to claim 1 or claim 2, wherein the program further causes the CPU to execute: a step of calculating a quantity indicating the strength of the association between the two words constituting each extracted characteristic word set; and a step of clustering the extracted characteristic word sets according to the calculated quantity and outputting the result.
The program storage medium according to claim 1 or claim 2, wherein the program further causes the CPU to execute: a step of receiving input of a word and of information designating one of the document sets; a step of obtaining words related to the input word from the information on the word sets extracted from the document set designated by the input information; and a step of outputting the obtained words as related words of the input word.
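The dependent claims above (distance-limited co-occurrence, association strength, clustering, and related-word lookup) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: whitespace tokenization, the Dice coefficient as the strength measure, threshold-based connected components as the clustering rule, and the names `window_pairs`, `association_strengths`, `cluster_pairs`, and `related_words`.

```python
from collections import Counter, defaultdict


def window_pairs(tokens, max_distance):
    """Yield unordered pairs of distinct words at most max_distance tokens apart."""
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + max_distance]:
            if left != right:
                yield tuple(sorted((left, right)))


def association_strengths(texts, max_distance):
    """Score each co-occurring pair with the Dice coefficient (an assumed measure)."""
    word_docs, pair_docs = Counter(), Counter()
    for text in texts:
        tokens = text.lower().split()
        word_docs.update(set(tokens))
        pair_docs.update(set(window_pairs(tokens, max_distance)))
    return {
        (a, b): 2 * count / (word_docs[a] + word_docs[b])
        for (a, b), count in pair_docs.items()
    }


def cluster_pairs(strengths, threshold):
    """Group words joined by pairs scoring at least threshold (assumed rule)."""
    parent = {}

    def find(word):
        # Union-find lookup with path halving.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for (a, b), score in strengths.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for word in parent:
        groups[find(word)].add(word)
    return sorted(sorted(group) for group in groups.values())


def related_words(word, set_name, strengths_by_set):
    """Return partners of word in the designated document set, strongest first."""
    strengths = strengths_by_set[set_name]
    partners = [
        (score, b if a == word else a)
        for (a, b), score in strengths.items()
        if word in (a, b)
    ]
    return [p for _, p in sorted(partners, key=lambda item: (-item[0], item[1]))]


# Hypothetical document sets, invented for this example.
document_sets = {
    "printer": ["paper jam today", "paper jam again"],
    "scanner": ["driver install error", "driver install failed"],
}
strengths_by_set = {
    name: association_strengths(texts, max_distance=1)
    for name, texts in document_sets.items()
}
printer_clusters = cluster_pairs(strengths_by_set["printer"], threshold=0.9)
jam_related = related_words("jam", "printer", strengths_by_set)
```

With `max_distance=1` only adjacent words are paired; a larger window admits looser co-occurrences at the cost of more noise. Any other association measure could be swapped in for the Dice coefficient without changing the structure of the sketch.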
A text mining method for extracting characteristic information from a set of documents to which at least one attribute is assigned, with attention to that attribute, the method comprising: dividing the document set into at least two partial document sets based on the attribute; extracting sets of co-occurring words from the two or more partial document sets; and extracting, for each partial document set, characteristic word sets from among the extracted word sets.
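The three steps of the claimed method (divide by attribute, extract co-occurring word sets, select characteristic sets per partial document set) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the claims do not define how "characteristic" is measured, so the ratio test below is an assumed rule, and the function names, the `product` attribute, and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_by_attribute(documents, attribute):
    """Divide a document set into partial document sets keyed by attribute value."""
    subsets = defaultdict(list)
    for doc in documents:
        subsets[doc[attribute]].append(doc["text"])
    return dict(subsets)


def cooccurring_pairs(text):
    """Return the unordered word pairs that appear together in one document."""
    words = sorted(set(text.lower().split()))
    return set(combinations(words, 2))


def pair_counts(texts):
    """Count, for each pair, the number of documents in which it co-occurs."""
    counts = Counter()
    for text in texts:
        counts.update(cooccurring_pairs(text))
    return counts


def characteristic_pairs(subsets, min_ratio=1.5, min_count=2):
    """Keep pairs over-represented in a subset relative to the whole collection.

    The ratio test is an assumed scoring rule, not one stated in the claims.
    """
    per_subset = {key: pair_counts(texts) for key, texts in subsets.items()}
    total_docs = sum(len(texts) for texts in subsets.values())
    overall = Counter()
    for counts in per_subset.values():
        overall.update(counts)

    result = {}
    for key, counts in per_subset.items():
        n_docs = len(subsets[key])
        selected = []
        for pair, count in counts.items():
            if count < min_count:
                continue  # too rare in this subset to be reliable
            local_rate = count / n_docs
            global_rate = overall[pair] / total_docs
            if local_rate / global_rate >= min_ratio:
                selected.append(pair)
        result[key] = sorted(selected)
    return result


# Hypothetical documents carrying one attribute, "product".
documents = [
    {"product": "printer", "text": "paper jam error"},
    {"product": "printer", "text": "paper jam again"},
    {"product": "printer", "text": "toner low error"},
    {"product": "scanner", "text": "driver install error"},
    {"product": "scanner", "text": "driver install failed"},
    {"product": "scanner", "text": "lamp error"},
]

subsets = split_by_attribute(documents, "product")
characteristic = characteristic_pairs(subsets)
```

Comparing each subset against the whole collection is one simple way to surface pairs that distinguish a subset; a statistical test or an information-theoretic score could serve the same role.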
JP2000379770A 2000-12-08 2000-12-08 Text mining method Pending JP2002183175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2000379770A JP2002183175A (en) 2000-12-08 2000-12-08 Text mining method

Publications (2)

Publication Number Publication Date
JP2002183175A (en) 2002-06-28
JP2002183175A5 (en) 2005-07-21

Family

ID=18848074

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
JP2000379770A Pending JP2002183175A (en) 2000-12-08 2000-12-08 Text mining method

Country Status (1)

Country Link
JP (1) JP2002183175A (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3831319B2 (en) * 2002-08-23 2006-10-11 株式会社東芝 Text information analysis system and analysis result presentation method
JP2004178123A (en) * 2002-11-26 2004-06-24 Hitachi Ltd Information processor and program for executing information processor
JP3600611B2 (en) 2002-12-12 2004-12-15 本田技研工業株式会社 Information processing apparatus, information processing method, and information processing program
US8611676B2 (en) 2005-07-26 2013-12-17 Sony Corporation Information processing apparatus, feature extraction method, recording media, and program
CN101889281B (en) 2008-03-10 2012-10-17 松下电器产业株式会社 Content search device and content search method
JP5964149B2 (en) * 2012-06-20 2016-08-03 株式会社Nttドコモ Apparatus and program for identifying co-occurrence words
JP6764973B1 (en) * 2019-04-25 2020-10-07 みずほ情報総研株式会社 Related word dictionary creation system, related word dictionary creation method and related word dictionary creation program
