WO2008062822A1 - Text mining device, text mining method, and text mining program - Google Patents

Info

Publication number
WO2008062822A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
type
feature
positive
negative
Prior art date
Application number
PCT/JP2007/072527
Other languages
English (en)
Japanese (ja)
Inventor
Takahiro Ikeda
Satoshi Nakazawa
Yousuke Sakao
Kenji Satoh
Original Assignee
Nec Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nec Corporation
Publication of WO2008062822A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Definitions

  • Text mining device, text mining method, and text mining program
  • The present invention relates to a text mining device, a text mining method, and a text mining program for extracting words and the like as features of text, and in particular to a text mining device, a text mining method, and a text mining program that can extract, starting from a word obtained as a mining result, the attributes characteristic of the texts that include that word.
  • Text mining is a process in which, for a set of texts with attribute values assigned to some attributes, a user designates the texts with a specific attribute value as positive examples, and features that appear with a bias toward the positive-example texts are extracted and output.
  • A related-art text mining device extracts words from each text and is configured to extract, as features, words or combinations of words that are highly related to the texts having the specific attribute value designated as positive examples.
  • An example of this type of text mining device is described in Patent Document 1.
  • The text mining device described in Patent Document 1 includes a feature word extraction processing unit that extracts characteristic words and phrases appearing in the text to be mined, an analysis axis setting processing unit that sets a classification axis (equivalent to an attribute) to be analyzed, and a related word acquisition processing unit that extracts words and phrases with a high degree of relatedness to each category (equivalent to an attribute value) of the classification axis. It extracts words and phrases characteristic of each category of the classification axis that the user has set as the object of analysis.
  • Another example of this type of text mining method is described in Non-Patent Document 1.
  • In the text mining method described in Non-Patent Document 1, positive-example texts (the target group) and negative-example texts (the control group) are given, and the method looks for features whose appearance frequency in the positive-example texts is high.
  • A technique for learning some pattern or rule from a data set other than text is called data mining, and various methods for performing data mining are widely known.
  • Non-Patent Document 2 describes a divide-and-conquer algorithm and a covering algorithm as examples of methods for performing data mining. These are methods for obtaining a decision tree that discriminates positive examples when a data set with attributes has been divided into positive examples and negative examples in advance.
  • Non-Patent Document 3 describes a method for obtaining association rules between sets of items when a set of transactions, each of which is a combination of items, is given.
  • Patent Document 1 Japanese Patent Application Laid-Open No. 2003-141134
  • Non-Patent Document 1 Junichiro Abe and 4 others, “High-speed data mining from text data, search document browsing and its application to web data”, Journal of Artificial Intelligence, Vol. 15, No. 4, July 2000, pp. 618-628
  • Non-Patent Document 2 Hiroshi Motoda et al., “Machine Learning and Data Mining”, Journal of Artificial Intelligence, Vol. 12, No. 4, July 1997, pp. 505-512
  • Non-Patent Document 3 Yu Yukitagawa, “Association rule extraction technique in data mining”, Journal of Artificial Intelligence, Vol. 12, No. 4, July 1997, pp. 513-520
  • Disclosure of the invention
  • For example, among the texts with the reception date “October 2005” and the inquiry type “repair request”, the word “hard disk” may appear with a particular bias in the texts that have the model name “PC-100”. It is also possible that the word “hard disk” appears frequently in texts that have a reception date of “November 2005” and an inquiry type of “repair request”. However, the related art has given the user no way of knowing this.
  • The problem with the related-art text mining device described above is that, when a feature is extracted from the text, the device cannot present to the user the range of texts in which that feature appears. In other words, with the related-art text mining device the user cannot learn, based on a feature (text) the user has selected, the attribute values or combinations of attribute values that are effective for a new text classification the user has not explicitly specified. The reason is that the related-art text mining device does not present to the user what the texts in which an extracted feature appears have in common, other than the appearance of the feature itself.
  • An object of the present invention is to provide a text mining device, a text mining method, and a text mining program that solve the above-described problems.
  • The first text mining device of the present invention includes a data processing device that: performs text mining based on attribute value conditions, specified by the user, that define the first-type positive examples and the first-type negative examples; extracts, as features, the parts effective for classifying the first-type positive examples and the first-type negative examples; has the user select the features of interest from among those features; classifies the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive-example texts, in which the selected feature appears, and second-type negative-example texts, in which the selected feature does not appear; and generates attribute value conditions that are new features effective for classifying the second-type positive examples and the second-type negative examples.
  • In the text mining method of the present invention, the text mining device carries out a procedure of performing text mining based on attribute value conditions, specified by the user, that define the first-type positive examples and the first-type negative examples, and extracting as features the parts effective for classifying the first-type positive examples and the first-type negative examples, followed by a procedure of having the user select the features of interest from the extracted features.
  • The first text mining program of the present invention causes the text mining device to execute: a procedure of performing text mining based on attribute value conditions, specified by the user, that define the first-type positive examples and the first-type negative examples, and extracting as features the parts effective for classifying the first-type positive examples and the first-type negative examples; a procedure of having the user select the features of interest from the extracted features; a procedure of classifying the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive-example texts, in which the selected feature appears, and second-type negative-example texts, in which the selected feature does not appear; and a procedure of generating attribute value conditions that are effective for classifying the second-type positive examples and the second-type negative examples.
  • Effect of the invention
  • The effect of the present invention is that the user can learn, based on a feature (text) the user has selected, the attribute values (or combinations of attribute values) effective for a new text classification that the user has not explicitly specified.
  • FIG. 1 is a block diagram showing a configuration of a first exemplary embodiment of the present invention.
  • FIG. 2 is an explanatory diagram showing an example of the contents of a text storage unit.
  • FIG. 3 is an explanatory diagram showing an example of contents of an attribute storage unit.
  • FIG. 4 is a flowchart showing the operation of the first exemplary embodiment of the present invention.
  • FIG. 5 is a block diagram showing a configuration of a third exemplary embodiment of the present invention.
  • FIG. 6 is an explanatory diagram showing an example of the result of text mining.
  • FIG. 7 is an explanatory diagram showing an example of contents displayed on the output device.
  • FIG. 8 is an explanatory diagram showing an example of a decision tree.
  • FIG. 9 is an explanatory diagram showing an example of the output of attribute feature extraction means.
  • FIG. 10 is an explanatory diagram showing logic of an example of the first embodiment of the present invention.
  • The text mining device of the present invention performs text mining based on attribute value conditions, specified by the user, that define the first-type positive examples and the first-type negative examples, extracts as features the parts effective for classifying the first-type positive examples and the first-type negative examples, and has the user select the features of interest from among those features.
  • Next, the text mining device classifies the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive-example texts, in which the selected feature appears, and second-type negative-example texts, in which it does not appear, and generates attribute value conditions that are new features effective for classifying the second-type positive examples and the second-type negative examples.
  • Here, “a part effective for classifying positive examples and negative examples” means, for example, “a phrase that has a high appearance frequency in positive-example text and a low appearance frequency in negative-example text”. In other words, it is not limited to “a phrase that appears in positive-example text and does not appear in negative-example text”. Whether the appearance frequency is high or low can be determined, for example, by comparison with thresholds set in advance, or from the ratio of the appearance frequency in the positive-example text to the appearance frequency in the negative-example text. In short, the appearance frequency may be judged against any predetermined criterion. Classification can also be based on various measures other than appearance frequency. Hereinafter, “classification” is used in this sense.
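  • The threshold-based criterion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the whitespace tokenization, the use of document frequency, and the two threshold values are all assumptions introduced here.

```python
from collections import Counter

def document_frequency(texts):
    """Count, for each word, the number of texts in which it appears."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.lower().split()))
    return counts

def effective_features(positive_texts, negative_texts,
                       min_pos_rate=0.5, max_neg_rate=0.25):
    """Return words that are frequent in positive texts and infrequent in negative texts.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    pos_df = document_frequency(positive_texts)
    neg_df = document_frequency(negative_texts)
    n_pos = max(len(positive_texts), 1)
    n_neg = max(len(negative_texts), 1)
    features = []
    for word, count in pos_df.items():
        pos_rate = count / n_pos
        neg_rate = neg_df.get(word, 0) / n_neg
        # A word need not be absent from the negative texts; it only has to be rare there.
        if pos_rate >= min_pos_rate and neg_rate <= max_neg_rate:
            features.append(word)
    return sorted(features)

# Hypothetical texts, invented for illustration.
positive = ["hdd makes noise", "hdd error on boot", "screen is dark"]
negative = ["how to install os", "battery drains fast",
            "hdd upgrade question", "os update failed"]
print(effective_features(positive, negative))
```

  • Note that “hdd” is kept even though it occurs in one negative-example text, which matches the statement that a feature is not limited to phrases that never appear in negative-example text.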
  • FIG. 1 is a block diagram showing the configuration of the first exemplary embodiment of the present invention.
  • The text mining device includes an input device 10 such as a keyboard and a mouse, a storage device 21 such as a hard disk for storing information, a data processing device 31 operated by program control, and an output device 40 such as a display device.
  • the storage device 21 includes an attribute storage unit 201, a text storage unit 202, and a mining result holding unit 203.
  • the attribute storage unit 201 stores information on attribute values assigned to the text in association with each text stored in the text storage unit 202.
  • The text storage unit 202 stores the text to be text mined.
  • FIG. 2 shows an example of the text storage unit 202.
  • FIG. 3 shows an example of the attribute storage unit 201.
  • a unique text number is assigned to each text and stored in the text storage unit 202.
  • In the attribute storage unit 201, the attribute values of four types of attributes, “inquiry type”, “model name”, “reception date”, and “person in charge”, are stored for each text number.
  • The attribute storage unit 201 and the text storage unit 202 need not be completely separate; they may be configured to store each text together with the attributes for that text.
  • the mining result holding unit 203 stores characteristics obtained as a result of text mining on the text stored in the text storage unit 202.
  • the data processing device 31 includes attribute value condition specifying means 301, text mining means 302, analysis target feature specifying means 303, positive / negative example text extracting means 304, and attribute feature extracting means 305.
  • The attribute value condition specifying means 301 reads the positive-example (first-type) attribute value condition and the negative-example (first-type) attribute value condition specified by the user through the input device 10.
  • The text mining means 302 applies text mining to the texts stored in the text storage unit 202, treating texts that match the positive-example attribute value condition read by the attribute value condition specifying means 301 as positive-example text and texts that match the negative-example attribute value condition as negative-example text. As a result, the text mining means 302 extracts features effective for classifying the positive examples and the negative examples as the features of the positive-example text, and outputs them to the user through the output device 40. The extracted features are also stored in the mining result holding unit 203.
  • In text mining, elements constituting a part of the text, such as words, sets of words, phrases, and sentences, are generally extracted as features. That is, in text mining, elements that rarely appear in negative-example text but appear with a bias toward positive-example text are extracted as features of the positive-example text.
  • the technology described in Non-Patent Document 1 can be partially applied to this text mining.
  • It is also possible to use a text mining technique in which the structure of the text is analyzed, the text is converted into structured data as a result of the analysis, and a partial structure of the structured data is then extracted as a feature. This can be done, for example, by analyzing the dependency relations between words in advance and extracting two words in a dependency relation as a feature, or by converting the text into a dependency structure tree using dependency structure analysis and extracting a subtree as a feature.
  • When a partial structure is included in the structured data obtained from a text, the partial structure is considered to appear in that text.
  • the text mining means 302 outputs the characteristics obtained by text mining to the user through the output device 40 and stores them in the mining result holding unit 203.
  • The information output to the user through the output device 40 may also include additional information, such as how many texts each feature appears in and how strongly each feature is biased toward the positive-example text.
  • the analysis target feature designating unit 303 causes the user to designate a feature to be noted among the features output by the text mining unit 302 and reads the designated content through the input device 10.
  • The positive example negative example text extraction means 304 examines the texts, among those stored in the text storage unit 202, that were processed by the text mining means 302, that is, the texts that match either the positive-example or the negative-example attribute value condition read by the attribute value condition specifying means 301. For each such text, it determines whether the feature read by the analysis target feature specifying means 303 appears. Texts in which the feature appears are extracted as positive examples (second type), and texts in which it does not appear are extracted as negative examples (second type).
  • The text mining means 302 may create an index indicating in which texts each feature appears, and the positive example negative example text extraction means 304 may make this determination by referring to that index.
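  • An index from features to the texts in which they appear can be sketched as below. The function name, the dictionary layout, the sample texts, and the use of case-insensitive substring matching are assumptions made for illustration; the source does not specify how the index is built.

```python
from collections import defaultdict

def build_feature_index(texts_by_number, features):
    """Map each feature to the set of text numbers in which it appears."""
    index = defaultdict(set)
    for number, text in texts_by_number.items():
        lowered = text.lower()
        for feature in features:
            # Assumption: a feature "appears" if it occurs as a substring.
            if feature.lower() in lowered:
                index[feature].add(number)
    return dict(index)

# Hypothetical texts keyed by text number.
texts = {
    1: "The hard disk makes a clicking noise.",
    2: "HDD error at startup.",
    3: "Screen flickers.",
}
index = build_feature_index(texts, ["hard disk", "HDD", "error"])
print(index)
```

  • With such an index, deciding whether a selected feature appears in a given text becomes a set membership test instead of a rescan of the text.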
  • When the analysis target feature specifying means 303 has the user specify features, either a single feature or a plurality of features may be specified.
  • When a plurality of features are specified, the positive example negative example text extraction means 304 may treat a text in which any one of the features appears as a positive example.
  • Alternatively, only a text in which all the features appear may be treated as a positive example.
  • When the positive example negative example text extraction means 304 discriminates between positive and negative examples, a further restriction may be added so that only texts in which the feature read by the analysis target feature specifying means 303 appears at least a threshold number of times are treated as positive examples.
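  • The three variants above (any feature, all features, and a minimum number of appearances) can be sketched in one function. The names `split_second_type`, `mode`, and `min_count`, the sample texts, and the substring-based counting are hypothetical choices made for this illustration.

```python
def count_occurrences(text, feature):
    """Count case-insensitive, non-overlapping occurrences of a feature in a text."""
    return text.lower().count(feature.lower())

def split_second_type(texts_by_number, selected_features, mode="any", min_count=1):
    """Split texts into second-type positive and negative examples.

    mode="any": positive if at least one selected feature appears.
    mode="all": positive only if every selected feature appears.
    min_count:  a feature counts as appearing only if it occurs this many times.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    positives, negatives = [], []
    for number, text in texts_by_number.items():
        hits = [count_occurrences(text, f) >= min_count for f in selected_features]
        (positives if combine(hits) else negatives).append(number)
    return positives, negatives

# Hypothetical texts keyed by text number.
texts = {
    1: "hard disk failure, hard disk replaced",
    2: "HDD error",
    3: "hard disk and HDD both mentioned",
    4: "battery issue",
}
selected = ["hard disk", "HDD"]
print(split_second_type(texts, selected, mode="any"))
print(split_second_type(texts, selected, mode="all"))
print(split_second_type(texts, selected, mode="any", min_count=2))
```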
  • The attribute feature extraction means 305 applies data mining to the positive-example text and negative-example text extracted by the positive example negative example text extraction means 304, extracts characteristic attribute values or combinations of attribute values effective for classifying the positive-example text and the negative-example text, and outputs them to the user through the output device 40.
  • the data mining technique applied by the attribute feature extraction unit 305 is not limited to a specific method.
  • For example, a decision tree analysis technique can be used as a data mining technique for extracting attribute values or combinations of attribute values characteristic of positive-example text.
  • In this case, a decision tree whose branch conditions are attribute values for classifying positive-example text and negative-example text is obtained, and the combination of attribute values along a path leading to a positive example in the decision tree can be extracted as a combination of attribute values specific to the positive-example text.
  • The decision tree can be obtained using, for example, the method described in Non-Patent Document 2.
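  • As an illustration of how a branch condition might be chosen, the sketch below picks the attribute with the highest information gain. This is a common split criterion swapped in for illustration; it is not claimed to be the algorithm of Non-Patent Document 2. The function names, the model name “PC-200”, and the person-in-charge values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(records, labels, attribute):
    """Reduction in label entropy obtained by branching on one attribute."""
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(record[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(records, labels):
    """Return the attribute whose values best separate positives from negatives."""
    return max(records[0].keys(), key=lambda a: information_gain(records, labels, a))

# Hypothetical attribute records; True marks a second-type positive example.
records = [
    {"model name": "PC-100", "person in charge": "A"},
    {"model name": "PC-100", "person in charge": "B"},
    {"model name": "PC-200", "person in charge": "A"},
    {"model name": "PC-200", "person in charge": "B"},
]
labels = [True, True, False, False]
print(best_split(records, labels))
```

  • Applying the same selection recursively to each branch would yield a tree, and the attribute values along a path ending in positive examples form the extracted combination.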
  • Alternatively, attribute value combinations that meet given support and confidence criteria can be extracted as combinations of attribute values specific to positive-example text. Since this corresponds to extracting association rules that satisfy a minimum support and a minimum confidence, it can be realized by the method described in Non-Patent Document 3, for example.
  • any data mining technique can be used as long as it is a technique that can extract attribute values or combinations of attribute values that are characteristic of positive text.
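  • The association-rule view can be sketched by brute-force enumeration of small attribute value combinations. This is a simplified stand-in, not the method of Non-Patent Document 3. Support is taken here as the fraction of all texts that match the combination and are positive, and confidence as the fraction of matching texts that are positive; these definitions, the thresholds, the function name, and the sample records are assumptions.

```python
from itertools import combinations

def characteristic_attribute_values(records, labels, min_support=0.3,
                                    min_confidence=0.8, max_size=2):
    """Find attribute value combinations that predict second-type positive examples."""
    total = len(records)
    stats = {}  # combination -> (texts matching, positive texts matching)
    for record, is_positive in zip(records, labels):
        items = sorted(record.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                matched, positive = stats.get(combo, (0, 0))
                stats[combo] = (matched + 1, positive + int(is_positive))
    rules = []
    for combo, (matched, positive) in stats.items():
        support = positive / total
        confidence = positive / matched
        if support >= min_support and confidence >= min_confidence:
            rules.append((dict(combo), support, confidence))
    # Highest confidence first, then highest support.
    rules.sort(key=lambda r: (-r[2], -r[1], len(r[0])))
    return rules

# Hypothetical attribute records; True marks a second-type positive example.
records = [
    {"model name": "PC-100", "reception date": "2005-10"},
    {"model name": "PC-100", "reception date": "2005-11"},
    {"model name": "PC-100", "reception date": "2005-10"},
    {"model name": "PC-200", "reception date": "2005-10"},
    {"model name": "PC-200", "reception date": "2005-11"},
]
labels = [True, True, True, False, False]
rules = characteristic_attribute_values(records, labels)
for condition, support, confidence in rules:
    print(condition, support, confidence)
```

  • Exhaustive enumeration is only practical for a handful of attributes; a real association rule miner prunes candidates by support before counting larger combinations.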
  • FIG. 4 is a flowchart showing the operation of the first exemplary embodiment of the present invention.
  • the attribute value condition specifying means 301 reads the attribute value conditions specified by the user as the positive and negative example conditions via the input device 10 (step A1 in FIG. 4).
  • Next, the text mining means 302 performs text mining on the texts stored in the text storage unit 202, treating texts that match the positive-example attribute value condition read by the attribute value condition specifying means 301 as positive-example text and texts that match the negative-example attribute value condition as negative-example text, and extracts features effective for classifying the positive-example text and the negative-example text (step A2).
  • The text mining means 302 stores the extracted features in the mining result holding unit 203, reads the extracted features from the mining result holding unit 203, and outputs them to the user through the output device 40 (step A3).
  • the analysis target feature designating unit 303 reads the feature selection by the user via the input device 10 (step A4).
  • The positive example negative example text extraction means 304 reads the texts stored in the text storage unit 202 one by one (step A5) and determines whether the text matches the positive-example or negative-example attribute value condition read by the attribute value condition specifying means 301 (step A6). If it matches (step A6/Yes), the positive example negative example text extraction means 304 determines whether the feature selected by the user in step A4 appears in the text (step A7). If the feature appears in the read text (step A7/Yes), the positive example negative example text extraction means 304 sets the text as a positive example (step A8); if the feature does not appear (step A7/No), it sets the text as a negative example (step A9). The positive example negative example text extraction means 304 repeats steps A5 to A9 until all the texts have been processed (step A10).
  • The attribute feature extraction means 305 uses data mining to extract attribute values or combinations of attribute values effective for classifying the positive-example text and the negative-example text extracted by the processing of steps A5 to A10 (step A11).
  • the attribute feature extraction unit 305 outputs the extraction result (attribute value or combination of attribute values) to the user via the output device 40 (step A12).
  • In the description above, the attribute value condition specifying means 301 reads the positive-example attribute value condition and the negative-example attribute value condition specified by the user, and the text mining means 302 performs text mining with texts that match the positive-example attribute value condition as positive examples and texts that match the negative-example attribute value condition as negative examples.
  • Alternatively, the attribute value condition specifying means 301 may receive only a positive-example attribute value condition from the user, and the text mining means 302 may treat all texts that do not match the positive-example attribute value condition as negative-example text.
  • In this case, the positive example negative example text extraction means 304 extracts the positive-example text and the negative-example text from all the texts stored in the text storage unit 202.
  • A configuration is also possible in which the attribute value condition specifying means 301 is not provided and the text mining means 302 extracts elements (words, sets of words, phrases, sentences, and so on) that appear frequently across all the texts stored in the text storage unit 202. In this case as well, the positive example negative example text extraction means 304 extracts the positive-example text and the negative-example text from all the texts stored in the text storage unit 202.
  • In the first embodiment, data mining is performed by treating texts in which the feature selected by the user appears as positive examples (second type) and texts in which it does not appear as negative examples (second type), and attribute values or combinations of attribute values effective for classifying the positive examples (second type) and the negative examples (second type) are extracted and output.
  • As a result, attribute values characteristic of the texts in which the feature (text) selected by the user appears (not necessarily the texts in which all the selected features appear) are presented to the user. Therefore, according to the first embodiment of the present invention, the user can learn, based on the feature (text) the user has selected, attribute values (or combinations of attribute values) that the user did not explicitly specify and that are effective for classifying the second-type positive examples and the second-type negative examples.
  • The configuration of the second embodiment of the present invention is the same as that of the first embodiment shown in FIG. 1.
  • In the second embodiment of the present invention, text mining is performed based only on the attribute value condition for the first-type positive examples, out of the attribute value conditions for the first-type positive examples and the first-type negative examples specified by the user. The parts effective for distinguishing the first-type positive examples from the texts as a whole are extracted as features, and the user is asked to select the features of interest from among them.
  • Next, the text mining device classifies the texts corresponding to the first-type positive examples and the first-type negative examples into second-type positive-example texts, in which the selected feature appears, and second-type negative-example texts, in which it does not appear, and generates attribute value conditions that are new features effective for classifying the second-type positive examples and the second-type negative examples.
  • Because the text mining means 302 only needs to perform mining based on the first-type positive examples, the second embodiment has the effect of being simpler to configure than the first embodiment.
  • FIG. 5 is a block diagram showing a configuration of the third exemplary embodiment of the present invention.
  • the third embodiment of the present invention includes an input device 10, a storage device 22, a data processing device 32 (for example, a computer), an output device 40, and a text mining program 50.
  • The text mining program 50 realizes the functions of the attribute value condition specifying means 301, the text mining means 302, the analysis target feature specifying means 303, the positive example negative example text extraction means 304, and the attribute feature extraction means 305 of the first and second embodiments of the present invention.
  • the text mining program 50 is stored in the storage device 22 or other storage means (not shown).
  • The text mining program 50 is read into the data processing device 32 and executed, thereby controlling the operation of the data processing device 32.
  • the data processing device 32 executes the same processing as the processing of the data processing device 31 in the first and second embodiments under the control of the text mining program 50.
  • the third embodiment of the present invention has an effect that it is easy to implement because the processing of FIG. 4 is executed by the cooperation of hardware and software.
  • The attribute storage unit 201 stores, for each text, the attribute values of four types of attributes: “inquiry type”, “model name”, “reception date”, and “person in charge”.
  • the text storage unit 202 stores in advance text to be mined (contents of response records).
  • the attribute value condition designating unit 301 reads the designation of the attribute value condition of the positive example and the negative example of the text mining by the user through the input device 10.
  • The text mining means 302 performs text mining on the texts stored in the text storage unit 202, taking texts for which “inquiry type is repair request” and “reception date is October 2005” as positive examples and texts for which “inquiry type is repair request” and “reception date is not October 2005” as negative examples, and extracts features effective for classifying the positive-example text and the negative-example text.
  • T1, T5, and T7 are positive examples (first type), and T6 is a negative example (first type).
  • The texts T2 to T4 are not used for text mining because they match neither the positive-example nor the negative-example attribute value condition.
  • the text mining means 302 outputs the extracted feature to the user via the output device 40 and stores it in the mining result holding unit 203.
  • FIG. 6 is an explanatory diagram showing an example of the result of text mining.
  • the text mining means 302 extracts words appearing in the text as features, and stores the features as shown in FIG. 6 in the mining result holding unit 203.
  • the analysis target feature specifying unit 303 causes the user to select a feature of interest and reads the selection content via the input device 10.
  • The analysis target feature specifying means 303 can, for example, accept input on whether or not to select each feature output by the text mining means 302, thereby allowing the user to select features.
  • FIG. 7 is an explanatory diagram showing an example of contents displayed on the output device 40.
  • The analysis target feature specifying means 303 displays a check box with which the user indicates that a feature has been selected, and reads the features whose check boxes the user has checked.
  • Here, the word “hard disk” and the word “HDD” are selected by the user.
  • For each text, among the texts stored in the text storage unit 202, that matches either the positive-example (first type) or the negative-example (first type) attribute value condition read by the attribute value condition specifying means 301, the positive example negative example text extraction means 304 determines whether the feature specified by the user appears. If the feature appears, the text is extracted as a positive example (second type); if it does not appear, the text is extracted as a negative example (second type).
  • The text of T1 matches the attribute value condition of the positive example (first type) read by the attribute value condition specifying means 301 and also includes the word “hard disk”, so it is extracted as a positive example (second type).
  • The texts T2 to T4 match neither the positive-example (first type) attribute value condition nor the negative-example (first type) attribute value condition, so they are extracted as neither positive examples (second type) nor negative examples (second type).
  • The text of T5 matches the attribute value condition of the positive example (first type) read by the attribute value condition specifying means 301, but includes neither the word “hard disk” nor the word “HDD”, so it is extracted as a negative example (second type).
  • The text of T6 matches the attribute value condition of the negative example (first type) read by the attribute value condition specifying means 301 and includes the word “HDD”, so it is extracted as a positive example (second type).
  • The text of T7 matches the attribute value condition of the positive example (first type) read by the attribute value condition specifying means 301 and includes the word “HDD”, so it is extracted as a positive example (second type). The same processing is performed for the other texts.
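  • The walk-through of T1 to T7 can be reproduced with a small sketch. The text contents, the inquiry types other than “repair request”, and the date encoding below are invented so that the records behave as described; they are not the contents of FIG. 2 or FIG. 3, and the function names are hypothetical.

```python
# Hypothetical records; only the described behaviour of T1 to T7 comes from the source.
records = {
    "T1": {"inquiry type": "repair request", "reception date": "2005-10",
           "text": "The hard disk makes a strange noise."},
    "T2": {"inquiry type": "question", "reception date": "2005-10",
           "text": "How do I set up e-mail?"},
    "T3": {"inquiry type": "question", "reception date": "2005-11",
           "text": "What is the HDD capacity?"},
    "T4": {"inquiry type": "complaint", "reception date": "2005-10",
           "text": "Delivery was late."},
    "T5": {"inquiry type": "repair request", "reception date": "2005-10",
           "text": "The screen stays dark."},
    "T6": {"inquiry type": "repair request", "reception date": "2005-11",
           "text": "HDD error appears at startup."},
    "T7": {"inquiry type": "repair request", "reception date": "2005-10",
           "text": "The HDD is not recognized."},
}
selected_features = ["hard disk", "HDD"]

def first_type(record):
    """Apply the user's first-type attribute value conditions."""
    if record["inquiry type"] != "repair request":
        return None  # matches neither condition
    return "positive" if record["reception date"] == "2005-10" else "negative"

def second_type(record, features):
    """Relabel texts that passed the first-type conditions by feature appearance."""
    if first_type(record) is None:
        return None  # excluded, even if a selected word occurs in the text
    text = record["text"].lower()
    return "positive" if any(f.lower() in text for f in features) else "negative"

result = {name: second_type(rec, selected_features) for name, rec in records.items()}
print(result)
```

  • T3 is deliberately given a text containing “HDD” to show that a text outside both first-type conditions stays excluded regardless of its wording, while T6 moves from a first-type negative example to a second-type positive example.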
  • The attribute feature extraction means 305 applies data mining to the positive-example (second type) and negative-example (second type) texts extracted by the positive example negative example text extraction means 304, extracts attribute values or combinations of attribute values effective for classifying the positive-example (second type) text and the negative-example (second type) text, and outputs them to the user via the output device 40.
  • Here, data mining generates a decision tree that classifies the positive-example (second type) text and the negative-example (second type) text using combinations of attribute values as branching conditions.
  • The combination of attribute values corresponding to a path leading to the positive example (second type) in the decision tree is extracted as a combination of attribute values characteristic of the positive-example (second type) text.
  • FIG. 8 is an explanatory diagram showing an example of a decision tree.
  • FIG. 9 is an explanatory diagram showing an output example of the attribute feature extraction means 305 in this case.
  • FIG. 10 is an explanatory diagram showing the logic of this embodiment.
  • The user sets texts whose inquiry type is repair request and whose reception date is in October 2005 as the positive example (first type) (FIG. 10).
  • Text mining yields words such as “hard disk”, “OS”, “HDD”, and “error”.
  • Data mining is then performed focusing on “hard disk” and “HDD”, the words selected by the user.
  • The texts subject to text mining are those whose inquiry type is repair request.
  • The condition that the user initially specified for the positive example was that the inquiry type was repair request and the reception date was in October 2005.
  • The result shows that the word “hard disk” or “HDD” is characteristic not only of repair request texts from October 2005 but also of those from November 2005, and that it appears prominently for a particular model.
  • The second text mining device of the present invention includes a storage device that stores a plurality of texts and attribute values for each of the texts, and a data processing device.
  • The data processing device reads the texts and the attribute values for each text from the storage device, applies to them the attribute value conditions that are the conditions of the first type positive example and the first type negative example specified by the user, and performs text mining.
  • It extracts, as features, the portions effective for classifying the first type positive example and the first type negative example, and stores the result in the storage device as the mining result.
  • It has the user select a feature of interest from the extracted features, and separates the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear.
  • It then generates an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example, and outputs it to the output device.
  • A third text mining device of the present invention is the first or second text mining device, in which the portion effective for classifying the first type positive example and the first type negative example is, based on a first criterion set in advance, “a phrase that appears frequently in the first type positive example text and infrequently in the first type negative example text”.
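One simple way to read the first criterion is to compare how often a word occurs in positive example texts with how often it occurs in negative example texts. The sketch below does this with document frequencies. The scoring formula, the `min_gap` threshold, whitespace tokenization, and the sample texts are assumptions for illustration; the patent only states the criterion, not how it is computed.

```python
from collections import Counter


def document_frequency(texts):
    """Count, for each word, the number of texts in which it appears."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.lower().split()))
    return counts


def characteristic_words(positive_texts, negative_texts, min_gap=0.5):
    """Words frequent in positive texts and rare in negative texts."""
    pos_df = document_frequency(positive_texts)
    neg_df = document_frequency(negative_texts)
    scored = []
    for word, n_pos in pos_df.items():
        # Gap between the share of positive texts and the share of negative
        # texts that contain the word.
        gap = n_pos / len(positive_texts) - neg_df[word] / len(negative_texts)
        if gap >= min_gap:
            scored.append((word, gap))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical first type positive and negative example texts.
POSITIVE_TEXTS = ["hdd error on boot", "hdd makes noise", "os error after update"]
NEGATIVE_TEXTS = ["how to change password", "os update question"]

words = [word for word, _ in characteristic_words(POSITIVE_TEXTS, NEGATIVE_TEXTS)]
print(words)
```

A real system would use morphological analysis rather than whitespace splitting, particularly for Japanese text, and might use a statistical test instead of a plain frequency gap.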
  • The fourth text mining device of the present invention is the first, second, or third text mining device, in which the attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example is, based on a second criterion set in advance, a combination of “attribute values that appear frequently as attribute values of the second type positive example and infrequently as attribute values of the second type negative example”.
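The second criterion can be read in a similar frequency-based way for attribute values. The sketch below enumerates single attribute values and pairs of attribute values, then keeps those that are much more common among second type positive examples than among second type negative examples. The gap score, the `min_gap` threshold, the limit of two attributes per combination, and the sample rows are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def value_combinations(row, max_size=2):
    """Yield every combination of up to max_size (attribute, value) pairs."""
    items = sorted(row.items())
    for size in range(1, max_size + 1):
        yield from combinations(items, size)


def characteristic_attribute_values(positive_rows, negative_rows, min_gap=0.5):
    """Attribute value combinations frequent in positives and rare in negatives."""
    pos_counts = Counter(c for row in positive_rows for c in value_combinations(row))
    neg_counts = Counter(c for row in negative_rows for c in value_combinations(row))
    scored = []
    for combo, n_pos in pos_counts.items():
        gap = n_pos / len(positive_rows) - neg_counts[combo] / len(negative_rows)
        if gap >= min_gap:
            scored.append((combo, gap))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [dict(combo) for combo, _ in scored]


# Hypothetical attribute values of second type positive and negative examples.
POSITIVE_ROWS = [
    {"model": "A", "month": "2005-10"},
    {"model": "A", "month": "2005-11"},
    {"model": "B", "month": "2005-10"},
]
NEGATIVE_ROWS = [
    {"model": "B", "month": "2005-10"},
    {"model": "C", "month": "2005-11"},
]

conditions = characteristic_attribute_values(POSITIVE_ROWS, NEGATIVE_ROWS)
print(conditions)
```

With this toy data only the single value `model = A` clears the threshold, because it occurs in two of three positive rows and in no negative row.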
  • The fifth text mining device of the present invention has a data processing device.
  • The data processing device receives as input the attribute value conditions that are the conditions of the first type positive example and the first type negative example specified by the user, performs text mining based on the attribute value condition that is the condition of the first type positive example, and extracts as features the portions effective for classifying the first type positive example.
  • It has the user select a feature of interest from the features, separates the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and generates an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.
  • The sixth text mining device of the present invention has a data processing device.
  • The data processing device performs text mining based on the attribute value condition that is the condition of the first type positive example specified by the user, and extracts as features the portions effective for classifying text that meets the attribute value condition as a first type positive example and the remaining text as a first type negative example. It has the user select a feature of interest from the features, separates the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and generates an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.
  • The seventh text mining device of the present invention has a data processing device.
  • The data processing device extracts elements that occur frequently in all stored texts as features, has the user select a feature of interest from among the features, separates the texts into positive example text, in which the selected feature appears, and negative example text, in which the selected feature does not appear, and generates an attribute value condition that is a new feature effective for classifying the positive example and the negative example.
  • A second text mining method of the present invention is a text mining method in a text mining device comprising a storage device that stores a plurality of texts and attribute values for each of the texts, and a data processing device.
  • The method includes a procedure in which the data processing device reads the texts and the attribute values for each text from the storage device.
  • It further includes a procedure of applying the attribute value conditions that are the conditions of the first type positive example and the first type negative example specified by the user to the texts and the attribute values for each text, performing text mining, extracting as features the portions effective for classifying the first type positive example and the first type negative example, and storing the result in the storage device.
  • It further includes a procedure of having the user select a feature of interest, a procedure of separating the texts into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and a procedure of generating an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.
  • A third text mining method of the present invention is the first or second text mining method, in which the portion effective for classifying the first type positive example and the first type negative example is, based on a first criterion set in advance, “a phrase that appears frequently in the first type positive example text and infrequently in the first type negative example text”.
  • The fourth text mining method of the present invention is the first, second, or third text mining method, in which the attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example is, based on a second criterion set in advance, a combination of “attribute values that appear frequently as attribute values of the second type positive example and infrequently as attribute values of the second type negative example”.
  • The fifth text mining method of the present invention includes a procedure in which the text mining device receives as input the attribute value conditions that are the conditions of the first type positive example and the first type negative example specified by the user.
  • It further includes a procedure of performing text mining based on the attribute value condition that is the condition of the first type positive example and extracting as features the portions effective for classifying the first type positive example, a procedure of having the user select a feature of interest from the features, a procedure of separating the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and a procedure of generating an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.
  • In the sixth text mining method of the present invention, the text mining device performs text mining based on the attribute value condition that is the condition of the first type positive example specified by the user, and extracts as features the portions effective for classifying text that meets the attribute value condition as a first type positive example and the remaining text as a first type negative example.
  • The method further includes a procedure of having the user select a feature of interest from the features, a procedure of separating the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and a procedure of generating an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.
  • In the seventh text mining method of the present invention, the text mining device extracts elements that appear frequently in all stored texts as features, has the user select a feature of interest from the features, separates the texts into positive example text, in which the selected feature appears, and negative example text, in which the selected feature does not appear, and generates an attribute value condition that is a new feature effective for classifying the positive example and the negative example.
  • A second text mining program of the present invention is a text mining program in a text mining device comprising a storage device that stores a plurality of texts and attribute values for each text, and a data processing device. It includes a procedure for reading the texts and the attribute values for each text from the storage device, and a procedure for applying the attribute value conditions that are the conditions of the first type positive example and the first type negative example specified by the user to the texts and the attribute values for each text, performing text mining, and extracting as features the portions effective for classifying the first type positive example and the first type negative example.
  • It further includes a procedure for storing the result in the storage device as the mining result, and a procedure for having the user select a feature of interest from the extracted features.
  • It further includes a procedure for separating the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear.
  • It further includes a procedure for generating an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example, and outputting the attribute value condition to the output device.
  • The third text mining program of the present invention is the first or second text mining program, in which the portion effective for classifying the first type positive example and the first type negative example is, based on a first criterion set in advance, “a phrase that appears frequently in the first type positive example text and infrequently in the first type negative example text”.
  • A fourth text mining program of the present invention is the first, second, or third text mining program, in which the attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example is, based on a second criterion set in advance, a combination of “attribute values that appear frequently as attribute values of the second type positive example and infrequently as attribute values of the second type negative example”.
  • The fifth text mining program of the present invention causes the text mining device to execute a procedure for inputting the attribute value conditions that are the conditions of the first type positive example and the first type negative example specified by the user, and a procedure for performing text mining based on the attribute value condition that is the condition of the first type positive example and extracting as features the portions effective for classifying the first type positive example.
  • It further causes the text mining device to execute a procedure for having the user select a feature of interest from the features, a procedure for separating the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and a procedure for generating an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.
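  • The sixth text mining program of the present invention includes a procedure for performing text mining based on the attribute value condition that is the condition of the first type positive example specified by the user, and extracting as features the portions effective for classifying text that meets the attribute value condition as a first type positive example and the remaining text as a first type negative example.
  • It further includes a procedure for having the user select a feature of interest from the features, a procedure for separating the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear, and a procedure for generating an attribute value condition that is a new feature effective for classifying the second type positive example and the second type negative example.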
  • The seventh text mining program of the present invention causes the text mining device to execute a procedure for extracting elements that appear frequently in all stored texts as features, a procedure for having the user select a feature of interest from the features, a procedure for separating the texts into positive example text, in which the selected feature appears, and negative example text, in which the selected feature does not appear, and a procedure for generating an attribute value condition that is a new feature effective for classifying the positive example and the negative example.
  • An eighth text mining device of the present invention is a text mining device that extracts and outputs features from a set of texts with attributes. It comprises analysis target feature designating means for inputting a feature of interest from among the features, positive example negative example text extracting means for extracting positive example text and negative example text from the texts depending on whether or not the input feature appears in the text, and attribute feature extracting means for extracting attribute features effective for classifying the positive example text and the negative example text.
  • A ninth text mining device of the present invention comprises text storage means for holding a set of texts, attribute storage means for holding attribute values for the texts, condition designating means for inputting text mining conditions, text mining means for extracting text features according to the conditions, analysis target feature designating means for inputting a feature of interest from among the features, positive example negative example text extracting means for extracting positive example text and negative example text from the texts depending on whether or not the input feature appears in the text, and attribute feature extracting means for extracting attribute features effective for classifying the positive example text and the negative example text.
  • An eighth text mining method of the present invention is a text mining method in which a computer extracts and outputs features from a set of texts with attributes. It includes a step in which a feature of interest is input to the computer from among the features, a step in which the computer extracts positive example text and negative example text from the texts depending on whether or not the input feature appears in the text, and a step in which the computer extracts attribute features effective for classifying the positive example text and the negative example text.
  • In a ninth text mining method of the present invention, a set of texts and attribute values for the texts are stored in a computer. The method includes a step of inputting text mining conditions to the computer, a step of extracting text features according to the conditions, a step of inputting a feature of interest from among the features, a step of extracting positive example text and negative example text from the texts depending on whether or not the input feature appears in the text, and a step of extracting attribute features effective for classifying the positive example text and the negative example text.
  • An eighth text mining program of the present invention is a text mining program that causes a computer to execute a process of extracting and outputting features from a set of texts with attributes. It causes the computer to execute an analysis target feature designating process for inputting a feature of interest from among the features, a positive example negative example text extraction process for extracting positive example text and negative example text from the texts depending on whether or not the input feature appears in the text, and an attribute feature extraction process for extracting attribute features effective for classifying the positive example text and the negative example text.
  • The ninth text mining program of the present invention causes a computer to execute a process for storing a set of texts and attribute values for the texts in a storage device, a condition designating process for inputting text mining conditions, a text mining process for extracting text features according to the conditions, an analysis target feature designating process for inputting a feature of interest from among the features, a positive example negative example text extraction process for extracting positive example text and negative example text from the texts depending on whether or not the input feature appears in the text, and an attribute feature extraction process for extracting attribute features effective for classifying the positive example text and the negative example text.
  • The present invention can be applied to uses such as a mining system that extracts useful knowledge, such as defect information and problems, from inquiry data recorded at a call center and from paper document data such as reports, and to programs for such a system.

Abstract

A text mining device generates an attribute value (or a combination of attribute values) that is effective for a new classification of text on the basis of a text feature selected by a user, and that the user does not explicitly specify. A data processing device performs text mining based on attribute value conditions, that is, the conditions of a first type positive example and a first type negative example specified by the user, and extracts as features the portions effective for classifying the first type positive example and the first type negative example. The data processing device has the user select a feature of interest from the extracted features. The data processing device separates the texts corresponding to the first type positive example and the first type negative example into second type positive example text, in which the selected feature appears, and second type negative example text, in which the selected feature does not appear. The data processing device generates an attribute value that becomes a new feature effective for classifying the second type positive example and the second type negative example.
PCT/JP2007/072527 2006-11-22 2007-11-21 Text mining device, text mining method, and text mining program WO2008062822A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006315862A JP2010061176A (ja) 2006-11-22 2006-11-22 テキストマイニング装置、テキストマイニング方法、および、テキストマイニングプログラム
JP2006-315862 2006-11-22

Publications (1)

Publication Number Publication Date
WO2008062822A1 (fr) 2008-05-29

Family

ID=39429751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/072527 WO2008062822A1 (fr) 2006-11-22 2007-11-21 Text mining device, text mining method, and text mining program

Country Status (2)

Country Link
JP (1) JP2010061176A (fr)
WO (1) WO2008062822A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014034557A1 (fr) 2012-08-31 2014-03-06 日本電気株式会社 Dispositif d'exploration de texte, procédé d'exploration de texte et support d'enregistrement lisible par ordinateur
WO2014118976A1 (fr) 2013-02-01 2014-08-07 富士通株式会社 Procédé d'apprentissage, dispositif de conversion d'informations et programme d'apprentissage
JP6004015B2 (ja) 2013-02-01 2016-10-05 富士通株式会社 学習方法、情報処理装置および学習プログラム
JP6004016B2 (ja) 2013-02-01 2016-10-05 富士通株式会社 情報変換方法、情報変換装置および情報変換プログラム

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003141134A (ja) * 2001-11-07 2003-05-16 Hitachi Ltd テキストマイニング処理方法及びその実施装置
JP2006031198A (ja) * 2004-07-14 2006-02-02 Nec Corp テキストマイニング装置及びそれに用いるテキストマイニング方法並びにそのプログラム
JP2006244298A (ja) * 2005-03-04 2006-09-14 Mitsubishi Electric Corp テキストマイング方法及びテキストマイニング装置

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010072779A (ja) * 2008-09-17 2010-04-02 Mitsubishi Electric Corp データ分類装置及びコンピュータプログラム及びデータ分類方法
WO2011078194A1 (fr) * 2009-12-25 2011-06-30 日本電気株式会社 Système d'exploration de texte, procédé d'exploration de texte et support d'enregistrement
US8805853B2 (en) 2009-12-25 2014-08-12 Nec Corporation Text mining system for analysis target data, a text mining method for analysis target data and a recording medium for recording analysis target data
JP5772599B2 (ja) * 2009-12-25 2015-09-02 日本電気株式会社 テキストマイニングシステム、テキストマイニング方法および記録媒体
CN109284383A (zh) * 2018-10-09 2019-01-29 北京来也网络科技有限公司 文本处理方法及装置

Also Published As

Publication number Publication date
JP2010061176A (ja) 2010-03-18

Similar Documents

Publication Publication Date Title
Tandel et al. A survey on text mining techniques
US9336496B2 (en) Computer-implemented system and method for generating a reference set via clustering
US20240028837A1 (en) Device and method for machine reading comprehension question and answer
US8468167B2 (en) Automatic data validation and correction
WO2008062822A1 (fr) Dispositif d'exploration de texte, procédé d'exploration de texte et programme d'exploration de texte
JP5023176B2 (ja) 特徴語抽出装置及びプログラム
KR102334236B1 (ko) 음성 변환 Text Data에서 의미있는 키워드 추출 방법과 활용
KR20070009338A (ko) 이미지 상호간의 유사도를 고려한 이미지 검색 방법 및장치
Underwood Understanding genre in a collection of a million volumes
JP5780036B2 (ja) 抽出プログラム、抽出方法及び抽出装置
JP5224532B2 (ja) 評判情報分類装置及びプログラム
JP7193000B2 (ja) 類似文書検索方法、類似文書検索プログラム、類似文書検索装置、索引情報作成方法、索引情報作成プログラムおよび索引情報作成装置
JP5439235B2 (ja) 文書分類方法、文書分類装置、およびプログラム
Wijewickrema Impact of an ontology for automatic text classification
Jain et al. Automatic Question Tagging using k-Nearest Neighbors and Random Forest
US20220138259A1 (en) Automated document intake system
Anđelić et al. Text classification based on named entities
JP2003263441A (ja) キーワード決定データベース作成方法、キーワード決定方法、装置、プログラム、および記録媒体
US20180011919A1 (en) Systems and method for clustering electronic documents
JP7427510B2 (ja) 情報処理装置、情報処理方法およびプログラム
JP2019061522A (ja) 文書推薦システム、文書推薦方法および文書推薦プログラム
JP7135730B2 (ja) 要約生成方法及び要約生成プログラム
JP4426893B2 (ja) 文書検索方法、文書検索プログラムおよびこれを実行する文書検索装置
KR20220041336A (ko) 중요 키워드 추천 및 핵심 문서를 추출하기 위한 그래프 생성 시스템 및 이를 이용한 그래프 생성 방법
JP2008282328A (ja) テキスト分類装置、テキスト分類方法及びテキスト分類プログラム並びにそのプログラムを記録した記録媒体

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07832257

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07832257

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP