TWI703457B - Text mining method, text mining program and text mining device - Google Patents
- Publication number: TWI703457B
- Application number: TW108106540A
- Authority: TW (Taiwan)
- Prior art keywords
- symbiosis
- screen
- word
- interest
- network
Classifications
- G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing)
  - G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    - G06F16/35 Clustering; Classification > G06F16/355 Class or cluster creation or modification
    - G06F16/33 Querying > G06F16/335 Filtering based on additional data, e.g. user or group profiles
    - G06F16/33 Querying > G06F16/338 Presentation of query results
  - G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups > G06F2216/03 Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
Abstract
The text mining method of the present invention comprises a step of extracting words from text data, a step of generating a co-occurrence matrix for the extracted words, a step of generating a co-occurrence network based on the generated co-occurrence matrix, and a step of displaying a screen containing the generated co-occurrence network. Suppose an instruction designating a word of interest is input on a first screen containing a first co-occurrence network based on the entirety of the designated text data. The method then extracts words from limited text data, which consists of the portions of the designated text data that contain the word of interest. It generates a second co-occurrence matrix for the extracted words using the limited text data and generates a second co-occurrence network based on the second co-occurrence matrix. Finally, it displays a second screen containing the second co-occurrence network.
Description
The present invention relates to text mining. In particular, it relates to a text mining method, a text mining program, and a text mining device that display a screen containing a co-occurrence network of words.
In recent years, text mining has attracted attention. Text mining analyzes freely written text data and derives useful information from the results of the analysis. For example, it obtains information by extracting words from the text data under analysis and analyzing properties such as how often and in what patterns the words appear.
When analyzing freely written text data, the analyst should not subjectively select targets at the initial stage. The analyst must first grasp the overall picture of the text data. For this purpose, analysts sometimes use a co-occurrence network of the words contained in the text data.
Fig. 19 shows an example of a co-occurrence network. A co-occurrence network is built by extracting from text data the pairs of words that frequently appear in the same sentence and representing the result as an undirected graph. Suppose words Wa and Wb frequently appear in the same sentence in the text data under analysis. The co-occurrence network then contains a node corresponding to Wa, a node corresponding to Wb, and an edge connecting the two. The co-occurrence network shown in Fig. 19 contains a node for "staff", a node for "response" (對應), and an edge connecting them. Fig. 19 therefore shows that "staff" and "response" frequently appear in the same sentence in the analyzed text data.
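The sketch below illustrates this definition. It builds the edge set of a co-occurrence network from sentences that are assumed to be already tokenized. The patent only says that word pairs appear "frequently" in the same sentence. The raw-count threshold `min_count` and the function name `cooccurrence_network` are assumptions of this sketch. The embodiment described later scores pairs with a Jaccard coefficient instead.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(sentences, min_count=2):
    """Return undirected edges {(w1, w2): count} for word pairs that appear
    together in at least `min_count` sentences.

    `sentences` is a list of already-tokenized sentences (lists of words).
    The threshold is an assumption; the patent does not fix one here.
    """
    pair_counts = Counter()
    for words in sentences:
        # Count each pair at most once per sentence; sorting makes (a, b) == (b, a).
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


sentences = [
    ["staff", "response", "slow"],
    ["staff", "response", "polite"],
    ["room", "clean"],
]
print(cooccurrence_network(sentences))  # {('response', 'staff'): 2}
```

Each key is one undirected edge (two nodes and the line joining them), so the result maps directly onto the node/edge picture of Fig. 19.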
In general, a co-occurrence network is generated based on the entirety of the designated text data. Such a network is hereinafter called an "overall co-occurrence network". Analysts work from their own hypotheses or from the purpose of the analysis. On that basis, they select from the overall co-occurrence network several words that deserve attention (hereinafter, "words of interest"). They then take these words of interest into account in the subsequent analysis.
When selecting words of interest, the analyst examines how each word is used in the sentences that contain it. This lets the analyst judge whether the selected word suits the purpose of the analysis. For this, analysts sometimes use a co-occurrence network based on the sentences of the designated text data that contain the word of interest (hereinafter, "limited text data"). "Text containing the word of interest" is not limited to a single sentence containing the word. It may also mean a block of several sentences (a set of sentences), such as a paragraph that includes a sentence containing the word. Such a network is hereinafter called a "limited co-occurrence network". Using a limited co-occurrence network, the analyst can grasp the content of the limited text data. The analyst moves back and forth between the overall co-occurrence network and limited co-occurrence networks until all words of interest have been selected.
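The sketch below shows one way to cut limited text data out of the designated text data. It assumes the text is already split into paragraphs of tokenized sentences. The function name `limited_text` is illustrative, not taken from the patent. So is its `unit` switch, which chooses between the single-sentence and paragraph-block readings described above.

```python
def limited_text(paragraphs, word, unit="sentence"):
    """Extract hypothetical 'limited text data' for one word of interest.

    paragraphs: list of paragraphs; each paragraph is a list of sentences,
    and each sentence is a list of words.
    unit="sentence":  keep only the sentences that contain `word`.
    unit="paragraph": keep every sentence of any paragraph in which at least
                      one sentence contains `word`.
    """
    result = []
    for para in paragraphs:
        hits = [s for s in para if word in s]
        if unit == "sentence":
            result.extend(hits)
        elif unit == "paragraph":
            if hits:
                result.extend(para)
        else:
            raise ValueError(f"unknown unit: {unit!r}")
    return result


doc = [
    [["staff", "response", "slow"], ["waited", "long"]],
    [["room", "clean"], ["bed", "soft"]],
]
print(limited_text(doc, "staff"))
print(limited_text(doc, "staff", unit="paragraph"))
```

The paragraph reading pulls in neighbouring sentences such as "waited long". Those sentences can carry context that the single-sentence reading would drop.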
Consider next a text mining device that generates a co-occurrence network of the words contained in text data and displays a screen containing the generated network. Japanese Patent Laid-Open No. H8-314980 describes a document database display device of this kind. It generates an overall co-occurrence network for each of a plurality of documents and displays a screen containing the resulting overall co-occurrence networks. The device searches these overall co-occurrence networks for a word input by the user and highlights the found word on the screen.
A conventional text mining device generates a co-occurrence network based on the entirety of the designated text data. A conventional device can therefore easily display a screen containing an overall co-occurrence network.
On the other hand, displaying a screen containing a limited co-occurrence network with a conventional text mining device requires cumbersome operations from the analyst. Each time the analyst selects a word of interest from the overall co-occurrence network, the analyst must create the limited text data from the designated text data and supply it to the text mining device. The analyst also refers to both the overall and the limited co-occurrence networks when selecting words of interest. The text mining device must therefore store image data for both kinds of network. When many co-occurrence networks are generated, storing and managing this image data becomes difficult.
An object of the present invention is therefore to provide a text mining method, a text mining program, and a text mining device that can display, with a simple operation, a screen containing the co-occurrence network for a designated word of interest.
A first aspect of the present invention is a text mining method for displaying a screen containing analysis results of text data. The method comprises a step of extracting words from text data, a step of generating a co-occurrence matrix for the words, a step of generating a co-occurrence network based on the co-occurrence matrix, and a step of displaying a screen containing the co-occurrence network. Suppose an instruction designating a word of interest is input on a first screen containing a first co-occurrence network based on the entirety of designated text data. The extracting step then extracts the words from limited text data consisting of the portions of the designated text data that contain the word of interest. The matrix-generating step generates a second co-occurrence matrix for the words using the limited text data. The network-generating step generates a second co-occurrence network based on the second co-occurrence matrix. The displaying step displays a second screen containing the second co-occurrence network.
A second aspect of the present invention is the first aspect, wherein the user selects one or more nodes of the first co-occurrence network on the first screen and then selects the start of analysis. This inputs an instruction designating the words corresponding to those nodes as words of interest.
A third aspect of the present invention is the first aspect, wherein the user continuously selects (holds down) one node of the first co-occurrence network on the first screen. This inputs an instruction designating the word corresponding to that node as the word of interest.
A fourth aspect of the present invention is the first aspect, wherein the user continuously selects (holds down) one edge of the first co-occurrence network on the first screen. This inputs an instruction designating the words corresponding to the two nodes connected by that edge as words of interest.
A fifth aspect of the present invention is the first aspect, wherein the user selects one or more edges of the first co-occurrence network on the first screen and then selects the start of analysis. This inputs an instruction designating the words corresponding to the nodes connected by those edges as words of interest.
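The second through fifth aspects differ only in how the selection is made. The mapping from a selection to words of interest can be sketched as below. The GUI gestures (holding a node or edge, or pressing an analysis-start button) are not modeled. `words_of_interest` is a hypothetical helper name, and edges are assumed to be represented as pairs of node words.

```python
def words_of_interest(selected_nodes=(), selected_edges=()):
    """Map a selection on the first screen to the designated words of interest.

    Hypothetical helper: each selected node contributes its own word, and each
    selected edge (a pair of node words) contributes the words at both of its
    endpoints. Duplicates are removed and the result is sorted.
    """
    words = set(selected_nodes)
    for a, b in selected_edges:
        words.update((a, b))
    return sorted(words)


# Holding one edge designates the two words it connects.
print(words_of_interest(selected_edges=[("staff", "response")]))
# Nodes and several edges can be combined before starting the analysis.
print(words_of_interest(["room"], [("staff", "response"), ("response", "slow")]))
```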
A sixth aspect of the present invention is the first aspect, wherein a merge instruction may be input on a second screen containing a plurality of second co-occurrence networks. In that case, the displaying step displays the plurality of second co-occurrence networks as tabs.
A seventh aspect of the present invention is the sixth aspect, wherein the user presses on one second co-occurrence network on the second screen and releases it inside another second co-occurrence network. This inputs the merge instruction.
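The merge in the sixth and seventh aspects can be modeled as a list of windows, each holding one or more tabs. This is only a sketch under stated assumptions. The `Window` class and `merge_windows` function are hypothetical, and each tab is represented by a plain label. Appending the dragged window's tabs after the drop target's tabs is an arbitrary choice, since the patent does not specify tab order.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    """Hypothetical window showing one or more limited co-occurrence networks as tabs."""
    tabs: list = field(default_factory=list)  # tab labels, e.g. the words of interest


def merge_windows(windows, source, target):
    """Model dropping window `source` onto window `target`.

    Appends every tab of `source` to `target` (mutating `target`) and returns a
    new list of windows without `source`. Dropping a window onto itself is a no-op.
    """
    if source is target:
        return windows
    target.tabs.extend(source.tabs)
    return [w for w in windows if w is not source]


a, b = Window(["staff"]), Window(["room"])
windows = merge_windows([a, b], source=a, target=b)
print(len(windows), b.tabs)
```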
An eighth aspect of the present invention is the first aspect, wherein the limited text data consists of the sentences of the designated text data that contain the word of interest.
A ninth aspect of the present invention is the eighth aspect, wherein, when a plurality of words of interest are designated, the limited text data consists of the sentences of the designated text data that contain all of those words.
A tenth aspect of the present invention is the eighth aspect, wherein, when a plurality of words of interest are designated, the limited text data consists of the sentences of the designated text data that contain any one of those words.
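The ninth (AND) and tenth (OR) aspects differ only in a set test applied to each sentence. The sketch below assumes tokenized sentences. The name `limited_sentences` and its `mode` parameter are illustrative.

```python
def limited_sentences(sentences, words, mode="AND"):
    """Select the sentences forming the limited text data for several words of interest.

    mode="AND": keep sentences containing all of `words` (ninth aspect).
    mode="OR":  keep sentences containing at least one of `words` (tenth aspect).
    """
    targets = set(words)
    if mode == "AND":
        return [s for s in sentences if targets <= set(s)]
    if mode == "OR":
        return [s for s in sentences if targets & set(s)]
    raise ValueError(f"mode must be 'AND' or 'OR', got {mode!r}")


sentences = [
    ["staff", "response", "slow"],
    ["staff", "polite"],
    ["room", "response"],
    ["room", "clean"],
]
print(limited_sentences(sentences, ["staff", "response"], mode="AND"))
print(limited_sentences(sentences, ["staff", "response"], mode="OR"))
```

AND narrows the limited text data as more words are designated, while OR widens it.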
An eleventh aspect of the present invention is the first aspect, wherein the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients.
A twelfth aspect of the present invention is a text mining program for displaying a screen containing analysis results of text data. The program causes a computer, through a CPU using a memory, to execute a step of extracting words from text data, a step of generating a co-occurrence matrix for the words, a step of generating a co-occurrence network based on the co-occurrence matrix, and a step of displaying a screen containing the co-occurrence network. Suppose an instruction designating a word of interest is input on a first screen containing a first co-occurrence network based on the entirety of designated text data. The extracting step then extracts the words from limited text data consisting of the portions of the designated text data that contain the word of interest. The matrix-generating step generates a second co-occurrence matrix for the words using the limited text data. The network-generating step generates a second co-occurrence network based on the second co-occurrence matrix. The displaying step displays a second screen containing the second co-occurrence network.
A thirteenth aspect of the present invention is the twelfth aspect, wherein the user selects one or more nodes of the first co-occurrence network on the first screen and then selects the start of analysis. This inputs an instruction designating the words corresponding to those nodes as words of interest.
A fourteenth aspect of the present invention is the twelfth aspect, wherein the user continuously selects (holds down) one node of the first co-occurrence network on the first screen. This inputs an instruction designating the word corresponding to that node as the word of interest.
A fifteenth aspect of the present invention is the twelfth aspect, wherein the user continuously selects (holds down) one edge of the first co-occurrence network on the first screen. This inputs an instruction designating the words corresponding to the two nodes connected by that edge as words of interest.
A sixteenth aspect of the present invention is the twelfth aspect, wherein the user selects one or more edges of the first co-occurrence network on the first screen and then selects the start of analysis. This inputs an instruction designating the words corresponding to the nodes connected by those edges as words of interest.
A seventeenth aspect of the present invention is the twelfth aspect, wherein a merge instruction may be input on a second screen containing a plurality of second co-occurrence networks. In that case, the displaying step displays the plurality of second co-occurrence networks as tabs.
An eighteenth aspect of the present invention is the seventeenth aspect, wherein the user presses on one second co-occurrence network on the second screen and releases it inside another second co-occurrence network. This inputs the merge instruction.
A nineteenth aspect of the present invention is a text mining device for displaying a screen containing analysis results of text data. The device comprises a word extraction unit that extracts words from text data, a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words, a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix, and a screen display unit that displays a screen containing the co-occurrence network. Suppose an instruction designating a word of interest is input on a first screen containing a first co-occurrence network based on the entirety of designated text data. The word extraction unit then extracts the words from limited text data consisting of the portions of the designated text data that contain the word of interest. The co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data. The co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix. The screen display unit displays a second screen containing the second co-occurrence network.
A twentieth aspect of the present invention is the nineteenth aspect, wherein a merge instruction may be input on a second screen containing a plurality of second co-occurrence networks. In that case, the screen display unit displays the plurality of second co-occurrence networks as tabs.
According to the first, twelfth, or nineteenth aspect, an instruction designating a word of interest may be input on a first screen containing a first co-occurrence network based on the entirety of the designated text data. A second screen is then displayed containing a second co-occurrence network based on the portions of the designated text data that contain the word of interest. A screen containing the co-occurrence network for a designated word of interest can therefore be displayed with a simple operation.
According to the second or thirteenth aspect, the user selects one or more nodes on the first screen and then selects the start of analysis. This simple operation inputs an instruction designating one or more words of interest and displays a screen containing the co-occurrence network for those words.
According to the third or fourteenth aspect, the user continuously selects one node on the first screen. This simple operation inputs an instruction designating one word of interest and displays a screen containing the co-occurrence network for that word.
According to the fourth or fifteenth aspect, the user continuously selects one edge on the first screen. This simple operation inputs an instruction designating two words of interest and displays a screen containing the co-occurrence network for those two words.
According to the fifth or sixteenth aspect, the user selects one or more edges on the first screen and then selects the start of analysis. This simple operation inputs an instruction designating a plurality of words of interest and displays a screen containing the co-occurrence network for those words.
According to the sixth, seventeenth, or twentieth aspect, a plurality of second co-occurrence networks are displayed as tabs when a merge instruction is input. The plurality of second co-occurrence networks can thus be displayed compactly.
According to the seventh or eighteenth aspect, the user presses on a second co-occurrence network on the second screen and releases it. This simple operation inputs the merge instruction and displays the plurality of second co-occurrence networks compactly.
According to the eighth aspect, when an instruction designating a word of interest is input, the limited text data can be obtained by dividing the designated text data into sentences. A screen containing a second co-occurrence network based on that limited text data can then be displayed.
According to the ninth or tenth aspect, a screen can be displayed containing the second co-occurrence network obtained by applying AND processing or OR processing to a plurality of words of interest.
According to the eleventh aspect, the co-occurrence matrix has Jaccard coefficients as its elements. This allows the co-occurrence of the words contained in the text data to be analyzed appropriately.
10: Text mining device
11: Instruction input unit
12: Text data storage unit
13: Word extraction unit
14: Co-occurrence matrix generation unit
15: Co-occurrence network generation unit
16: Screen display unit
20: Computer
21: CPU
22: Main memory
23: Storage unit
24: Input unit
25: Display unit
26: Communication unit
27: Recording medium reader
28: Keyboard
29: Mouse
30: Recording medium
31: Text mining program
32: Text data
41-45: Windows
51: Overall co-occurrence network
52-54: Limited co-occurrence networks
61: Analyze button
62: Mouse cursor
63-64: Tabs
71-75: Screens
Fig. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention.
Fig. 2 is a block diagram showing the configuration of a computer functioning as the text mining device shown in Fig. 1.
Fig. 3 is a flowchart showing the operation of the text mining device shown in Fig. 1.
Fig. 4 shows an example of a co-occurrence matrix generated by the text mining device shown in Fig. 1.
Fig. 5 shows an example of a window containing an overall co-occurrence network displayed by the text mining device shown in Fig. 1.
Fig. 6 shows a first operation for designating a word of interest in the window shown in Fig. 5.
Fig. 7 shows a second operation for designating a word of interest in the window shown in Fig. 5.
Fig. 8 shows a third operation for designating a word of interest in the window shown in Fig. 5.
Fig. 9 shows a fourth operation for designating a word of interest in the window shown in Fig. 5.
Fig. 10 shows a fifth operation for designating a word of interest in the window shown in Fig. 5.
Fig. 11 shows a sixth operation for designating a word of interest in the window shown in Fig. 5.
Fig. 12 shows an example of a window containing a limited co-occurrence network displayed by the text mining device shown in Fig. 1.
Fig. 13 shows another example of a window containing a limited co-occurrence network displayed by the text mining device shown in Fig. 1.
Fig. 14 shows an example of a display screen of the text mining device shown in Fig. 1.
Fig. 15 shows an example of a display screen of the text mining device shown in Fig. 1.
Fig. 16 shows an example of a display screen of the text mining device shown in Fig. 1.
Fig. 17 shows an operation for merging windows in the text mining device shown in Fig. 1.
Fig. 18 shows the display screen after the operation shown in Fig. 17 has been performed.
Fig. 19 shows an example of a co-occurrence network.
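A text mining method, a text mining program, and a text mining device according to an embodiment of the present invention are described below with reference to the drawings. The text mining method of this embodiment is typically executed on a computer. The text mining program of this embodiment is a program for carrying out the text mining method on a computer. The text mining device of this embodiment is typically implemented with a computer. A computer executing the text mining program functions as the text mining device.
Fig. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention. The text mining device 10 shown in Fig. 1 includes an instruction input unit 11, a text data storage unit 12, a word extraction unit 13, a co-occurrence matrix generation unit 14, a co-occurrence network generation unit 15, and a screen display unit 16. The text mining device 10 generates a co-occurrence network from text data stored in the text data storage unit 12 as the analysis result of that text data. It then displays a screen containing the generated network.
The operation of the text mining device 10 is outlined as follows. Instructions from the user (the analyst of the text data) are input to the instruction input unit 11. The text data storage unit 12 stores one or more pieces of freely written text data. The word extraction unit 13 reads the designated text data from the text data storage unit 12 and extracts words from it by morphological analysis. The co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted by the word extraction unit 13. The co-occurrence network generation unit 15 generates a co-occurrence network based on the co-occurrence matrix generated by the co-occurrence matrix generation unit 14. The screen display unit 16 displays a screen containing the co-occurrence network generated by the co-occurrence network generation unit 15.
Using the instruction input unit 11, the user inputs instructions such as one designating the text data to be analyzed and one designating a word of interest. The word extraction unit 13, the co-occurrence network generation unit 15, and the screen display unit 16 act on the user's instructions to display a screen containing a co-occurrence network. When an instruction designating text data is input, an overall co-occurrence network based on the entirety of the designated text data is generated, and a screen containing it is displayed. An instruction designating a word of interest may then be input on the screen containing the overall co-occurrence network. A limited co-occurrence network is then generated from the sentences of the designated text data that contain the word of interest, and a screen containing it is displayed.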
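The overall flow can be sketched as below: the first screen is built from the whole designated text and the second from the limited text. The sketch makes three simplifying assumptions. It uses tokenized sentences, it applies the AND reading when several words of interest are given, and it uses a simple count threshold instead of the Jaccard-based matrix. `build_network` and `TextMiningSession` are hypothetical names. Recomputing each network from the stored text data, rather than keeping drawn network images, is a design choice of this sketch.

```python
from collections import Counter
from itertools import combinations


def build_network(sentences, min_count=1):
    """Undirected edges between words that share a sentence at least `min_count` times."""
    counts = Counter()
    for words in sentences:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


class TextMiningSession:
    """Hypothetical session holding the designated (tokenized) text data."""

    def __init__(self, sentences):
        self.sentences = sentences

    def first_network(self, min_count=1):
        # First screen: network over the entirety of the designated text data.
        return build_network(self.sentences, min_count)

    def second_network(self, words_of_interest, min_count=1):
        # Second screen: network over only the sentences containing every word
        # of interest (the AND variant), i.e. the limited text data.
        targets = set(words_of_interest)
        limited = [s for s in self.sentences if targets <= set(s)]
        return build_network(limited, min_count)


session = TextMiningSession([
    ["staff", "response", "slow"],
    ["staff", "polite"],
    ["room", "clean"],
])
print(len(session.first_network()), sorted(session.second_network(["staff"])))
```

The user supplies only the words of interest. The limited text data is derived inside `second_network`, which reflects the patent's goal of removing the manual step of preparing limited text data.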
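Fig. 2 is a block diagram showing the configuration of a computer functioning as the text mining device 10. The computer 20 shown in Fig. 2 includes a CPU (Central Processing Unit) 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reader 27. The main memory 22 is, for example, a DRAM (Dynamic Random Access Memory). The storage unit 23 is, for example, a hard disk or a solid-state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reader 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is a non-transitory recording medium such as a CD-ROM (Compact Disc Read-Only Memory), a DVD-ROM (Digital Versatile Disc Read-Only Memory), or a USB (Universal Serial Bus) memory.
When the computer 20 executes the text mining program 31, the storage unit 23 stores the text mining program 31 and text data 32. The text mining program 31 and the text data 32 may, for example, be received from a server or another computer via the communication unit 26. They may also be read from the recording medium 30 via the recording medium reader 27.
When the text mining program 31 is executed, the text mining program 31 and the text data 32 are copied to the main memory 22. The CPU 21 uses the main memory 22 as working memory and executes the text mining program 31 stored there. In doing so, it extracts words from the text data 32 and generates a co-occurrence matrix for the extracted words. It also generates a co-occurrence network based on that matrix and displays a screen containing the network. The computer 20 then functions as the text mining device 10. The configuration of the computer 20 described above is merely an example; any computer may be used to implement the text mining device 10.
Fig. 3 is a flowchart showing the operation of the text mining device 10. Before the operation shown in Fig. 3 is performed, the text data storage unit 12 stores one or more pieces of freely written text data, each containing a plurality of sentences. The text mining device 10 processes whichever of the stored text data the user designates.
In Fig. 3, the instruction input unit 11 first receives an instruction from the user designating text data (step S101). The instruction input unit 11 may also receive other instructions at this point. Examples are an instruction setting the reference value for the co-occurrence matrix (described in detail later) and an instruction switching between AND processing and OR processing (described in detail later). Another example is an instruction setting details of how the co-occurrence network is displayed. The received instructions are output to the respective units of the text mining device 10.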
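Next, the word extraction unit 13 reads the designated text data from the text data storage unit 12 (step S102). The word extraction unit 13 then performs morphological analysis on the text data read in step S102 and extracts words from it (step S103). At this point, the word extraction unit 13 extracts only the words needed for the subsequent analysis. Next, the co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted in step S103, using the text data read in step S102 (step S104).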
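The patent states only that step S103 keeps the words needed for later analysis. A common way to do this, assumed here rather than taken from the source, is to filter the morphological analyzer's output by part of speech. The sketch takes analyzer output as hypothetical `(word, part_of_speech)` tuples with Universal-Dependencies-style tags, so no particular analyzer is required. The `KEEP_POS` set and optional stopword list are illustrative choices.

```python
# Assumed content-word tags; a real analyzer's tag set will differ.
KEEP_POS = {"NOUN", "VERB", "ADJ"}


def extract_words(tagged_sentences, keep_pos=KEEP_POS, stopwords=frozenset()):
    """Keep only the words assumed to matter for co-occurrence analysis.

    tagged_sentences: list of sentences, each a list of (word, pos) tuples as
    produced by some morphological analyzer (hypothetical format).
    Returns one list of kept words per sentence, preserving order.
    """
    return [
        [word for word, pos in sentence if pos in keep_pos and word not in stopwords]
        for sentence in tagged_sentences
    ]


tagged = [
    [("the", "DET"), ("staff", "NOUN"), ("responded", "VERB"), ("slowly", "ADV")],
    [("very", "ADV"), ("clean", "ADJ"), ("room", "NOUN")],
]
print(extract_words(tagged))
```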
FIG. 4 shows an example of a co-occurrence matrix generated by the co-occurrence matrix generation unit 14. The elements of the co-occurrence matrix are Jaccard coefficients computed for pairs of words. For the text data under analysis, let A be the set of sentences containing word Wa and B the set of sentences containing word Wb. The Jaccard coefficient K(Wa, Wb) of the word pair (Wa, Wb) is given by the following equation (1).
K(Wa, Wb) = |A ∩ B| / |A ∪ B|   …(1)
In equation (1), the symbol ∩ denotes the intersection of sets, the symbol ∪ denotes their union, and |S| denotes the number of elements in the set S.
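Equation (1) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `jaccard` and the representation of A and B as sets of sentence indices are assumptions, as is returning 0 when both sets are empty (a case equation (1) leaves undefined).

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard coefficient K = |A ∩ B| / |A ∪ B| of two sets of sentence indices."""
    union = a | b
    if not union:
        # Equation (1) is undefined here; returning 0.0 is an assumption.
        return 0.0
    return len(a & b) / len(union)


# Wa occurs in sentences 0, 1, 2 and Wb in sentences 1, 2, 3, so K = 2 / 4.
print(jaccard({0, 1, 2}, {1, 2, 3}))  # prints 0.5
```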
In step S104, the co-occurrence matrix generation unit 14 computes the Jaccard coefficient for every pair of words extracted from the whole of the text data read in step S102, and generates a co-occurrence matrix whose elements are the computed coefficients. The rows and columns of the co-occurrence matrix correspond to the distinct words extracted from the whole of the text data read in step S102. When n distinct words are extracted, the co-occurrence matrix generated in step S104 is an n-by-n symmetric matrix whose diagonal elements are all 1.
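The construction in step S104 can be sketched as below, assuming the text data has already been split into sentences and reduced to lists of extracted words (standing in for the output of step S103). The function name and the alphabetical ordering of rows and columns are illustrative choices, not taken from the patent.

```python
from itertools import combinations


def cooccurrence_matrix(sentences: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Build an n x n Jaccard co-occurrence matrix from pre-tokenized sentences."""
    # Map each distinct word to the set of sentence indices that contain it.
    index: dict[str, set[int]] = {}
    for i, words in enumerate(sentences):
        for word in words:
            index.setdefault(word, set()).add(i)

    vocab = sorted(index)  # one row/column per distinct word
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # K(W, W) = 1 for every word that occurs at least once
    for i, j in combinations(range(n), 2):
        a, b = index[vocab[i]], index[vocab[j]]
        matrix[i][j] = matrix[j][i] = len(a & b) / len(a | b)  # symmetric
    return vocab, matrix


vocab, m = cooccurrence_matrix([["bath", "price"], ["bath", "stairs"], ["price"]])
print(vocab)  # prints ['bath', 'price', 'stairs']
```

Only pairs with i < j are computed; the lower triangle is filled by symmetry.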
The co-occurrence matrix generation unit 14 may also divide the text data into units other than sentences when computing the Jaccard coefficient. For example, it may let A be the set of paragraphs containing word Wa and B the set of paragraphs containing word Wb, and compute the Jaccard coefficient by equation (1). When the sentences in the text data carry dates, it may also divide the text data into a plurality of parts, each consisting of the sentences with the same date, let A be the set of parts containing word Wa and B the set of parts containing word Wb, and compute the Jaccard coefficient by equation (1). The co-occurrence matrix generation unit 14 may also generate a co-occurrence matrix whose elements are other values indicating word co-occurrence (for example, the Simpson coefficient or the cosine distance).
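The variations above can be sketched as follows. The grouping helper builds the date-based units (each unit is the set of sentences sharing a date). The two measures use common set-based definitions, assumed here rather than taken from the patent: the Simpson coefficient |A ∩ B| / min(|A|, |B|) and the set form of cosine similarity |A ∩ B| / sqrt(|A| |B|). The source says "cosine distance"; a distance version would be 1 minus this value. All names are hypothetical.

```python
import math


def units_by_date(dated_sentences: list[tuple[str, list[str]]]) -> dict[str, set[str]]:
    """Map each word to the set of dates (units) whose sentences contain it."""
    index: dict[str, set[str]] = {}
    for date, words in dated_sentences:
        for word in words:
            index.setdefault(word, set()).add(date)
    return index


def simpson(a: set, b: set) -> float:
    # |A ∩ B| / min(|A|, |B|); assumes both sets are non-empty.
    return len(a & b) / min(len(a), len(b))


def cosine(a: set, b: set) -> float:
    # |A ∩ B| / sqrt(|A| * |B|); assumes both sets are non-empty.
    return len(a & b) / math.sqrt(len(a) * len(b))


idx = units_by_date([
    ("2018-03-01", ["bath"]),
    ("2018-03-01", ["price"]),
    ("2018-03-02", ["bath", "price"]),
])
# By sentence, "bath" and "price" share one of three sentences; by date they share both units.
print(simpson(idx["bath"], idx["price"]))  # prints 1.0
```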
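Next, the co-occurrence network generation unit 15 generates an overall co-occurrence network based on the co-occurrence matrix generated in step S104 (step S105). The screen display unit 16 then displays a screen containing the overall co-occurrence network generated in step S105 (step S106). FIG. 5 shows an example of a window, displayed in step S106, that contains the overall co-occurrence network. The window 41 shown in FIG. 5 contains an overall co-occurrence network 51 and an analysis button 61. The analysis button 61 is provided for instructing the start of analysis.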
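The co-occurrence network generation unit 15 holds a reference value for the co-occurrence matrix (hereinafter V). The reference value V may be a predetermined value or a value set by the user via the instruction input unit 11. When, in the co-occurrence matrix generated in step S104, the maximum of the Jaccard coefficients K(Wa, *) in the row corresponding to word Wa is equal to or greater than V, the co-occurrence network generation unit 15 includes the node corresponding to word Wa (referred to as the node of word Wa) in the overall co-occurrence network. Likewise, when the Jaccard coefficient K(Wa, Wb) of a word pair (Wa, Wb) is equal to or greater than V, the co-occurrence network generation unit 15 includes in the overall co-occurrence network an edge connecting the node corresponding to word Wa and the node corresponding to word Wb.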
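The node and edge rules can be sketched as follows. One interpretive assumption is made: because the diagonal element K(Wa, Wa) is always 1, the row maximum is taken over the off-diagonal elements only; otherwise every word would qualify whenever V ≤ 1. Under that reading a word becomes a node exactly when it has at least one edge. The function name is hypothetical.

```python
def build_network(vocab: list[str], matrix: list[list[float]], v: float):
    """Return (nodes, edges) of a co-occurrence network for reference value v."""
    n = len(vocab)
    nodes: set[str] = set()
    edges: set[tuple[str, str]] = set()
    for i in range(n):
        # Row maximum excluding the diagonal (an assumption, see above).
        off_diagonal = [matrix[i][j] for j in range(n) if j != i]
        if off_diagonal and max(off_diagonal) >= v:
            nodes.add(vocab[i])
        for j in range(i + 1, n):
            if matrix[i][j] >= v:
                edges.add((vocab[i], vocab[j]))
    return nodes, edges


vocab = ["bath", "price", "stairs"]
matrix = [
    [1.0, 1 / 3, 0.5],
    [1 / 3, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
nodes, edges = build_network(vocab, matrix, v=0.4)
print(sorted(nodes), sorted(edges))  # prints ['bath', 'stairs'] [('bath', 'stairs')]
```

Lowering V adds edges and nodes: with v=0.3 the pair ("bath", "price") also qualifies.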
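In the overall co-occurrence network 51 shown in FIG. 5, nodes corresponding to more frequent words are drawn larger. When a screen containing a co-occurrence network is displayed, the edge connecting the node of word Wa and the node of word Wb may be drawn thicker when the Jaccard coefficient K(Wa, Wb) is larger. The color of an edge may also be switched according to the Jaccard coefficient, or both its thickness and its color may be switched. A co-occurrence network divides into a plurality of parts, within each of which the nodes are reachable from one another via edges. When a screen containing a co-occurrence network is displayed, the nodes in each part may be shown in a color assigned to that part. Note that the positions of the nodes and edges in the co-occurrence network carry no meaning.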
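The parts reachable via edges are the connected components of the network. The sketch below finds them with a breadth-first search and assigns one color per component; node sizing by frequency and edge styling by K are omitted. The palette and function names are hypothetical.

```python
from collections import deque


def connected_components(nodes: set[str], edges: set[tuple[str, str]]) -> list[set[str]]:
    """Split the network into parts whose nodes are reachable from each other via edges."""
    adjacency: dict[str, set[str]] = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in sorted(adjacency):  # sorted for a deterministic order
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def assign_colors(components: list[set[str]], palette: list[str]) -> dict[str, str]:
    # Every node in a component gets that component's color (palette reused cyclically).
    return {node: palette[i % len(palette)] for i, comp in enumerate(components) for node in comp}


parts = connected_components(
    {"bath", "stairs", "price", "consider"},
    {("bath", "stairs"), ("price", "consider")},
)
colors = assign_colors(parts, ["blue", "orange"])
```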
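Next, the instruction input unit 11 receives an instruction specifying a word of interest from the user (step S111). When step S111 is executed, a screen containing the overall co-occurrence network is displayed. The user inputs the instruction specifying a word of interest by operating the mouse 29 to select an element of the overall co-occurrence network. When inputting the instruction, the user may also use the keyboard 28 instead of the mouse 29, or perform operations such as touching the display screen directly. In the following, it is assumed that a screen containing the window 41 shown in FIG. 5 is displayed when step S111 is executed.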
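FIGS. 6 to 11 show, respectively, the first to sixth operations for specifying a word of interest in the window 41. In FIGS. 6 to 11, the action-indicator boxes show the order of operations, and the white arrows show the movement of the mouse cursor 62. The action-indicator boxes and arrows are not displayed on the actual screen. In the following, clicking (double-clicking) the button of the mouse 29 while the mouse cursor 62 is over an element on the display screen is referred to as "clicking (double-clicking) the element".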
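As shown in FIG. 6, in the window 41 the user first clicks the node corresponding to the word to be specified as the word of interest (here, "open-air bath") (first click), and then clicks the analysis button 61 (second click). This operation specifies the word corresponding to the node clicked first as the word of interest. Thus, selecting one node of the overall co-occurrence network within the screen containing it and then selecting the start of analysis inputs an instruction specifying one word of interest.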
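As shown in FIG. 7, the user double-clicks, in the window 41, the node corresponding to the word to be specified as the word of interest (here, "open-air bath"). This operation specifies the word corresponding to the double-clicked node as the word of interest. Thus, selecting one node of the overall co-occurrence network twice in succession within the screen containing it inputs an instruction specifying one word of interest.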
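As shown in FIG. 8, in the window 41 the user first clicks the node corresponding to a word to be specified as a word of interest (here, "open-air bath") (first click), then clicks the node corresponding to another word to be specified as a word of interest (here, "price") (second click), and finally clicks the analysis button 61 (last click). This operation specifies the two words corresponding to the nodes clicked first and second as words of interest. The user may also click p nodes (p being an integer of 3 or more) in turn in the window 41 and finally click the analysis button 61, which specifies the p words corresponding to the p nodes as words of interest. Thus, selecting a plurality of nodes of the overall co-occurrence network within the screen containing it and then selecting the start of analysis inputs an instruction specifying a plurality of words of interest.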
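As shown in FIG. 9, the user double-clicks, in the window 41, the edge connecting the two nodes corresponding to the two words to be specified as words of interest (here, "open-air bath" and "stairs"). This specifies the two words corresponding to the two nodes connected by the double-clicked edge as words of interest. Thus, selecting one edge of the overall co-occurrence network twice in succession within the screen containing it inputs an instruction specifying two words of interest.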
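As shown in FIG. 10, in the window 41 the user first clicks the edge connecting the two nodes corresponding to the two words to be specified as words of interest (here, "open-air bath" and "stairs") (first click), and then clicks the analysis button 61 (second click). This specifies the two words corresponding to the two nodes connected by the edge clicked first as words of interest. Thus, selecting one edge of the overall co-occurrence network within the screen containing it and then selecting the start of analysis inputs an instruction specifying two words of interest.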
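As shown in FIG. 11, in the window 41 the user first clicks the edge connecting the two nodes corresponding to two words to be specified as words of interest (here, "open-air bath" and "stairs") (first click), then clicks the edge connecting the two nodes corresponding to another two words to be specified as words of interest (here, "price" and "consider") (second click), and finally clicks the analysis button 61 (last click). This operation specifies the four words corresponding to the four nodes connected by the two edges clicked first and second as words of interest. The user may also click q edges (q being an integer of 3 or more) in turn in the window 41 and finally click the analysis button 61, which specifies the 2q words corresponding to the 2q nodes connected by the q edges as words of interest. Thus, selecting a plurality of edges of the overall co-occurrence network within the screen containing it and then selecting the start of analysis inputs an instruction specifying a plurality of words of interest.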
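The mapping from a selection to words of interest can be sketched as below. Selected nodes are modeled as word labels and selected edges as label pairs; these representations, the function name, and accepting nodes and edges in one call are assumptions. One detail the text does not address: if two selected edges share an endpoint, q edges yield fewer than 2q distinct words, so the sketch de-duplicates while keeping selection order.

```python
from collections.abc import Iterable


def words_of_interest(
    selected_nodes: Iterable[str] = (),
    selected_edges: Iterable[tuple[str, str]] = (),
) -> list[str]:
    """Collect words of interest, in selection order, from clicked nodes and edges."""
    words: list[str] = []
    for word in selected_nodes:
        if word not in words:
            words.append(word)
    for a, b in selected_edges:  # an edge contributes both of its endpoint words
        for word in (a, b):
            if word not in words:
                words.append(word)
    return words


# FIG. 11: two edges are selected, then the analysis button is clicked.
print(words_of_interest(selected_edges=[("open-air bath", "stairs"), ("price", "consider")]))
# prints ['open-air bath', 'stairs', 'price', 'consider']
```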
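In step S111, in addition to the instruction specifying a word of interest, the instruction input unit 11 may also receive an instruction setting the reference value of the co-occurrence matrix, an instruction switching between AND processing and OR processing, instructions setting details of how the co-occurrence network is displayed, and so on. The received instructions are output to the relevant units of the text mining device 10.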
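Next, the word extraction unit 13 extracts, from the text data read in step S102, the sentences containing the word of interest specified in step S111, thereby obtaining restricted text data consisting of the sentences that contain the word of interest (step S112).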
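The word extraction unit 13 holds a flag indicating whether AND processing or OR processing is performed when a plurality of words of interest are specified. The value of the flag may be a predetermined value or a value set by the user via the instruction input unit 11. When the flag indicates AND processing, the word extraction unit 13 obtains the restricted text data by extracting, from the text data read, the sentences that contain all of the specified words of interest. When the flag indicates OR processing, the word extraction unit 13 obtains the restricted text data by extracting, from the text data read, the sentences that contain any one of the specified words of interest.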
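Step S112 with the AND/OR flag can be sketched as follows, assuming each sentence is already represented by its list of extracted words (so a term such as "open-air bath" is a single token). The function name and the string flag values are hypothetical.

```python
def restrict(sentences: list[list[str]], words_of_interest: list[str], mode: str = "AND") -> list[list[str]]:
    """Return the restricted text data: the sentences that match the words of interest."""
    if mode not in ("AND", "OR"):
        raise ValueError(f"unknown mode: {mode!r}")
    targets = set(words_of_interest)

    def matches(sentence: list[str]) -> bool:
        present = set(sentence)
        if mode == "AND":
            return targets <= present  # contains every word of interest
        return bool(targets & present)  # OR: contains at least one of them

    return [s for s in sentences if matches(s)]


sentences = [
    ["open-air bath", "view"],
    ["open-air bath", "price"],
    ["price", "consider"],
    ["stairs"],
]
print(restrict(sentences, ["open-air bath", "price"], mode="AND"))
# prints [['open-air bath', 'price']]
```

With mode="OR" the same call keeps the first three sentences.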
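Next, the word extraction unit 13 extracts words from the restricted text data obtained in step S112 by performing morphological analysis on it (step S113). The co-occurrence matrix generation unit 14 then generates a co-occurrence matrix for the words extracted in step S113, using the restricted text data obtained in step S112 (step S114). Next, the co-occurrence network generation unit 15 generates a restricted co-occurrence network based on the co-occurrence matrix generated in step S114 (step S115). Steps S103 to S105 and steps S113 to S115 differ in what they process, but the processing itself is the same.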
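Because steps S113 to S115 repeat steps S103 to S105 on different input, one function can serve both. The sketch below assumes English input and uses a regular-expression tokenizer with a small stop-word list as a stand-in for morphological analysis (a real system for Japanese text would use a morphological analyzer such as MeCab). Jaccard coefficients are computed on the fly rather than stored in a matrix, and a word is kept as a node when it has at least one edge. All names and the sample sentences are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical filter standing in for "only the words needed for the analysis".
STOP_WORDS = {"the", "a", "and", "to", "was", "is"}


def extract_words(sentence: str) -> list[str]:
    # Stand-in for morphological analysis: lowercase alphabetic tokens minus stop words.
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOP_WORDS]


def analyze(sentences: list[str], v: float) -> tuple[set[str], set[tuple[str, str]]]:
    """Word extraction, Jaccard co-occurrence, and network construction in one pass."""
    index: dict[str, set[int]] = {}
    for i, sentence in enumerate(sentences):
        for word in extract_words(sentence):
            index.setdefault(word, set()).add(i)
    edges: set[tuple[str, str]] = set()
    for a, b in combinations(sorted(index), 2):
        k = len(index[a] & index[b]) / len(index[a] | index[b])
        if k >= v:
            edges.add((a, b))
    nodes = {word for edge in edges for word in edge}
    return nodes, edges


document = [
    "The open-air bath was great",
    "The bath price was high",
    "The stairs to the bath",
    "Price and service",
]
overall_nodes, overall_edges = analyze(document, v=0.5)  # steps S103 to S105
restricted_data = [s for s in document if "bath" in extract_words(s)]  # step S112
restricted_nodes, restricted_edges = analyze(restricted_data, v=0.5)  # steps S113 to S115
```

In this toy data the edge ("price", "service") appears only in the overall network, because the sentence that links them does not mention "bath".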
In general, fewer distinct words are extracted from the restricted text data obtained in step S112 than from the text data read in step S102. The co-occurrence matrix generated in step S114 differs from the one generated in step S104, and the restricted co-occurrence network generated in step S115 differs from the overall co-occurrence network generated in step S105.
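Next, the screen display unit 16 displays a screen containing the restricted co-occurrence network generated in step S115 (step S116). FIGS. 12 and 13 show examples of windows, displayed in step S116, that contain a restricted co-occurrence network. The window 42 shown in FIG. 12 contains the restricted co-occurrence network 52 obtained when one word of interest (here, "open-air bath") is specified. The window 43 shown in FIG. 13 contains the restricted co-occurrence network 53 obtained when two words of interest (here, "open-air bath" and "bathhouse") are specified.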
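FIGS. 14 and 15 show examples of display screens of the text mining device 10. The screen display unit 16 may display the window containing the overall co-occurrence network and the window containing the restricted co-occurrence network side by side without overlap, or may display them overlapping. In the screen 71 shown in FIG. 14, the window 41 containing the overall co-occurrence network 51 and the window 42 containing the restricted co-occurrence network 52 are displayed side by side without overlap, so the user can view the overall co-occurrence network 51 and the restricted co-occurrence network 52 at the same time. In the screen 72 shown in FIG. 15, the window 42 containing the restricted co-occurrence network 52 overlaps the window 41 containing the overall co-occurrence network 51, and the user can switch between viewing the overall co-occurrence network 51 and the restricted co-occurrence network 52.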
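Next, the instruction input unit 11 receives an instruction from the user (step S121). The text mining device 10 then determines whether the instruction received in step S121 specifies a word of interest (step S122). If Yes in step S122, control of the text mining device 10 proceeds to step S112. In this case, steps S112 to S116 are executed for the word of interest specified in step S121, and a screen is displayed containing a restricted co-occurrence network based on restricted text data consisting of the sentences that contain that word of interest.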
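FIG. 16 shows an example of a display screen of the text mining device 10. In the screen 73 shown in FIG. 16, the window 41 containing the overall co-occurrence network 51 and the window 42 containing the restricted co-occurrence network 52 overlap, and a window 44 is displayed containing the restricted co-occurrence network 54 obtained when "bathhouse" is specified as the word of interest. The screen 73 is displayed when "open-air bath" is specified as the word of interest in step S111 and "bathhouse" is specified as the word of interest in step S121. In the screen 73, the user can switch between viewing the overall co-occurrence network 51 and the restricted co-occurrence networks 52 and 54.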
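If No in step S122, control of the text mining device 10 proceeds to step S123. In this case, the instruction received in step S121 is, for example, an instruction to move a window, hide a window, close a window, or merge windows. The user inputs such instructions by operating the instruction input unit 11 while the screen containing the overall and restricted co-occurrence networks is displayed. The screen display unit 16 displays an updated screen according to the instruction received in step S121 (step S123). Control of the text mining device 10 then returns to step S121.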
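FIG. 17 shows the operation of merging windows. The screen 74 shown in FIG. 17 displays the window 42 containing the restricted co-occurrence network 52 obtained when "open-air bath" is specified as the word of interest, and the window 44 containing the restricted co-occurrence network 54 obtained when "bathhouse" is specified as the word of interest. In the screen 74, the user can view the two restricted co-occurrence networks 52 and 54 at the same time.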
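The hatched arrow in FIG. 17 shows the movement of the mouse cursor 62 while the button of the mouse 29 is held down; it is not displayed on the actual screen. In the screen 74, the user grabs the restricted co-occurrence network 52 and releases it within the restricted co-occurrence network 54 (a drag-and-drop operation). More specifically, the user presses the button of the mouse 29 while the mouse cursor 62 is in the window 42, moves the mouse cursor 62 into the window 44 while holding the button down, and releases the button while the mouse cursor 62 is in the window 44. This operation inputs an instruction to merge the windows.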
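FIG. 18 shows the display screen after the operation shown in FIG. 17 has been performed. The screen 75 shown in FIG. 18 displays a window 45 that shows a plurality of restricted co-occurrence networks as tabs. In FIG. 18, the tab 64 labeled "open-air bath" is selected, and the window 45 shows the restricted co-occurrence network 52 obtained when "open-air bath" is specified as the word of interest. When the tab 63 labeled "bathhouse" is selected, the window 45 shows the restricted co-occurrence network 54 shown in FIG. 17.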
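When the user clicks the close button (× mark) of the window 45, the window 45 closes. When the user clicks the close button in the tab 63, the tab 63 is hidden. When the user clicks the close button in the tab 64, the tab 64 is hidden and the window 45 shows the restricted co-occurrence network 54.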
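The tab behavior of FIGS. 17 and 18 can be modeled with a small class, sketched below under stated assumptions: a window holds an ordered mapping from tab label to network; dropping one window onto another appends the dropped window's tabs and shows the dropped network (as in the FIG. 18 example); and closing the visible tab shows the first remaining tab. The class and method names are hypothetical, and networks are represented by plain strings.

```python
from typing import Optional


class TabbedWindow:
    """Hypothetical model of a window that shows co-occurrence networks as tabs."""

    def __init__(self, label: str, network: str) -> None:
        self.tabs: dict[str, str] = {label: network}  # insertion order = tab order
        self.selected: Optional[str] = label

    def absorb(self, dropped: "TabbedWindow") -> None:
        """Merge instruction: `dropped` was dragged and released inside this window."""
        self.tabs.update(dropped.tabs)  # a tab with the same label is replaced
        self.selected = dropped.selected

    def close_tab(self, label: str) -> None:
        del self.tabs[label]
        if label == self.selected:
            # Show the first remaining tab, or nothing if none are left (assumption).
            self.selected = next(iter(self.tabs), None)

    def visible_network(self) -> Optional[str]:
        return self.tabs.get(self.selected) if self.selected is not None else None


window_44 = TabbedWindow("bathhouse", "restricted network 54")
window_42 = TabbedWindow("open-air bath", "restricted network 52")
window_44.absorb(window_42)  # FIG. 17 drag-and-drop; the result plays the role of window 45
print(window_44.visible_network())  # prints restricted network 52
window_44.close_tab("open-air bath")  # close tab 64
print(window_44.visible_network())  # prints restricted network 54
```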
As described above, the text mining method of this embodiment comprises a step of extracting words from text data (steps S102, S103, S112, S113), a step of generating a co-occurrence matrix for the extracted words (steps S104, S114), a step of generating a co-occurrence network based on the generated co-occurrence matrix (steps S105, S115), and a step of displaying a screen containing the co-occurrence network (steps S106, S116). When an instruction specifying a word of interest is input within a first screen (the screen containing the window 41) that contains a first co-occurrence network (the overall co-occurrence network 51) based on the whole of the specified text data, the word-extracting step (steps S112, S113) extracts words from restricted text data consisting of the parts of the specified text data that contain the word of interest (the sentences containing the word of interest); the matrix-generating step (step S114) generates a second co-occurrence matrix for the extracted words using the restricted text data; the network-generating step (step S115) generates a second co-occurrence network (the restricted co-occurrence networks 52 to 54) based on the second co-occurrence matrix; and the displaying step (step S116) displays a second screen (a screen containing the windows 42 to 45) containing the second co-occurrence network. Thus, in the text mining method of this embodiment, when an instruction specifying a word of interest is input within the first screen containing the first co-occurrence network based on the whole of the specified text data, a second screen is displayed containing a second co-occurrence network based on the parts of the specified text data that contain the word of interest. A screen containing the co-occurrence network for a specified word of interest can therefore be displayed with a simple operation.
In addition, selecting one or more nodes of the first co-occurrence network within the first screen and then selecting the start of analysis inputs an instruction specifying the words corresponding to those nodes as words of interest (FIGS. 6 and 8). By selecting one or more nodes and the start of analysis within the first screen, the user can thus input, with a simple operation, an instruction specifying one or more words of interest and have a screen displayed containing the co-occurrence network for those words. Also, selecting one node of the first co-occurrence network twice in succession within the first screen inputs an instruction specifying the word corresponding to that node as the word of interest (FIG. 7). By selecting one node twice in succession within the first screen, the user can thus input, with a simple operation, an instruction specifying one word of interest and have a screen displayed containing the co-occurrence network for that word.
Also, selecting one edge of the first co-occurrence network twice in succession within the first screen inputs an instruction specifying the words corresponding to the two nodes connected by that edge as words of interest (FIG. 9). By selecting one edge twice in succession within the first screen, the user can thus input, with a simple operation, an instruction specifying two words of interest and have a screen displayed containing the co-occurrence network for those words. Furthermore, selecting one or more edges of the first co-occurrence network within the first screen and then selecting the start of analysis inputs an instruction specifying the words corresponding to the nodes connected by those edges as words of interest (FIGS. 10 and 11). By selecting one or more edges and the start of analysis within the first screen, the user can thus input, with a simple operation, an instruction specifying a plurality of words of interest and have a screen displayed containing the co-occurrence network for those words.
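In addition, when a merge instruction is input within a second screen (the screen 74) containing a plurality of second co-occurrence networks (the restricted co-occurrence networks 52 and 54) (FIG. 17), the displaying step displays the plurality of second co-occurrence networks as tabs (FIG. 18). This allows the plurality of second co-occurrence networks to be displayed compactly. The merge instruction is input by grabbing one second co-occurrence network (the restricted co-occurrence network 52) within the second screen and releasing it within another second co-occurrence network (the restricted co-occurrence network 54). A merge instruction can therefore be input with a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.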
The restricted text data may consist of the sentences of the specified text data that contain the word of interest. In this case, when an instruction specifying a word of interest is input, the specified text data can be divided into sentences to obtain the restricted text data, and a screen can be displayed containing a second co-occurrence network based on the obtained restricted text data. When a plurality of words of interest are specified, the restricted text data may consist of the sentences of the specified text data that contain all of those words; in this case, a screen can be displayed containing the second co-occurrence network obtained by AND processing of the words of interest. Alternatively, the restricted text data may consist of the sentences of the specified text data that contain any one of those words; in this case, a screen can be displayed containing the second co-occurrence network obtained by OR processing of the words of interest. Furthermore, the step of generating a co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients, which makes it possible to analyze the co-occurrence of the words contained in the text data suitably.
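The text mining device 10 and the text mining program 31 of this embodiment have the same features and provide the same effects as the text mining method described above. With the text mining method, the text mining device 10, and the text mining program 31 of this embodiment, a screen containing the co-occurrence network for a specified word of interest can be displayed with a simple operation.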
Although the present invention has been described in detail above, the foregoing description is illustrative in all respects and not restrictive. It should be understood that numerous other changes and modifications can be made without departing from the scope of the present invention.
Claims (20)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018052074A JP6987003B2 (en) | 2018-03-20 | 2018-03-20 | Text mining methods, text mining programs, and text mining equipment |
JP2018-052074 | 2018-03-20 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201945958A TW201945958A (en) | 2019-12-01 |
TWI703457B true TWI703457B (en) | 2020-09-01 |
Family
ID=68065531
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW108106540A TWI703457B (en) | 2018-03-20 | 2019-02-26 | Text exploration method, text exploration program and text exploration device |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP6987003B2 (en) |
KR (1) | KR102162779B1 (en) |
CN (1) | CN110309290B (en) |
TW (1) | TWI703457B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWM523901U (en) * | 2016-01-04 | 2016-06-11 | 信義房屋仲介股份有限公司 | Search engine device for performing semantic keyword analysis |
CN107193803A (en) * | 2017-05-26 | 2017-09-22 | 北京东方科诺科技发展有限公司 | A kind of particular task text key word extracting method based on semanteme |
US20170337262A1 (en) * | 2016-05-19 | 2017-11-23 | Quid, Inc. | Pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2806867B2 (en) * | 1995-03-13 | 1998-09-30 | 株式会社トレンディ | Document database construction method, display method, and display device |
JPH10283367A (en) * | 1997-04-09 | 1998-10-23 | Mitsubishi Electric Corp | Hypermedia device |
JP4404323B2 (en) * | 1999-02-05 | 2010-01-27 | 経済産業大臣 | Thesaurus browsing system and method |
JP5059282B2 (en) * | 2003-10-14 | 2012-10-24 | ソニー株式会社 | Information providing system, information providing server, user terminal device, content display device, computer program, and content display method |
JP2006215936A (en) * | 2005-02-07 | 2006-08-17 | Hitachi Ltd | Search system and search method |
JP2007193380A (en) * | 2006-01-16 | 2007-08-02 | So-Net Entertainment Corp | Information processor, information processing method and computer program |
JP5534167B2 (en) * | 2009-12-16 | 2014-06-25 | 日本電気株式会社 | Graph creation device, graph creation method, and graph creation program |
JP5331723B2 (en) * | 2010-02-05 | 2013-10-30 | 株式会社エヌ・ティ・ティ・データ | Feature word extraction device, feature word extraction method, and feature word extraction program |
US20120066628A1 (en) * | 2010-09-09 | 2012-03-15 | Microsoft Corporation | Drag-able tabs |
JP2014085992A (en) * | 2012-10-26 | 2014-05-12 | Hitachi Ltd | Document recognition support device, document recognition support method and document recognition support program |
JP5903376B2 (en) * | 2012-12-11 | 2016-04-13 | 日本電信電話株式会社 | Information recommendation device, information recommendation method, and information recommendation program |
US9177104B2 (en) * | 2013-03-29 | 2015-11-03 | Case Western Reserve University | Discriminatively weighted multi-scale local binary patterns |
KR101512084B1 (en) * | 2013-11-15 | 2015-04-17 | 한국과학기술원 | Web search system for providing 3 dimensional web search interface based virtual reality and method thereof |
JP6287192B2 (en) * | 2013-12-26 | 2018-03-07 | キヤノンマーケティングジャパン株式会社 | Information processing apparatus, information processing method, and program |
JP6364086B2 (en) * | 2014-08-22 | 2018-07-25 | 株式会社日立製作所 | Self-produced information processing system and method |
JP6280859B2 (en) * | 2014-11-20 | 2018-02-14 | 日本電信電話株式会社 | Behavior network information extraction apparatus, behavior network information extraction method, and behavior network information extraction program |
CN104375989A (en) * | 2014-12-01 | 2015-02-25 | 国家电网公司 | Natural language text keyword association network construction system |
JP6524790B2 (en) * | 2015-05-14 | 2019-06-05 | 富士ゼロックス株式会社 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING PROGRAM |
WO2017061253A1 (en) * | 2015-10-09 | 2017-04-13 | アイビーリサーチ株式会社 | Display control device, display control method, and display control program |
CN107766318B (en) * | 2016-08-17 | 2021-03-16 | 北京金山安全软件有限公司 | Keyword extraction method and device and electronic equipment |
CN107451120B (en) * | 2017-08-01 | 2020-10-30 | 中国人民解放军火箭军工程大学 | Content conflict detection method and system for open text information |
2018
- 2018-03-20 JP JP2018052074A patent/JP6987003B2/en active Active
2019
- 2019-01-31 KR KR1020190013093A patent/KR102162779B1/en active IP Right Grant
- 2019-01-31 CN CN201910096738.5A patent/CN110309290B/en active Active
- 2019-02-26 TW TW108106540A patent/TWI703457B/en active
Also Published As
Publication number | Publication date |
---|---|
KR102162779B1 (en) | 2020-10-07 |
CN110309290A (en) | 2019-10-08 |
CN110309290B (en) | 2023-06-06 |
JP2019164593A (en) | 2019-09-26 |
TW201945958A (en) | 2019-12-01 |
JP6987003B2 (en) | 2021-12-22 |
KR20190110428A (en) | 2019-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8504348B2 (en) | User simulation for viewing web analytics data | |
US9519698B1 (en) | Visualization of graphical representations of log files | |
US10318646B2 (en) | Generating a structured document guiding view | |
US10366154B2 (en) | Information processing device, information processing method, and computer program product | |
US20220245556A1 (en) | Data distillery for signal detection | |
US20060190684A1 (en) | Reverse value attribute extraction | |
CA2677220A1 (en) | Retrieval mechanism for web visit simulator | |
CN107077349A (en) | Job creation with data preview | |
TW201807597A (en) | Text mining method, text mining program, and text mining apparatus | |
TWI703457B (en) | Text exploration method, text exploration program and text exploration device | |
KR101850853B1 (en) | Method and apparatus of search using big data | |
CN112667517A (en) | Method, device, equipment and storage medium for acquiring automatic test script | |
JP2017016294A (en) | Information processing device, control method thereof, and program | |
CN114969315A (en) | Intelligent crowdsourcing marking method and system in professional field | |
JP6675868B2 (en) | Information processing apparatus, information processing method, and program | |
JP2006190147A (en) | Device, method and program for displaying dependency | |
JP2006313483A (en) | Content evaluation method | |
TWI736860B (en) | Text exploration method, recording medium with text exploration program recorded, and text exploration device | |
JP2004185346A (en) | Method and system for supporting project work | |
JP2021165892A (en) | Information processing device, information processing method and program | |
Schleußinger et al. | Evaluating a Visual Search Interface | |
Khobragade et al. | Facebook Data Mining and Sentiment Analysis Using R Language. | |
JP4728878B2 (en) | Time series analysis support system, time series analysis support method, and time series analysis support program | |
JP2007272517A (en) | Micro-scenario data analysis system and micro scenario data analysis program | |
JP2005190404A (en) | System, method and program for proposing learning course |