TWI703457B - Text exploration method, text exploration program and text exploration device - Google Patents


Publication number
TWI703457B
Authority
TW
Taiwan
Prior art keywords
symbiosis
screen
word
interest
network
Prior art date
Application number
TW108106540A
Other languages
Chinese (zh)
Other versions
TW201945958A (en)
Inventor
柿木未希
Original Assignee
日商斯庫林集團股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日商斯庫林集團股份有限公司
Publication of TW201945958A
Application granted
Publication of TWI703457B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)

Abstract

The text mining method of the present invention comprises: a step of extracting words from text data; a step of generating a co-occurrence matrix for the extracted words; a step of generating a co-occurrence network based on the generated co-occurrence matrix; and a step of displaying a screen containing the generated co-occurrence network. When an instruction designating a word of interest is input in a first screen containing a first co-occurrence network based on the entirety of the designated text data, words are extracted from limited text data consisting of the portions of the designated text data that contain the word of interest, a second co-occurrence matrix is generated for the extracted words using the limited text data, a second co-occurrence network is generated based on the second co-occurrence matrix, and a second screen containing the second co-occurrence network is displayed.

Description

Text mining method, text mining program, and text mining device

The present invention relates to text mining, and in particular to a text mining method, a text mining program, and a text mining device that display a screen containing a co-occurrence network of words.

In recent years, text mining, which analyzes freely written text data and derives useful information from the analysis results, has attracted attention. In text mining, information is obtained by, for example, extracting words from the text data under analysis and analyzing the frequency or tendency with which those words appear.

When analyzing freely written text data, the analyst must not select targets subjectively at the initial stage, but must first grasp the text data as a whole. For this reason, analysts sometimes use a co-occurrence network of the words contained in the text data.

Fig. 19 shows an example of a co-occurrence network. A co-occurrence network is obtained by extracting from text data the pairs of words that frequently appear in the same sentence and expressing the result as an undirected graph. When a word Wa and a word Wb frequently appear in the same sentence in the text data under analysis, the co-occurrence network contains a node corresponding to Wa, a node corresponding to Wb, and an edge connecting the two. The co-occurrence network shown in Fig. 19 contains a node corresponding to "staff", a node corresponding to "response" (對應), and an edge connecting the two. From the network in Fig. 19 it can be seen that "staff" and "response" frequently appear in the same sentence in the text data under analysis.
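The definition above can be illustrated with a minimal sketch. It assumes the sentences have already been split into word lists, and it treats "frequently" as a simple count threshold; the function name `cooccurrence_edges` and the parameter `min_count` are hypothetical and do not come from the patent.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, min_count=2):
    """Return undirected edges: word pairs sharing a sentence at least min_count times."""
    pair_counts = Counter()
    for words in sentences:
        # Sorting makes (a, b) and (b, a) the same key, so the graph is undirected.
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Illustrative data only (not taken from the patent figures).
sentences = [
    ["staff", "response", "polite"],
    ["staff", "response", "slow"],
    ["room", "clean"],
]
edges = cooccurrence_edges(sentences)
```

With this sample data, only the pair "response" and "staff" reaches the assumed threshold, so the graph has two nodes joined by one edge.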

In general, a co-occurrence network is generated based on the entirety of the designated text data. Hereinafter, such a co-occurrence network is referred to as an "overall co-occurrence network". According to a hypothesis the analyst has formed or the purpose of the analysis, the analyst selects from the overall co-occurrence network a plurality of words that deserve attention (hereinafter, "words of interest") and carries out the subsequent analysis with those words in mind.

When selecting a word of interest, the analyst examines how the word is used in the sentences that contain it, in order to judge, for example, whether the selected word suits the purpose of the analysis. For this, the analyst sometimes uses a co-occurrence network based on text data consisting of the sentences in the designated text data that contain the word of interest (hereinafter, "limited text data"). Here, "sentence containing the word of interest" does not refer only to a single sentence containing the word; it may also mean a plurality of sentences (a set of sentences) delimited as a block, such as the paragraph that includes the sentence containing the word. Hereinafter, such a co-occurrence network is referred to as a "limited co-occurrence network". By using a limited co-occurrence network, the analyst can grasp the content of the limited text data. The analyst refers back and forth between the overall co-occurrence network and limited co-occurrence networks until all words of interest have been selected.
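As a rough sketch of how limited text data might be derived, the function below keeps either the matching sentences alone or the whole paragraph that contains a match. The nested-list input layout, the function name `limited_text`, and the `unit` parameter are assumptions made for illustration.

```python
def limited_text(paragraphs, word_of_interest, unit="sentence"):
    """Select the part of the text that contains the word of interest.

    paragraphs: list of paragraphs, each a list of sentences, each a list of words.
    unit: "sentence" keeps only matching sentences; "paragraph" keeps the whole block.
    """
    selected = []
    for paragraph in paragraphs:
        hits = [s for s in paragraph if word_of_interest in s]
        if not hits:
            continue
        selected.extend(hits if unit == "sentence" else paragraph)
    return selected


# Illustrative data only.
paragraphs = [
    [["staff", "kind"], ["room", "clean"]],
    [["breakfast", "good"]],
]
by_sentence = limited_text(paragraphs, "staff")
by_paragraph = limited_text(paragraphs, "staff", unit="paragraph")
```

The paragraph unit returns more context around the word of interest, at the cost of admitting sentences that do not mention it.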

Consider, then, a text mining device that generates a co-occurrence network of the words contained in text data and displays a screen containing the generated network. Japanese Patent Laid-Open No. 8-314980 describes a document database display device that generates an overall co-occurrence network for each of a plurality of documents and displays a screen containing the generated networks. This display device searches the overall co-occurrence networks for a word entered by the user and highlights the retrieved word on the screen.

A conventional text mining device generates a co-occurrence network based on the entirety of the designated text data. A conventional device can therefore easily display a screen containing an overall co-occurrence network.

On the other hand, displaying a screen containing a limited co-occurrence network with a conventional text mining device requires cumbersome operations by the analyst. Specifically, each time the analyst selects one word of interest from the overall co-occurrence network, the analyst must generate limited text data from the designated text data and supply it to the text mining device. Moreover, the analyst refers to both the overall and the limited co-occurrence networks when selecting words of interest, so the text mining device must store the image data of both. When many co-occurrence networks are generated, storing and managing this image data becomes difficult.

An object of the present invention is therefore to provide a text mining method, a text mining program, and a text mining device that can display, with a simple operation, a screen containing the co-occurrence network obtained when a word of interest is designated.

A first aspect of the present invention is a text mining method for displaying a screen containing an analysis result of text data, the method comprising: a step of extracting words from text data; a step of generating a co-occurrence matrix for the words; a step of generating a co-occurrence network based on the co-occurrence matrix; and a step of displaying a screen containing the co-occurrence network, wherein, when an instruction designating a word of interest is input in a first screen containing a first co-occurrence network based on the entirety of designated text data, the step of extracting the words extracts the words from limited text data consisting of the portions of the designated text data that contain the word of interest, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen containing the second co-occurrence network.

A second aspect of the present invention is the first aspect, wherein an instruction designating the words corresponding to nodes as the words of interest is input by selecting, in the first screen, one or more nodes included in the first co-occurrence network and then selecting analysis start.

A third aspect of the present invention is the first aspect, wherein an instruction designating the word corresponding to a node as the word of interest is input by keeping one node included in the first co-occurrence network selected in the first screen.

A fourth aspect of the present invention is the first aspect, wherein an instruction designating the words corresponding to the two nodes connected by an edge as the words of interest is input by keeping one edge included in the first co-occurrence network selected in the first screen.

A fifth aspect of the present invention is the first aspect, wherein an instruction designating the words corresponding to the plurality of nodes connected by edges as the words of interest is input by selecting, in the first screen, one or more edges included in the first co-occurrence network and then selecting analysis start.

A sixth aspect of the present invention is the first aspect, wherein, when a merge instruction is input in a second screen containing a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in tab form.

A seventh aspect of the present invention is the sixth aspect, wherein the merge instruction is input by pressing and holding one second co-occurrence network in the second screen and releasing it inside another second co-occurrence network.

An eighth aspect of the present invention is the first aspect, wherein the limited text data consists of the sentences in the designated text data that contain the word of interest.

A ninth aspect of the present invention is the eighth aspect, wherein, when a plurality of words of interest are designated, the limited text data consists of the sentences in the designated text data that contain all of the plurality of words of interest.

A tenth aspect of the present invention is the eighth aspect, wherein, when a plurality of words of interest are designated, the limited text data consists of the sentences in the designated text data that contain any of the plurality of words of interest.

An eleventh aspect of the present invention is the first aspect, wherein the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients.

A twelfth aspect of the present invention is a text mining program for displaying a screen containing an analysis result of text data, the program causing a computer to execute, by means of a CPU using a memory: a step of extracting words from text data; a step of generating a co-occurrence matrix for the words; a step of generating a co-occurrence network based on the co-occurrence matrix; and a step of displaying a screen containing the co-occurrence network, wherein, when an instruction designating a word of interest is input in a first screen containing a first co-occurrence network based on the entirety of designated text data, the step of extracting the words extracts the words from limited text data consisting of the portions of the designated text data that contain the word of interest, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen containing the second co-occurrence network.

A thirteenth aspect of the present invention is the twelfth aspect, wherein an instruction designating the words corresponding to nodes as the words of interest is input by selecting, in the first screen, one or more nodes included in the first co-occurrence network and then selecting analysis start.

A fourteenth aspect of the present invention is the twelfth aspect, wherein an instruction designating the word corresponding to a node as the word of interest is input by keeping one node included in the first co-occurrence network selected in the first screen.

A fifteenth aspect of the present invention is the twelfth aspect, wherein an instruction designating the words corresponding to the two nodes connected by an edge as the words of interest is input by keeping one edge included in the first co-occurrence network selected in the first screen.

A sixteenth aspect of the present invention is the twelfth aspect, wherein an instruction designating the words corresponding to the plurality of nodes connected by edges as the words of interest is input by selecting, in the first screen, one or more edges included in the first co-occurrence network and then selecting analysis start.

A seventeenth aspect of the present invention is the twelfth aspect, wherein, when a merge instruction is input in a second screen containing a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in tab form.

An eighteenth aspect of the present invention is the seventeenth aspect, wherein the merge instruction is input by pressing and holding one second co-occurrence network in the second screen and releasing it inside another second co-occurrence network.

A nineteenth aspect of the present invention is a text mining device that displays a screen containing an analysis result of text data, the device comprising: a word extraction unit that extracts words from text data; a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words; a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix; and a screen display unit that displays a screen containing the co-occurrence network, wherein, when an instruction designating a word of interest is input in a first screen containing a first co-occurrence network based on the entirety of designated text data, the word extraction unit extracts the words from limited text data consisting of the portions of the designated text data that contain the word of interest, the co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data, the co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix, and the screen display unit displays a second screen containing the second co-occurrence network.

A twentieth aspect of the present invention is the nineteenth aspect, wherein, when a merge instruction is input in a second screen containing a plurality of second co-occurrence networks, the screen display unit displays the plurality of second co-occurrence networks in tab form.

According to the first, twelfth, or nineteenth aspect, when an instruction designating a word of interest is input in the first screen containing the first co-occurrence network based on the entirety of the designated text data, a second screen is displayed containing a second co-occurrence network based on the portions of the designated text data that contain the word of interest. A screen containing the co-occurrence network for a designated word of interest can thus be displayed with a simple operation.

According to the second or thirteenth aspect, by selecting one or more nodes and then analysis start in the first screen, an instruction designating one or more words of interest can be input with a simple operation, and a screen containing the co-occurrence network for those words of interest can be displayed.

According to the third or fourteenth aspect, by keeping one node selected in the first screen, an instruction designating one word of interest can be input with a simple operation, and a screen containing the co-occurrence network for that word of interest can be displayed.

According to the fourth or fifteenth aspect, by keeping one edge selected in the first screen, an instruction designating two words of interest can be input with a simple operation, and a screen containing the co-occurrence network for those two words of interest can be displayed.

According to the fifth or sixteenth aspect, by selecting one or more edges and then analysis start in the first screen, an instruction designating a plurality of words of interest can be input with a simple operation, and a screen containing the co-occurrence network for those words of interest can be displayed.
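The mapping from a selection on the first screen to words of interest can be sketched as follows. This is only an illustration: the function name `words_from_selection` and the representation of an edge as a pair of node words are assumptions, not details given in the text.

```python
def words_from_selection(selected_nodes=(), selected_edges=()):
    """Turn selected nodes and edges into an ordered list of distinct words of interest.

    A selected node contributes its own word; a selected edge contributes the
    words of the two nodes it connects.
    """
    words = list(selected_nodes)
    for word_a, word_b in selected_edges:
        words.extend([word_a, word_b])
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(words))


one_edge = words_from_selection(selected_edges=[("staff", "response")])
two_edges = words_from_selection(
    selected_edges=[("staff", "response"), ("staff", "polite")]
)
```

Selecting one edge yields two words of interest, while selecting several edges that share a node yields each endpoint word once.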

According to the sixth, seventeenth, or twentieth aspect, by displaying the plurality of second co-occurrence networks in tab form when a merge instruction is input, the plurality of second co-occurrence networks can be displayed compactly.

According to the seventh or eighteenth aspect, by pressing and holding a second co-occurrence network in the second screen and then releasing it, a merge instruction can be input with a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.
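One way to model the merge into tab form is sketched below. The `Window` class, the `merge_windows` function, and the choice to append the dragged window's tabs after the target's tabs are all assumptions for illustration; the text does not specify the tab order or the underlying data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    """A window holding one or more limited co-occurrence networks, one per tab."""
    tabs: list = field(default_factory=list)


def merge_windows(windows, dragged, target):
    """Drop windows[dragged] onto windows[target] and return the new window list."""
    if dragged == target:
        return list(windows)
    merged = Window(tabs=windows[target].tabs + windows[dragged].tabs)
    return [
        merged if i == target else w
        for i, w in enumerate(windows)
        if i != dragged
    ]


# Each tab is represented here only by the word of interest of its network.
windows = [Window(["staff"]), Window(["room"]), Window(["breakfast"])]
result = merge_windows(windows, dragged=0, target=1)
```

After the merge the dragged window disappears and the target window carries both networks as tabs, which keeps the number of windows on screen small.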

According to the eighth aspect, when an instruction designating a word of interest is input, the designated text data can be divided into sentence units to obtain the limited text data, and a screen containing a second co-occurrence network based on the obtained limited text data can be displayed.

According to the ninth or tenth aspect, a screen can be displayed containing the second co-occurrence network obtained by applying AND processing or OR processing to the plurality of words of interest.
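The AND and OR processing of several words of interest can be sketched with set operations. The function name `limited_sentences` and the `mode` parameter are hypothetical, and sentences are assumed to be pre-tokenized word lists.

```python
def limited_sentences(sentences, words_of_interest, mode="and"):
    """Keep sentences containing all ("and") or any ("or") of the words of interest."""
    targets = set(words_of_interest)
    if mode == "and":
        return [s for s in sentences if targets <= set(s)]
    if mode == "or":
        return [s for s in sentences if targets & set(s)]
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative data only.
sentences = [
    ["staff", "response", "kind"],
    ["staff", "busy"],
    ["response", "slow"],
    ["room", "clean"],
]
both = limited_sentences(sentences, ["staff", "response"], mode="and")
either = limited_sentences(sentences, ["staff", "response"], mode="or")
```

AND processing narrows the limited text data to sentences where the words appear together, while OR processing widens it to every sentence mentioning at least one of them.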

According to the eleventh aspect, by generating a co-occurrence matrix whose elements are Jaccard coefficients, the co-occurrence of the words contained in the text data can be analyzed suitably.
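For reference, the Jaccard coefficient of two words is commonly computed as the number of sentences containing both words divided by the number of sentences containing at least one of them. The sketch below builds a matrix on that basis; counting at the sentence level and the function name `jaccard_matrix` are assumptions, since this passage does not state the exact counting unit.

```python
def jaccard_matrix(sentences):
    """Return (vocabulary, matrix) where matrix[i][j] is the Jaccard coefficient
    of vocabulary[i] and vocabulary[j], computed over sentence occurrences."""
    occurrences = {}
    for index, words in enumerate(sentences):
        for word in set(words):
            occurrences.setdefault(word, set()).add(index)
    vocabulary = sorted(occurrences)
    matrix = [
        [
            len(occurrences[a] & occurrences[b]) / len(occurrences[a] | occurrences[b])
            for b in vocabulary
        ]
        for a in vocabulary
    ]
    return vocabulary, matrix


# Illustrative data only.
sentences = [
    ["staff", "response"],
    ["staff", "response"],
    ["staff", "room"],
]
vocabulary, matrix = jaccard_matrix(sentences)
```

Here "staff" appears in three sentences and "response" in two of them, so their coefficient is 2/3, whereas "response" and "room" never share a sentence and score 0.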

10‧‧‧Text mining device

11‧‧‧Instruction input unit

12‧‧‧Text data storage unit

13‧‧‧Word extraction unit

14‧‧‧Co-occurrence matrix generation unit

15‧‧‧Co-occurrence network generation unit

16‧‧‧Screen display unit

20‧‧‧Computer

21‧‧‧CPU

22‧‧‧Main memory

23‧‧‧Storage unit

24‧‧‧Input unit

25‧‧‧Display unit

26‧‧‧Communication unit

27‧‧‧Recording medium reading unit

28‧‧‧Keyboard

29‧‧‧Mouse

30‧‧‧Recording medium

31‧‧‧Text mining program

32‧‧‧Text data

41~45‧‧‧Windows

51‧‧‧Overall co-occurrence network

52~54‧‧‧Limited co-occurrence networks

61‧‧‧Analysis button

62‧‧‧Mouse cursor

63~64‧‧‧Tabs

71~75‧‧‧Screens

Fig. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention.

Fig. 2 is a block diagram showing the configuration of a computer that functions as the text mining device shown in Fig. 1.

Fig. 3 is a flowchart showing the operation of the text mining device shown in Fig. 1.

Fig. 4 is a diagram showing an example of a co-occurrence matrix generated by the text mining device shown in Fig. 1.

Fig. 5 is a diagram showing an example of a window, displayed by the text mining device shown in Fig. 1, that contains an overall co-occurrence network.

Fig. 6 is a diagram showing a first operation for designating a word of interest in the window shown in Fig. 5.

Fig. 7 is a diagram showing a second operation for designating a word of interest in the window shown in Fig. 5.

Fig. 8 is a diagram showing a third operation for designating a word of interest in the window shown in Fig. 5.

Fig. 9 is a diagram showing a fourth operation for designating a word of interest in the window shown in Fig. 5.

Fig. 10 is a diagram showing a fifth operation for designating a word of interest in the window shown in Fig. 5.

Fig. 11 is a diagram showing a sixth operation for designating a word of interest in the window shown in Fig. 5.

Fig. 12 is a diagram showing an example of a window, displayed by the text mining device shown in Fig. 1, that contains a limited co-occurrence network.

Fig. 13 is a diagram showing an example of a window, displayed by the text mining device shown in Fig. 1, that contains a limited co-occurrence network.

Fig. 14 is a diagram showing an example of a display screen of the text mining device shown in Fig. 1.

Fig. 15 is a diagram showing an example of a display screen of the text mining device shown in Fig. 1.

Fig. 16 is a diagram showing an example of a display screen of the text mining device shown in Fig. 1.

Fig. 17 is a diagram showing the operation of merging windows in the text mining device shown in Fig. 1.

Fig. 18 is a diagram showing the display screen after the operation shown in Fig. 17 has been performed.

Fig. 19 is a diagram showing an example of a co-occurrence network.

Hereinafter, a text mining method, a text mining program, and a text mining device according to an embodiment of the present invention are described with reference to the drawings. The text mining method of this embodiment is typically executed using a computer. The text mining program of this embodiment is a program for carrying out the text mining method using a computer. The text mining device of this embodiment is typically configured using a computer. A computer that executes the text mining program functions as the text mining device.

Fig. 1 is a block diagram showing the configuration of the text mining device according to the embodiment of the present invention. The text mining device 10 shown in Fig. 1 includes an instruction input unit 11, a text data storage unit 12, a word extraction unit 13, a co-occurrence matrix generation unit 14, a co-occurrence network generation unit 15, and a screen display unit 16. The text mining device 10 generates a co-occurrence network as the analysis result of the text data stored in the text data storage unit 12, and displays a screen containing the generated co-occurrence network.

The operation of the text mining device 10 is outlined as follows. Instructions from the user (the analyst of the text data) are input to the instruction input unit 11. The text data storage unit 12 stores one or more pieces of freely written text data. The word extraction unit 13 reads the designated text data from the text data storage unit 12 and extracts words from it by performing morphological analysis on the data read. The co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted by the word extraction unit 13. The co-occurrence network generation unit 15 generates a co-occurrence network based on the co-occurrence matrix generated by the co-occurrence matrix generation unit 14. The screen display unit 16 displays a screen containing the co-occurrence network generated by the co-occurrence network generation unit 15.

Using the instruction input unit 11, the user inputs an instruction designating the text data to be analyzed, an instruction designating a word of interest, and the like. The word extraction unit 13, the co-occurrence network generation unit 15, and the screen display unit 16 operate in accordance with the user's instructions to display a screen including a co-occurrence network. When an instruction designating text data is input, an overall co-occurrence network based on the whole of the designated text data is generated, and a screen including the overall co-occurrence network is displayed. When an instruction designating a word of interest is input within the screen including the overall co-occurrence network, a limited co-occurrence network based on the sentences of the designated text data that contain the word of interest is generated, and a screen including the limited co-occurrence network is displayed.

Fig. 2 is a block diagram showing the configuration of a computer that functions as the text mining device 10. The computer 20 shown in Fig. 2 includes a CPU (Central Processing Unit) 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a DRAM (Dynamic Random Access Memory). The storage unit 23 is, for example, a hard disk or a solid-state drive. The input unit 24 includes, for example, a keyboard 28 or a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is a non-transitory recording medium such as a CD-ROM (Compact Disc-Read Only Memory), a DVD-ROM (Digital Versatile Disc-Read Only Memory), or a USB (Universal Serial Bus) memory.

When the computer 20 executes a text mining program 31, the storage unit 23 stores the text mining program 31 and text data 32. The text mining program 31 and the text data 32 may, for example, be received from a server or another computer via the communication unit 26, or be read from the recording medium 30 via the recording medium reading unit 27.

When the text mining program 31 is executed, the text mining program 31 and the text data 32 are copied to the main memory 22. The CPU 21 uses the main memory 22 as working memory and executes the text mining program 31 stored there, thereby performing the process of extracting words from the text data 32, the process of generating a co-occurrence matrix for the extracted words, the process of generating a co-occurrence network based on the generated co-occurrence matrix, the process of displaying a screen including the generated co-occurrence network, and so on. At this time, the computer 20 functions as the text mining device 10. The configuration of the computer 20 described above is merely an example, and the text mining device 10 may be configured using any computer.

Fig. 3 is a flowchart showing the operation of the text mining device 10. Before the operation shown in Fig. 3 is performed, one or more pieces of free-form text data are stored in the text data storage unit 12. Each piece of text data contains a plurality of sentences. The text mining device 10 processes the text data designated by the user from among the text data stored in the text data storage unit 12.

In Fig. 3, the instruction input unit 11 first receives from the user an instruction designating text data (step S101). At this time, in addition to the instruction designating text data, the instruction input unit 11 may also receive an instruction setting the reference value of the co-occurrence matrix (described in detail below), an instruction switching between AND processing and OR processing (described in detail below), an instruction setting details of how the co-occurrence network is displayed, and the like. The received instructions are output to the respective units of the text mining device 10.

Next, the word extraction unit 13 reads the designated text data from the text data storage unit 12 (step S102). The word extraction unit 13 then performs morphological analysis on the text data read in step S102, thereby extracting words from it (step S103). At this time, the word extraction unit 13 extracts only the words needed for the subsequent analysis. Next, the co-occurrence matrix generation unit 14 generates a co-occurrence matrix for the words extracted in step S103, using the text data read in step S102 (step S104).

Fig. 4 is a diagram showing an example of a co-occurrence matrix generated by the co-occurrence matrix generation unit 14. Each element of the co-occurrence matrix is a Jaccard coefficient obtained for a pair of words. For the text data to be analyzed, let A be the set of sentences containing the word Wa and B be the set of sentences containing the word Wb. The Jaccard coefficient K(Wa, Wb) for the word pair (Wa, Wb) is given by the following formula (1).

K(Wa, Wb) = |A∩B| / |A∪B| … (1)

In formula (1), the symbol ∩ denotes the intersection of two sets, the symbol ∪ denotes their union, and |S| denotes the number of elements in the set S.

In step S104, the co-occurrence matrix generation unit 14 obtains the Jaccard coefficient for every pair of words extracted from the whole of the text data read in step S102, and generates a co-occurrence matrix whose elements are the obtained Jaccard coefficients. The rows and columns of the co-occurrence matrix correspond to the distinct words extracted from the whole of the text data read in step S102. When n distinct words are extracted from the whole of the read text data, the co-occurrence matrix generated in step S104 is a symmetric matrix of n rows and n columns whose diagonal elements are all 1.
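The matrix construction of step S104 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the sentences have already been tokenized (the morphological analysis of step S103 is not shown), and the function name `jaccard_matrix` and the sample words are hypothetical.

```python
from itertools import combinations


def jaccard_matrix(sentences):
    """Build a Jaccard co-occurrence matrix from tokenized sentences.

    `sentences` is a list of word lists (assumed to be the output of a
    morphological analyzer). Returns (vocab, matrix), where matrix[i][j]
    is K(vocab[i], vocab[j]) as defined in formula (1).
    """
    # Map each word to the set of indices of the sentences containing it.
    occurrences = {}
    for idx, words in enumerate(sentences):
        for word in set(words):
            occurrences.setdefault(word, set()).add(idx)

    vocab = sorted(occurrences)
    n = len(vocab)
    # Diagonal elements are 1 because K(W, W) = |A| / |A|.
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = occurrences[vocab[i]], occurrences[vocab[j]]
        k = len(a & b) / len(a | b)
        matrix[i][j] = matrix[j][i] = k  # the matrix is symmetric
    return vocab, matrix


# Hypothetical sample data: three tokenized sentences.
sentences = [
    ["bath", "price"],
    ["bath", "stairs"],
    ["bath", "price", "stairs"],
]
vocab, matrix = jaccard_matrix(sentences)
```

Here "bath" appears in all three sentences and "price" in two of them, so K(bath, price) is 2/3, while "price" and "stairs" share only one of three sentences, giving 1/3.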

The co-occurrence matrix generation unit 14 may also obtain the Jaccard coefficient by dividing the text data into units other than sentences. For example, the co-occurrence matrix generation unit 14 may let A be the set of paragraphs containing the word Wa and B be the set of paragraphs containing the word Wb, and obtain the Jaccard coefficient according to formula (1). When the sentences in the text data have dates, the co-occurrence matrix generation unit 14 may divide the text data into a plurality of portions each consisting of sentences with the same date, let A be the set of portions containing the word Wa and B be the set of portions containing the word Wb, and obtain the Jaccard coefficient according to formula (1). The co-occurrence matrix generation unit 14 may also generate a co-occurrence matrix whose elements are other values indicating the co-occurrence of words (for example, the Simpson coefficient or the cosine distance).
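For comparison, the measures named above can be written over the same occurrence sets A and B. The sketch below uses the standard set-based definitions; the text does not specify which cosine formulation is intended, so the cosine similarity of binary occurrence vectors shown here is an assumption.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient: |A∩B| / |A∪B| (formula (1))."""
    return len(a & b) / len(a | b)


def simpson(a, b):
    """Simpson (overlap) coefficient: |A∩B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b))


def cosine(a, b):
    """Cosine similarity of binary occurrence vectors: |A∩B| / sqrt(|A||B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical occurrence sets: indices of the units (sentences, paragraphs,
# or same-date portions) that contain each word.
A = {0, 1, 2}
B = {0, 2}
scores = {"jaccard": jaccard(A, B), "simpson": simpson(A, B), "cosine": cosine(A, B)}
```

Because B is a subset of A, the Simpson coefficient is 1 even though the Jaccard coefficient is only 2/3, which shows how the choice of measure changes which pairs would pass a given reference value.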

Next, the co-occurrence network generation unit 15 generates an overall co-occurrence network based on the co-occurrence matrix generated in step S104 (step S105). The screen display unit 16 then displays a screen including the overall co-occurrence network generated in step S105 (step S106). Fig. 5 is a diagram showing an example of a window, displayed in step S106, that includes the overall co-occurrence network. The window 41 shown in Fig. 5 includes an overall co-occurrence network 51 and an analysis button 61. The analysis button 61 is provided for instructing the start of analysis.

The co-occurrence network generation unit 15 holds a reference value for the co-occurrence matrix (hereinafter denoted V). The reference value V may be a predetermined value, or may be a value set by the user via the instruction input unit 11. When, in the co-occurrence matrix generated in step S104, the maximum of the Jaccard coefficients K(Wa, *) contained in the row corresponding to the word Wa is equal to or greater than the reference value V, the co-occurrence network generation unit 15 includes the node corresponding to the word Wa (the node labeled with the word Wa) in the overall co-occurrence network. Likewise, when the Jaccard coefficient K(Wa, Wb) of the word pair (Wa, Wb) is equal to or greater than the reference value V, the co-occurrence network generation unit 15 includes in the overall co-occurrence network the edge connecting the node corresponding to the word Wa and the node corresponding to the word Wb.
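The node and edge selection rule can be sketched as below. One assumption is made explicit: the diagonal element K(Wa, Wa) = 1 is excluded when taking the row maximum, since otherwise every word would pass any reference value V of 1 or less. The function name `build_network` and the sample matrix are hypothetical.

```python
def build_network(vocab, matrix, v):
    """Select nodes and edges of a co-occurrence network for reference value v.

    A word becomes a node when the largest off-diagonal coefficient in its
    row is >= v (excluding the diagonal is an assumption of this sketch).
    A pair of words becomes an edge when its coefficient is >= v.
    """
    n = len(vocab)
    nodes = [
        vocab[i]
        for i in range(n)
        if max((matrix[i][j] for j in range(n) if j != i), default=0.0) >= v
    ]
    edges = [
        (vocab[i], vocab[j], matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] >= v
    ]
    return nodes, edges


# Hypothetical 4-word co-occurrence matrix.
vocab = ["bath", "price", "stairs", "view"]
matrix = [
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.2, 0.1],
    [0.5, 0.2, 1.0, 0.05],
    [0.1, 0.1, 0.05, 1.0],
]
nodes, edges = build_network(vocab, matrix, v=0.5)
# nodes -> ['bath', 'price', 'stairs']
# edges -> [('bath', 'price', 0.6), ('bath', 'stairs', 0.5)]
```

With this rule, a word is a node exactly when it has at least one edge, so the two conditions stay consistent with each other.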

In the overall co-occurrence network 51 shown in Fig. 5, nodes corresponding to words that appear more frequently are displayed larger. When a screen including a co-occurrence network is displayed, the edge connecting the node corresponding to the word Wa and the node corresponding to the word Wb may be drawn thicker when the Jaccard coefficient K(Wa, Wb) is larger. The color of an edge, or both its thickness and its color, may also be varied according to the Jaccard coefficient. A co-occurrence network is divided into a plurality of parts, each consisting of nodes that can reach one another via edges. When a screen including a co-occurrence network is displayed, the nodes in each part may be displayed in a color assigned to that part. The positions of the nodes and edges in a co-occurrence network carry no meaning.
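The parts reachable via edges correspond to the connected components of the graph. The following sketch labels each node with a part index using breadth-first search; a display layer could then map each index to a color. The function name `connected_parts` and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def connected_parts(nodes, edges):
    """Return {node: part_index}, one index per connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    part_of = {}
    next_part = 0
    for start in nodes:
        if start in part_of:
            continue
        # Breadth-first search from an unlabeled node labels its whole part.
        part_of[start] = next_part
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in part_of:
                    part_of[neighbor] = next_part
                    queue.append(neighbor)
        next_part += 1
    return part_of


# Hypothetical network with two separate parts.
nodes = ["bath", "price", "stairs", "room", "view"]
edges = [("bath", "price"), ("bath", "stairs"), ("room", "view")]
parts = connected_parts(nodes, edges)
# parts -> {'bath': 0, 'price': 0, 'stairs': 0, 'room': 1, 'view': 1}
```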

Next, the instruction input unit 11 receives from the user an instruction designating a word of interest (step S111). When step S111 is executed, a screen including the overall co-occurrence network is being displayed. The user inputs the instruction designating a word of interest by operating the mouse 29 to select an element of the overall co-occurrence network. When inputting an instruction, the user may use the keyboard 28 instead of the mouse 29, or may perform an operation such as directly touching the display screen. In the following description, it is assumed that a screen including the window 41 shown in Fig. 5 is displayed when step S111 is executed.

Figs. 6 to 11 are diagrams showing the first to sixth operations, respectively, for designating a word of interest within the window 41. In Figs. 6 to 11, the callouts indicate the order of the operations, and the white arrows indicate the movement of the mouse cursor 62. The callouts and arrows are not displayed on the actual screen. Hereinafter, clicking (double-clicking) the button of the mouse 29 while the mouse cursor 62 is over an element in the display screen is referred to as "clicking (double-clicking) the element".

As shown in Fig. 6, the user first clicks, within the window 41, the node corresponding to the word to be designated as the word of interest (here, "open-air bath") (first click), and then clicks the analysis button 61 (second click). By this operation, the word corresponding to the node clicked first is designated as the word of interest. In this way, an instruction designating one word of interest is input by selecting one node included in the overall co-occurrence network and then selecting the start of analysis within the screen including the overall co-occurrence network.

As shown in Fig. 7, the user double-clicks, within the window 41, the node corresponding to the word to be designated as the word of interest (here, "open-air bath"). By this operation, the word corresponding to the double-clicked node is designated as the word of interest. In this way, an instruction designating one word of interest is input by selecting one node included in the overall co-occurrence network in succession within the screen including the overall co-occurrence network.

As shown in Fig. 8, the user first clicks, within the window 41, the node corresponding to a word to be designated as a word of interest (here, "open-air bath") (first click), then clicks the node corresponding to another word to be designated as a word of interest (here, "price") (second click), and finally clicks the analysis button 61 (last click). By this operation, the two words corresponding to the nodes clicked first and second are designated as words of interest. The user may also click p nodes in turn within the window 41 (p being an integer of 3 or more) and finally click the analysis button 61. By this operation, the p words corresponding to the p nodes are designated as words of interest. In this way, an instruction designating a plurality of words of interest is input by selecting a plurality of nodes included in the overall co-occurrence network and then selecting the start of analysis within the screen including the overall co-occurrence network.

As shown in Fig. 9, the user double-clicks, within the window 41, the edge connecting the two nodes corresponding to the two words to be designated as words of interest (here, "open-air bath" and "stairs"). By this operation, the two words corresponding to the two nodes connected by the double-clicked edge are designated as words of interest. In this way, an instruction designating two words of interest is input by selecting one edge included in the overall co-occurrence network in succession within the screen including the overall co-occurrence network.

As shown in Fig. 10, the user first clicks, within the window 41, the edge connecting the two nodes corresponding to the two words to be designated as words of interest (here, "open-air bath" and "stairs") (first click), and then clicks the analysis button 61 (second click). By this operation, the two words corresponding to the two nodes connected by the edge clicked first are designated as words of interest. In this way, an instruction designating two words of interest is input by selecting one edge included in the overall co-occurrence network and then selecting the start of analysis within the screen including the overall co-occurrence network.

As shown in Fig. 11, the user first clicks, within the window 41, the edge connecting the two nodes corresponding to two words to be designated as words of interest (here, "open-air bath" and "stairs") (first click), then clicks the edge connecting the two nodes corresponding to two other words to be designated as words of interest (here, "price" and "consideration") (second click), and finally clicks the analysis button 61 (last click). By this operation, the four words corresponding to the four nodes connected by the two edges clicked first and second are designated as words of interest. The user may also click q edges in turn within the window 41 (q being an integer of 3 or more) and finally click the analysis button 61. By this operation, the 2q words corresponding to the 2q nodes connected by the q edges are designated as words of interest. In this way, an instruction designating a plurality of words of interest is input by selecting a plurality of edges included in the overall co-occurrence network and then selecting the start of analysis within the screen including the overall co-occurrence network.
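Turning a set of selected edges into words of interest can be sketched as follows. The function name `words_from_edges` is hypothetical. The text counts 2q words for q edges; this sketch additionally assumes that a word shared by two selected edges should be listed only once, a case the text does not address.

```python
def words_from_edges(selected_edges):
    """Collect the endpoint words of the selected edges, in selection order.

    Each edge is a (word_a, word_b) pair. A word shared by several edges is
    listed once (an assumption; the text does not cover shared endpoints).
    """
    words = []
    for word_a, word_b in selected_edges:
        for word in (word_a, word_b):
            if word not in words:
                words.append(word)
    return words


# The selection of Fig. 11: two edges with no node in common.
focus = words_from_edges([("open-air bath", "stairs"), ("price", "consideration")])
# focus -> ['open-air bath', 'stairs', 'price', 'consideration']

# Hypothetical selection in which both edges share the node "open-air bath".
shared = words_from_edges([("open-air bath", "stairs"), ("open-air bath", "price")])
# shared -> ['open-air bath', 'stairs', 'price']
```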

In step S111, in addition to the instruction designating a word of interest, the instruction input unit 11 may also receive an instruction setting the reference value of the co-occurrence matrix, an instruction switching between AND processing and OR processing, an instruction setting details of how the co-occurrence network is displayed, and the like. The received instructions are output to the respective units of the text mining device 10.

Next, the word extraction unit 13 extracts, from the text data read in step S102, the sentences that contain the word of interest designated in step S111, thereby obtaining limited text data consisting of the sentences that contain the word of interest (step S112).

The word extraction unit 13 holds a flag indicating whether AND processing or OR processing is to be performed when a plurality of words of interest are designated. The value of the flag may be a predetermined value, or may be a value set by the user via the instruction input unit 11. When the flag indicates AND processing, the word extraction unit 13 obtains the limited text data by extracting from the read text data the sentences that contain all of the designated words of interest. When the flag indicates OR processing, the word extraction unit 13 obtains the limited text data by extracting from the read text data the sentences that contain any one of the designated words of interest.
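The AND/OR selection of step S112 can be sketched as a simple filter over tokenized sentences. The function name `limit_sentences`, the string-valued `mode` flag, and the sample sentences are hypothetical; the patent only states that a flag selects between the two behaviors.

```python
def limit_sentences(sentences, focus_words, mode="AND"):
    """Return the sentences that form the limited text data.

    sentences:   list of word lists (assumed already tokenized)
    focus_words: the designated words of interest
    mode:        "AND" keeps sentences containing every word of interest,
                 "OR" keeps sentences containing at least one of them
    """
    focus = set(focus_words)
    if mode == "AND":
        return [s for s in sentences if focus <= set(s)]
    if mode == "OR":
        return [s for s in sentences if focus & set(s)]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical tokenized sentences.
sentences = [
    ["open-air bath", "price", "good"],
    ["open-air bath", "stairs"],
    ["price", "high"],
    ["room", "clean"],
]
both = limit_sentences(sentences, ["open-air bath", "price"], mode="AND")
either = limit_sentences(sentences, ["open-air bath", "price"], mode="OR")
# both   -> [['open-air bath', 'price', 'good']]
# either -> the first three sentences
```

With a single word of interest the two modes select the same sentences, so the flag only matters when a plurality of words is designated.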

Next, the word extraction unit 13 performs morphological analysis on the limited text data obtained in step S112, thereby extracting words from the limited text data (step S113). The co-occurrence matrix generation unit 14 then generates a co-occurrence matrix for the words extracted in step S113, using the limited text data obtained in step S112 (step S114). Next, the co-occurrence network generation unit 15 generates a limited co-occurrence network based on the co-occurrence matrix generated in step S114 (step S115). Steps S103 to S105 and steps S113 to S115 differ in the data they process, but the processing itself is the same.

In general, fewer distinct words are extracted from the limited text data obtained in step S112 than from the text data read in step S102. The co-occurrence matrix generated in step S114 differs from the co-occurrence matrix generated in step S104, and the limited co-occurrence network generated in step S115 differs from the overall co-occurrence network generated in step S105.

Next, the screen display unit 16 displays a screen including the limited co-occurrence network generated in step S115 (step S116). Figs. 12 and 13 are diagrams showing examples of windows, displayed in step S116, that include a limited co-occurrence network. The window 42 shown in Fig. 12 includes a limited co-occurrence network 52 obtained when one word of interest (here, "open-air bath") is designated. The window 43 shown in Fig. 13 includes a limited co-occurrence network 53 obtained when two words of interest (here, "open-air bath" and "bath") are designated.

Figs. 14 and 15 are diagrams showing examples of the display screen of the text mining device 10. The screen display unit 16 may display the window including the overall co-occurrence network and the window including the limited co-occurrence network side by side without overlap, or may display them overlapping each other. In the screen 71 shown in Fig. 14, the window 41 including the overall co-occurrence network 51 and the window 42 including the limited co-occurrence network 52 are displayed side by side without overlap. On the screen 71, the user can view the overall co-occurrence network 51 and the limited co-occurrence network 52 at the same time. In the screen 72 shown in Fig. 15, the window 42 including the limited co-occurrence network 52 is displayed overlapping the window 41 including the overall co-occurrence network 51. On the screen 72, the user can switch between viewing the overall co-occurrence network 51 and the limited co-occurrence network 52.

Next, the instruction input unit 11 receives an instruction from the user (step S121). The text mining device 10 then determines whether the instruction received in step S121 is an instruction designating a word of interest (step S122). If the result in step S122 is Yes, control of the text mining device 10 proceeds to step S112. In this case, steps S112 to S116 are executed for the word of interest designated in step S121, and a screen is displayed that includes a limited co-occurrence network based on limited text data consisting of the sentences that contain the word of interest designated in step S121.

Fig. 16 is a diagram showing an example of the display screen of the text mining device 10. In the screen 73 shown in Fig. 16, a window 44 including a limited co-occurrence network 54, obtained when "bath" is designated as the word of interest, is displayed overlapping the window 41 including the overall co-occurrence network 51 and the window 42 including the limited co-occurrence network 52. The screen 73 is displayed when "open-air bath" is designated as the word of interest in step S111 and "bath" is designated as the word of interest in step S121. On the screen 73, the user can switch between viewing the overall co-occurrence network 51 and the limited co-occurrence networks 52 and 54.

If the result in step S122 is No, control of the text mining device 10 proceeds to step S123. In this case, the instruction received in step S121 is, for example, an instruction to move a window, an instruction to hide a window, an instruction to close a window, or an instruction to merge windows. The user inputs these instructions by operating the instruction input unit 11 while a screen including the overall co-occurrence network and a limited co-occurrence network is displayed. The screen display unit 16 displays an updated screen in accordance with the instruction received in step S121 (step S123). Control of the text mining device 10 then returns to step S121.

Fig. 17 is a diagram showing the operation of merging windows. The screen 74 shown in Fig. 17 displays the window 42, which includes the limited co-occurrence network 52 obtained when "open-air bath" is designated as the word of interest, and the window 44, which includes the limited co-occurrence network 54 obtained when "bath" is designated as the word of interest. On the screen 74, the user can view the two limited co-occurrence networks 52 and 54 at the same time.

The hatched arrow shown in Fig. 17 indicates the movement of the mouse cursor 62 while the button of the mouse 29 is held down. This arrow is not displayed on the actual screen. Within the screen 74, the user grabs the limited co-occurrence network 52 and releases it within the limited co-occurrence network 54 (a drag-and-drop operation). More specifically, the user presses the button of the mouse 29 while the mouse cursor 62 is within the window 42, moves the mouse cursor 62 into the window 44 while keeping the button pressed, and releases the button while the mouse cursor 62 is within the window 44. By this operation, an instruction to merge the windows is input.

Fig. 18 is a diagram showing the display screen after the operation shown in Fig. 17 has been performed. The screen 75 shown in Fig. 18 displays a window 45 that presents a plurality of limited co-occurrence networks in tabbed form. In Fig. 18, the tab 64 labeled "open-air bath" is selected, and the window 45 displays the limited co-occurrence network 52 obtained when "open-air bath" is designated as the word of interest. When the tab 63 labeled "bath" is selected, the window 45 displays the limited co-occurrence network 54 shown in Fig. 17.

When the user clicks the close button (the × mark) in the window 45, the window 45 closes. When the user clicks the close button in the tab 63, the tab 63 is hidden. When the user clicks the close button in the tab 64, the tab 64 is hidden and the window 45 displays the limited co-occurrence network 54.

As described above, the text mining method of this embodiment includes a step of extracting words from text data (steps S102, S103, S112, S113), a step of generating a co-occurrence matrix for the extracted words (steps S104, S114), a step of generating a co-occurrence network based on the generated co-occurrence matrix (steps S105, S115), and a step of displaying a screen including the co-occurrence network (steps S106, S116). When an instruction designating a word of interest is input within a first screen (the screen including the window 41) that includes a first co-occurrence network (the overall co-occurrence network 51) based on the whole of the designated text data, the step of extracting words (steps S112, S113) extracts words from limited text data consisting of the portions of the designated text data that contain the word of interest (the sentences containing the word of interest), the step of generating a co-occurrence matrix (step S114) generates a second co-occurrence matrix for the extracted words using the limited text data, the step of generating a co-occurrence network (step S115) generates a second co-occurrence network (the limited co-occurrence networks 52 to 54) based on the second co-occurrence matrix, and the step of displaying a screen (step S116) displays a second screen (a screen including the windows 42 to 45) that includes the second co-occurrence network. Thus, in the text mining method of this embodiment, when an instruction designating a word of interest is input within the first screen including the first co-occurrence network based on the whole of the designated text data, the second screen including the second co-occurrence network based on the portions of the designated text data that contain the word of interest is displayed. A screen including the co-occurrence network for a designated word of interest can therefore be displayed by a simple operation.

Further, an instruction designating the words corresponding to nodes as words of interest is input by selecting, in the first screen, one or more nodes included in the first co-occurrence network and then selecting analysis start (FIGS. 6 and 8). By selecting one or more nodes and analysis start in the first screen in this way, an instruction designating one or more words of interest can be input by a simple operation, and a screen including the co-occurrence network for the designated word or words of interest can be displayed. In addition, an instruction designating the word corresponding to a node as the word of interest is input by continuously selecting, in the first screen, one node included in the first co-occurrence network (FIG. 7). By continuously selecting one node in the first screen in this way, an instruction designating one word of interest can be input by a simple operation, and a screen including the co-occurrence network for that word of interest can be displayed.

Further, an instruction designating the words corresponding to the two nodes connected to an edge as words of interest is input by continuously selecting, in the first screen, one edge included in the first co-occurrence network (FIG. 9). By continuously selecting one edge in the first screen in this way, an instruction designating two words of interest can be input by a simple operation, and a screen including the co-occurrence network for those two words of interest can be displayed. In addition, an instruction designating the words corresponding to the plurality of nodes connected to edges as words of interest is input by selecting, in the first screen, one or more edges included in the first co-occurrence network and then selecting analysis start (FIGS. 10 and 11). By selecting one or more edges and analysis start in the first screen in this way, an instruction designating a plurality of words of interest can be input by a simple operation, and a screen including the co-occurrence network for those words of interest can be displayed.
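How selected nodes and edges translate into words of interest can be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: the function name `words_of_interest` is hypothetical, a node is represented simply by its word, an edge by the pair of words at its two ends, and the de-duplication of words shared by adjacent edges is a design choice of this sketch.

```python
def words_of_interest(selected_nodes=(), selected_edges=()):
    """Collect the designated words in selection order, without duplicates.

    selected_nodes: iterable of words (one per selected node).
    selected_edges: iterable of (word, word) pairs (one per selected edge).
    """
    words = []
    for word in selected_nodes:
        if word not in words:
            words.append(word)
    for first, second in selected_edges:
        # An edge designates the words of both nodes it connects.
        for word in (first, second):
            if word not in words:
                words.append(word)
    return words
```

For example, selecting the two edges `("price", "quality")` and `("quality", "service")` would designate three words of interest, since the shared node contributes its word only once.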

Further, when a merge instruction is input (FIG. 17) in a second screen (screen 74) that includes a plurality of second co-occurrence networks (limited co-occurrence networks 52 and 54), the display step displays the plurality of second co-occurrence networks in tab form (FIG. 18). The plurality of second co-occurrence networks can thereby be displayed compactly. The merge instruction is input by pressing on one second co-occurrence network (limited co-occurrence network 52) in the second screen and releasing within another second co-occurrence network (limited co-occurrence network 54). The merge instruction can therefore be input by a simple operation, and the plurality of second co-occurrence networks can be displayed compactly.
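The effect of the merge instruction on the screen state can be modeled with a small data structure. The sketch below is only one possible reading: the `Window` class, the `merge` function, the use of list indices to identify the pressed and released windows, and the tab order after merging are all assumptions, and no actual drag-and-drop handling is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    """A window on the second screen; each tab holds one limited co-occurrence network."""
    tabs: list = field(default_factory=list)


def merge(windows, source, target):
    """Return a new window list in which windows[source] is merged into windows[target].

    source is the index of the window where the press began, target the index of
    the window where it was released. The target's tabs are kept first (assumed order).
    """
    if source == target:
        return list(windows)
    merged = Window(tabs=windows[target].tabs + windows[source].tabs)
    result = []
    for index, window in enumerate(windows):
        if index == source:
            continue  # the source window disappears after the merge
        result.append(merged if index == target else window)
    return result
```

After the merge, the screen shows one window fewer, and the merged window offers the networks as tabs so that only one of them occupies the display area at a time.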

The limited text data may consist of the sentences in the designated text data that contain the word of interest. In that case, when an instruction designating a word of interest is input, the designated text data can be divided into sentence units to obtain the limited text data, and a screen including the second co-occurrence network based on the obtained limited text data can be displayed. When a plurality of words of interest are designated, the limited text data may consist of the sentences in the designated text data that contain all of the words of interest. In that case, a screen can be displayed that includes the second co-occurrence network obtained by AND processing of the words of interest. Alternatively, when a plurality of words of interest are designated, the limited text data may consist of the sentences in the designated text data that contain any one of the words of interest. In that case, a screen can be displayed that includes the second co-occurrence network obtained by OR processing of the words of interest. Further, the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients. The co-occurrence of the words contained in the text data can therefore be analyzed suitably.
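The AND and OR variants of the limited text data differ only in the sentence filter. A minimal sketch, assuming each sentence has already been reduced to the set of words it contains; the function name `limited_text_data` and the `mode` parameter are hypothetical and not taken from the source:

```python
def limited_text_data(sentences, words_of_interest, mode="and"):
    """Select the sentences that form the limited text data.

    sentences: list of sets, each holding the words of one sentence.
    mode="and" keeps sentences containing all words of interest;
    mode="or" keeps sentences containing at least one of them.
    """
    targets = set(words_of_interest)
    if mode == "and":
        return [words for words in sentences if targets <= words]
    if mode == "or":
        return [words for words in sentences if targets & words]
    raise ValueError(f"unknown mode: {mode!r}")
```

With a single word of interest the two modes select the same sentences. With several words, AND processing typically yields a smaller, more specific limited text data set than OR processing.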

The text mining device 10 and the text mining program 31 of this embodiment have the same features as the text mining method described above and provide the same effects. According to the text mining method, the text mining device 10, and the text mining program 31 of this embodiment, a screen including the co-occurrence network for a designated word of interest can be displayed by a simple operation.

The present invention has been described in detail above, but the above description is in all respects illustrative and not restrictive. It should be understood that numerous other changes and modifications can be devised without departing from the scope of the invention.

Claims (20)

1. A text mining method executed by a computer for displaying a screen including an analysis result of text data, the method comprising: a step of extracting words from text data; a step of generating a co-occurrence matrix for the words; a step of generating a co-occurrence network based on the co-occurrence matrix; and a step of displaying a screen including the co-occurrence network; wherein, when an instruction designating a word of interest is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting the words extracts the words from limited text data comprising the portions of the designated text data that contain the word of interest, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

2. The text mining method of claim 1, wherein an instruction designating the words corresponding to nodes as the words of interest is input by selecting, in the first screen, one or more nodes included in the first co-occurrence network and selecting analysis start.

3. The text mining method of claim 1, wherein an instruction designating the word corresponding to a node as the word of interest is input by continuously selecting, in the first screen, one node included in the first co-occurrence network.

4. The text mining method of claim 1, wherein an instruction designating the words corresponding to the two nodes connected to an edge as the words of interest is input by continuously selecting, in the first screen, one edge included in the first co-occurrence network.

5. The text mining method of claim 1, wherein an instruction designating the words corresponding to the plurality of nodes connected to edges as the words of interest is input by selecting, in the first screen, one or more edges included in the first co-occurrence network and selecting analysis start.

6. The text mining method of claim 1, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in tab form.

7. The text mining method of claim 6, wherein the merge instruction is input by pressing on one second co-occurrence network in the second screen and releasing within another second co-occurrence network.

8. The text mining method of claim 1, wherein the limited text data comprises the sentences in the designated text data that contain the word of interest.

9. The text mining method of claim 8, wherein, when a plurality of words of interest are designated, the limited text data comprises the sentences in the designated text data that contain all of the plurality of words of interest.

10. The text mining method of claim 8, wherein, when a plurality of words of interest are designated, the limited text data comprises the sentences in the designated text data that contain any one of the plurality of words of interest.

11. The text mining method of claim 1, wherein the step of generating the co-occurrence matrix generates a co-occurrence matrix whose elements are Jaccard coefficients.

12. A recording medium on which a text mining program is recorded, the text mining program being for displaying a screen including an analysis result of text data and causing a computer to execute, by a CPU using a memory, the following steps: a step of extracting words from text data; a step of generating a co-occurrence matrix for the words; a step of generating a co-occurrence network based on the co-occurrence matrix; and a step of displaying a screen including the co-occurrence network; wherein, when an instruction designating a word of interest is input in a first screen including a first co-occurrence network based on the whole of designated text data, the step of extracting the words extracts the words from limited text data comprising the portions of the designated text data that contain the word of interest, the step of generating the co-occurrence matrix generates a second co-occurrence matrix for the words using the limited text data, the step of generating the co-occurrence network generates a second co-occurrence network based on the second co-occurrence matrix, and the step of displaying the screen displays a second screen including the second co-occurrence network.

13. The recording medium of claim 12, wherein an instruction designating the words corresponding to nodes as the words of interest is input by selecting, in the first screen, one or more nodes included in the first co-occurrence network and selecting analysis start.

14. The recording medium of claim 12, wherein an instruction designating the word corresponding to a node as the word of interest is input by continuously selecting, in the first screen, one node included in the first co-occurrence network.

15. The recording medium of claim 12, wherein an instruction designating the words corresponding to the two nodes connected to an edge as the words of interest is input by continuously selecting, in the first screen, one edge included in the first co-occurrence network.

16. The recording medium of claim 12, wherein an instruction designating the words corresponding to the plurality of nodes connected to edges as the words of interest is input by selecting, in the first screen, one or more edges included in the first co-occurrence network and selecting analysis start.

17. The recording medium of claim 12, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the step of displaying the screen displays the plurality of second co-occurrence networks in tab form.

18. The recording medium of claim 17, wherein the merge instruction is input by pressing on one second co-occurrence network in the second screen and releasing within another second co-occurrence network.

19. A text mining device for displaying a screen including an analysis result of text data, the device comprising: a word extraction unit that extracts words from text data; a co-occurrence matrix generation unit that generates a co-occurrence matrix for the words; a co-occurrence network generation unit that generates a co-occurrence network based on the co-occurrence matrix; and a screen display unit that displays a screen including the co-occurrence network; wherein, when an instruction designating a word of interest is input in a first screen including a first co-occurrence network based on the whole of designated text data, the word extraction unit extracts the words from limited text data comprising the portions of the designated text data that contain the word of interest, the co-occurrence matrix generation unit generates a second co-occurrence matrix for the words using the limited text data, the co-occurrence network generation unit generates a second co-occurrence network based on the second co-occurrence matrix, and the screen display unit displays a second screen including the second co-occurrence network.

20. The text mining device of claim 19, wherein, when a merge instruction is input in a second screen including a plurality of second co-occurrence networks, the screen display unit displays the plurality of second co-occurrence networks in tab form.
TW108106540A 2018-03-20 2019-02-26 Text exploration method, text exploration program and text exploration device TWI703457B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018052074A JP6987003B2 (en) 2018-03-20 2018-03-20 Text mining methods, text mining programs, and text mining equipment
JP2018-052074 2018-03-20

Publications (2)

Publication Number Publication Date
TW201945958A TW201945958A (en) 2019-12-01
TWI703457B true TWI703457B (en) 2020-09-01

Family

ID=68065531

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108106540A TWI703457B (en) 2018-03-20 2019-02-26 Text exploration method, text exploration program and text exploration device

Country Status (4)

Country Link
JP (1) JP6987003B2 (en)
KR (1) KR102162779B1 (en)
CN (1) CN110309290B (en)
TW (1) TWI703457B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWM523901U (en) * 2016-01-04 2016-06-11 信義房屋仲介股份有限公司 Search engine device for performing semantic keyword analysis
CN107193803A (en) * 2017-05-26 2017-09-22 北京东方科诺科技发展有限公司 A kind of particular task text key word extracting method based on semanteme
US20170337262A1 (en) * 2016-05-19 2017-11-23 Quid, Inc. Pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2806867B2 (en) * 1995-03-13 1998-09-30 株式会社トレンディ Document database construction method, display method, and display device
JPH10283367A (en) * 1997-04-09 1998-10-23 Mitsubishi Electric Corp Hypermedia device
JP4404323B2 (en) * 1999-02-05 2010-01-27 経済産業大臣 Thesaurus browsing system and method
JP5059282B2 (en) * 2003-10-14 2012-10-24 ソニー株式会社 Information providing system, information providing server, user terminal device, content display device, computer program, and content display method
JP2006215936A (en) * 2005-02-07 2006-08-17 Hitachi Ltd Search system and search method
JP2007193380A (en) * 2006-01-16 2007-08-02 So-Net Entertainment Corp Information processor, information processing method and computer program
JP5534167B2 (en) * 2009-12-16 2014-06-25 日本電気株式会社 Graph creation device, graph creation method, and graph creation program
JP5331723B2 (en) * 2010-02-05 2013-10-30 株式会社エヌ・ティ・ティ・データ Feature word extraction device, feature word extraction method, and feature word extraction program
US20120066628A1 (en) * 2010-09-09 2012-03-15 Microsoft Corporation Drag-able tabs
JP2014085992A (en) * 2012-10-26 2014-05-12 Hitachi Ltd Document recognition support device, document recognition support method and document recognition support program
JP5903376B2 (en) * 2012-12-11 2016-04-13 日本電信電話株式会社 Information recommendation device, information recommendation method, and information recommendation program
US9177104B2 (en) * 2013-03-29 2015-11-03 Case Western Reserve University Discriminatively weighted multi-scale local binary patterns
KR101512084B1 (en) * 2013-11-15 2015-04-17 한국과학기술원 Web search system for providing 3 dimensional web search interface based virtual reality and method thereof
JP6287192B2 (en) * 2013-12-26 2018-03-07 キヤノンマーケティングジャパン株式会社 Information processing apparatus, information processing method, and program
JP6364086B2 (en) * 2014-08-22 2018-07-25 株式会社日立製作所 Self-produced information processing system and method
JP6280859B2 (en) * 2014-11-20 2018-02-14 日本電信電話株式会社 Behavior network information extraction apparatus, behavior network information extraction method, and behavior network information extraction program
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
JP6524790B2 (en) * 2015-05-14 2019-06-05 富士ゼロックス株式会社 INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING PROGRAM
WO2017061253A1 (en) * 2015-10-09 2017-04-13 アイビーリサーチ株式会社 Display control device, display control method, and display control program
CN107766318B (en) * 2016-08-17 2021-03-16 北京金山安全软件有限公司 Keyword extraction method and device and electronic equipment
CN107451120B (en) * 2017-08-01 2020-10-30 中国人民解放军火箭军工程大学 Content conflict detection method and system for open text information

Also Published As

Publication number Publication date
KR102162779B1 (en) 2020-10-07
CN110309290A (en) 2019-10-08
CN110309290B (en) 2023-06-06
JP2019164593A (en) 2019-09-26
TW201945958A (en) 2019-12-01
JP6987003B2 (en) 2021-12-22
KR20190110428A (en) 2019-09-30
