TWI679545B - Domain knowledge base system and method in the same - Google Patents

Domain knowledge base system and method in the same

Info

Publication number
TWI679545B
Authority
TW
Taiwan
Prior art keywords
data
unit
sub
subunit
domain
Prior art date
Application number
TW107128203A
Other languages
Chinese (zh)
Other versions
TW202009734A (en)
Inventor
林宣華
Shian-Hua Lin
顏皓民
Hao-min YAN
Original Assignee
國立暨南國際大學
National Chi Nan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立暨南國際大學 (National Chi Nan University)
Priority to TW107128203A
Application granted
Publication of TWI679545B
Publication of TW202009734A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is a system for automatically building a domain knowledge base. It comprises: a server device; a data collection main unit disposed on and electrically connected to the server device; a data exploration main unit (Knowledge Exploration) disposed on and electrically connected to the server device; and a knowledge tree subunit (KT organizer) disposed in the data exploration main unit and electrically connected to the server device, through which collected data is converted into knowledge data. Key data crawled from the network is converted into collected data by the data collection main unit, and the collected data is converted into knowledge data by the data exploration main unit.

Description

System and method for automatically building a domain knowledge base

The present invention relates to a system and method for automatically building a domain knowledge base. Because the knowledge base is built automatically, users of the invention can understand a professional domain more accurately and with less manual effort.

In recent years, exchanging information over the Internet has become a tool that everyone needs and must learn to use.

As network transmission speeds keep rising and the amount of storage that servers must access keeps growing, building knowledge management systems that classify data efficiently and systematically has become a major challenge.

In knowledge management systems, documents are commonly classified and managed in a tree structure, generally called a "knowledge tree" (Knowledge Tree), which serves as the basis for storing files and filing them under categories. The files are typically documents retrieved automatically from networks, file systems, or databases. A knowledge tree resembles a file system with category names and category attributes: a category may contain further categories or documents, and it may also be necessary to trace back the overall state of the tree at successive past points in time.

A good feedback design can therefore not only classify document content precisely for document management, but also continuously feed back into the overall classification knowledge tree for knowledge management, or track how a series of important keywords in a particular document or category changes over time, for analyzing market trends and business intelligence.

In view of the above problems of the prior art, the object of the present invention is to provide a system and method for automatically building a domain knowledge base, so as to overcome the incompleteness of conventional, manually built knowledge bases used to manage document databases.

To achieve this object, a system for automatically building a domain knowledge base is proposed, comprising: a server device; a data collection main unit disposed on and electrically connected to the server device; a data exploration main unit disposed on and electrically connected to the server device; and a knowledge tree subunit disposed in the data exploration main unit and electrically connected to the server device, through which collected data is converted into knowledge data. Key data crawled from the network is converted into collected data by the data collection main unit, and the collected data is converted into knowledge data by the data exploration main unit.

Preferably, the data collection main unit further includes a domain creation subunit, a concept selection subunit, a web crawler subunit, a keyword extraction subunit, and a data classification subunit.

Preferably, the data exploration main unit further includes a concept extraction subunit, a topic analysis subunit, and a knowledge processing subunit.

Preferably, the server device can be electrically connected to a chatbot device, a search engine system, a key recommendation system, and a document classification system.

To achieve this object, a method for automatically building a domain knowledge base is also proposed, comprising the following steps: data is crawled from the network through the server device; a data collection subunit of the data collection main unit selects key data, which is filtered and analyzed in the data classification subunit of the data collection main unit to produce collected data; the concept extraction subunit of the data exploration main unit performs data analysis on the collected data and extracts it; the topic analysis subunit of the data exploration main unit defines topic data from the collected data; and the knowledge tree subunit uses an algorithm to compute the correlation rate between the collected data and the topic data, distinguishes main relationship chains from special relationship chains according to that rate, sets up a plurality of nodes on the main relationship chain and a plurality of child nodes on the special relationship chain, and links the nodes and child nodes in series to build a knowledge tree domain.

Preferably, the data collection main unit further includes a domain creation subunit, a concept selection subunit, a web crawler subunit, and a keyword extraction subunit.

Preferably, the data exploration main unit further includes a concept extraction subunit and a data processing subunit.

Preferably, the knowledge tree domain stores the plurality of nodes and the plurality of child nodes and feeds them back into another set of data.

As described above, the system and method for automatically building a domain knowledge base according to the present invention may have one or more of the following advantages:

(1) When data is crawled from the network, the system filters and screens it on its own and builds a structure that is convenient for user search or machine recognition.

(2) The invention is compatible with document classification or recommendation systems, lets users understand a domain quickly, and keeps the analysis objective and transparent.

101: Domain knowledge management system
102: Domain creation subunit
103: Data processing subunit
104: Concept selection subunit
105: Web crawler subunit
106: Collected data
107: Keyword extraction subunit
108: Data classification subunit
109: Concept extraction subunit
110: Topic analysis subunit
111: Knowledge tree subunit
112: Domain knowledge base
113: Data collection main unit
114: Data exploration main unit
115: Automatic domain knowledge structure
116: Forum
117: Community
118: News
119: General dictionary
120: Professional dictionary
121: Cloud dictionary
122: Application level
123: Search engine system
124: Document classification system
125: Key recommendation system
126: Chatbot device
201: Domain creation subunit
202: Concept selection subunit
203: Web crawler subunit
204: Keyword extraction subunit
205: Data classification subunit
206: Data exploration subunit
207: Data collection main unit
301: Data exploration main unit
302: Concept extraction subunit
303: Topic analysis subunit
304: Knowledge tree subunit
305: Data processing subunit
306: Data collection main unit

FIG. 1 is a schematic diagram of the relationships within the system for automatically building a domain knowledge base according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the data collection main unit of the method for automatically building a domain knowledge base according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of the data exploration main unit of the method for automatically building a domain knowledge base according to an embodiment of the present invention.

To help the examiners understand the technical features, content, and advantages of the present invention and the effects it can achieve, the invention is described in detail below by way of embodiments and with reference to the accompanying drawings. The drawings are intended only for illustration and to support the description; they do not necessarily show the true proportions or exact configuration of the invention as implemented. The proportions and arrangement shown in the drawings should therefore not be read as limiting the scope of rights of the invention in actual implementation. This is stated at the outset.

FIG. 1 is a schematic diagram of the relationships within the system for automatically building a domain knowledge base. It includes: the domain knowledge management system 101, domain creation subunit 102, data processing subunit 103, concept selection subunit 104, web crawler subunit 105, new data 106, keyword extraction subunit 107, data classification subunit 108, concept extraction subunit 109, topic analysis subunit 110, knowledge tree subunit 111, domain knowledge base 112, data collection main unit 113, data exploration main unit 114, automatic domain knowledge structure 115, forum 116, community 117, news 118, general dictionary 119, professional dictionary 120, cloud dictionary 121, application level 122, search engine system 123, document classification system 124, and key recommendation system 125.

The domain knowledge management system 101 includes the domain creation subunit 102 and the data processing subunit 103. The domain creation subunit 102 edits the key data by correlation rate and inputs it to the data collection main unit 113 in the automatic domain knowledge structure 115, which uses the domain knowledge base 112 to establish the connections among the data and produce the collected data. The data exploration main unit 114 finds the domain's concepts/keywords in the domain data within the collected data; the topic analysis subunit 110 defines topics from those concepts/keywords; the knowledge tree subunit 111 uses the concepts/keywords and topics to build the domain knowledge tree, converting the collected data into knowledge data; and the knowledge data is output through the data processing subunit 103.

When key data enters the data collection main unit 113, it is first screened by the concept selection subunit 104 and the web crawler subunit 105 to establish the completeness and correctness of the data itself. The domain knowledge base 112 and the new data 106 then supplement it and further strengthen the completeness of the collected data, which is obtained from online forums 116, communities 117, and news 118. Finally, the keyword extraction subunit 107 and the data classification subunit 108 merge the key data with the new data 106 to produce the collected data.

The collected data then enters the cloud dictionary 121 in the data exploration main unit 114, where important keywords are looked up. The cloud dictionary 121 includes a general dictionary 119 and a professional dictionary 120, which accumulate dictionary keywords that are in common use or related to the professional domain. After the topic analysis subunit 110 groups the collected data, the knowledge tree subunit 111 performs a tree-like analysis. The analysis uses a minimum-set approach to compute the correlation rate between items of collected data, distinguishes main relationship chains from special relationship chains according to that rate, sets nodes on the main relationship chains and child nodes on the special relationship chains, and links the nodes and child nodes in series to produce the knowledge tree domain, which is the knowledge data. A high correlation rate marks a main relationship chain, and a low correlation rate marks a special relationship chain.

The internal formulas of the knowledge tree subunit 111 are as follows:

$$\mathrm{Conf}(k_a \rightarrow k_b) = \frac{|k_a \cap k_b|}{|k_a|} \geq TH$$

$$\mathrm{Conf}(k_a \rightarrow k_b) > \mathrm{Conf}(k_b \rightarrow k_a) \;\Rightarrow\; k_a \in SK,\; k_b \in GK, \qquad k_a, k_b \in DT,\; k_a \neq k_b$$

Here the intersection of the collected data k_a and the knowledge data k_b, divided by the collected data k_a, must be greater than or equal to the topic analysis threshold TH. Both k_a and k_b belong to the domain knowledge tree DT, and they are distinct items. The knowledge data k_b is linked through the main relationship chain GK, and the collected data k_a is linked through the special relationship chain SK.
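The confidence comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes each keyword is represented by the set of document ids in which it occurs, and the names `confidence`, `classify_pairs`, and `occurrences` are hypothetical.

```python
from itertools import combinations


def confidence(docs_a, docs_b):
    """Conf(a -> b) = |a ∩ b| / |a|; returns 0.0 when a has no documents."""
    if not docs_a:
        return 0.0
    return len(docs_a & docs_b) / len(docs_a)


def classify_pairs(occurrences, th):
    """Return (special, general) keyword pairs.

    occurrences maps keyword -> set of document ids (an assumed representation).
    A pair is kept only if the larger of the two confidences reaches th;
    the keyword with the larger confidence goes to SK, the other to GK.
    """
    links = []
    for ka, kb in combinations(sorted(occurrences), 2):
        ab = confidence(occurrences[ka], occurrences[kb])
        ba = confidence(occurrences[kb], occurrences[ka])
        if max(ab, ba) < th or ab == ba:
            continue  # below threshold, or no direction can be decided
        if ab > ba:
            links.append((ka, kb))  # ka in SK, kb in GK
        else:
            links.append((kb, ka))  # kb in SK, ka in GK
    return links
```

With this representation, a keyword that almost always appears together with a broader keyword ends up on the special chain, while the broader keyword ends up on the main chain.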

When a user wants to crawl a set of key data from the network, the data collection main unit 113 converts the key data into collected data, and the data exploration main unit 114 then converts the collected data into knowledge data.

FIG. 2 is a schematic diagram of the data collection main unit of the method for automatically building a domain knowledge base. It includes: the domain creation subunit 201, concept selection subunit 202, web crawler subunit 203, keyword extraction subunit 204, data classification subunit 205, data exploration subunit 206, and data collection main unit 207.

Once the domain creation subunit 201 has confirmed the keywords entered by the user, the concept selection subunit 202 picks keywords from the important keywords to serve as keyword seeds, combines them, and provides the combinations to the web crawler subunit 203.
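The seed selection and combination step can be illustrated as follows. The sketch assumes the important keywords arrive already ranked and that seeds are combined into fixed-size groups to form crawler queries; the patent does not specify either detail, and `build_seed_queries` and its parameters are hypothetical names.

```python
from itertools import combinations


def build_seed_queries(important_keywords, seed_count, group_size):
    """Pick the first seed_count keywords as seeds and join every
    group_size-sized combination into one query string."""
    seeds = important_keywords[:seed_count]
    return [" ".join(group) for group in combinations(seeds, group_size)]
```

For example, three seeds combined in pairs yield three queries that a crawler could submit one by one.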

The web crawler subunit 203 uses the keyword seeds to collect multiple items of domain-related data from the network and stores them back in the domain knowledge base 112. The keyword extraction subunit 204 extracts keywords from the newly arrived data, and the data classification subunit 205 uses the keyword correlation rate to reclassify the collected data as domain or non-domain. The classified data and keywords are provided to the data exploration main unit 114 for constructing domain knowledge.

More specifically, the keyword extraction subunit 204 extracts keywords from newly collected data using a user-defined keyword length. The extracted keywords are stored in the domain knowledge base 112 as candidate keywords, and the data classification subunit 205 merges the key data with the new data 106 into the collected data.
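One way to picture keyword extraction with a user-defined length, followed by domain versus non-domain classification, is the sketch below. It assumes whitespace-tokenized text (Chinese text would need a word segmenter first) and a simple hit-ratio test in place of the unspecified keyword correlation rate; `extract_candidates`, `is_domain_document`, and `min_ratio` are hypothetical.

```python
from collections import Counter


def extract_candidates(text, max_len):
    """Count every phrase of 1..max_len consecutive tokens as a candidate keyword."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def is_domain_document(text, domain_keywords, max_len, min_ratio):
    """Treat a document as in-domain when the share of candidate occurrences
    that match known domain keywords reaches min_ratio (an assumed criterion)."""
    candidates = extract_candidates(text, max_len)
    total = sum(candidates.values())
    if total == 0:
        return False
    hits = sum(count for kw, count in candidates.items() if kw in domain_keywords)
    return hits / total >= min_ratio
```

Documents that pass the test would be kept as domain data, and their candidate keywords would be stored for the next stage.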

FIG. 3 is a schematic diagram of the data exploration main unit of the method for automatically building a domain knowledge base. It includes: the data exploration main unit 301, concept extraction subunit 302, topic analysis subunit 303, knowledge tree subunit 304, data processing subunit 305, and data collection main unit 306.

Starting from the collected data produced by the data collection main unit 207, the concept extraction subunit 302 finds the domain's concepts/keywords in the domain knowledge base 112 supplied by the data collection main unit 207; the topic analysis subunit 303 defines topics from those concepts/keywords; and the knowledge tree subunit 304 uses the concepts/keywords and topics to build the domain knowledge tree. Important keywords in the domain knowledge tree are fed back to the data collection main unit 207 for the next keyword cycle, so that with each keyword search more keywords come to support and link to one another.
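The feedback cycle can be sketched as a loop in which each round's important keywords become the next round's seeds. The callables `collect` and `extract_important` stand in for the data collection and data exploration main units; they, `run_cycles`, and the `max_cycles` stopping rule are assumptions made for illustration only.

```python
def run_cycles(initial_seeds, collect, extract_important, max_cycles):
    """Repeat collect -> extract, feeding newly found keywords back as seeds.

    Stops after max_cycles rounds or when a round yields no new keywords.
    Returns the set of all keywords known at the end.
    """
    known = set(initial_seeds)
    frontier = set(initial_seeds)
    for _ in range(max_cycles):
        if not frontier:
            break
        documents = collect(frontier)
        important = set(extract_important(documents))
        frontier = important - known  # only keywords not seen before are fed back
        known |= frontier
    return known
```

Tracking the set of already known keywords keeps the loop from crawling the same seeds again on every round.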

The data exploration main unit 301 further uses the following formula:

Figure TWI679545B_D0007

where k_t is the number of topics, DK is the domain keywords, and k_d is the number of domains. The number of domains k_d delimits the boundary of the domain keywords DK, which strengthens the links among the keywords.

The knowledge tree subunit 304 can be further described in terms of the domain key setting difference and the domain key level difference, where the domain key setting difference is:

Figure TWI679545B_D0008

DK is the domain keywords, and DK is another set of keywords in the domain. Using the domain key setting difference, the relationship chains of the domain keywords can therefore be defined.

Finally, the invention lets the keyword system learn iteratively on its own, so that keywords are updated more smoothly and the keyword set grows over time.

In summary, the method for automatically building a domain knowledge base according to the present invention comprises the following steps:

Step 1: Data is crawled through the server device, and the data collection subunit of the data collection main unit selects key data.

Step 2: The key data is filtered and analyzed in the data classification subunit of the data collection main unit to produce collected data.

Step 3: The concept extraction subunit of the data exploration main unit performs data analysis on the collected data and extracts it.

Step 4: The topic analysis subunit of the data exploration main unit defines topic data from the collected data.

Step 5: The knowledge tree subunit uses an algorithm to compute the correlation rate between the collected data and the topic data, distinguishes main relationship chains from special relationship chains according to that rate, sets up a plurality of nodes on the main relationship chain and a plurality of child nodes on the special relationship chain, and links the nodes and child nodes in series to build the knowledge tree domain.
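The last part of Step 5, linking nodes and child nodes into a tree, might look like the sketch below. It assumes the earlier steps have already produced (special keyword, general keyword) pairs; `build_knowledge_tree` and `roots` are hypothetical helper names, and the dictionary layout is only one possible encoding of the knowledge tree domain.

```python
from collections import defaultdict


def build_knowledge_tree(links):
    """Group special keywords as child nodes under their general keyword.

    links: iterable of (special_keyword, general_keyword) pairs.
    Returns a dict mapping each node to a sorted list of its child nodes.
    """
    children = defaultdict(set)
    for special, general in links:
        children[general].add(special)
    return {node: sorted(kids) for node, kids in children.items()}


def roots(tree):
    """Nodes that never appear as a child, i.e. the top of the tree."""
    all_children = {child for kids in tree.values() for child in kids}
    return sorted(node for node in tree if node not in all_children)
```

A keyword that is a child in one pair and a parent in another becomes an inner node, which is how several pairs chain into a multi-level tree.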

The above description is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the present invention shall fall within the scope of the appended claims.

Claims (8)

1. A system for automatically building a domain knowledge base, comprising: a server device configured to crawl data; a data collection main unit disposed on the server device and including a data collection subunit and a data classification subunit, wherein the data collection subunit obtains key data from the server device, and the key data is filtered and analyzed in the data classification subunit to produce collected data; and a data exploration main unit disposed on the server device and including a concept extraction subunit, a topic analysis subunit, and a knowledge tree subunit; wherein the concept extraction subunit performs data analysis on the collected data and extracts the collected data, the topic analysis subunit defines topic data from the collected data, and the knowledge tree subunit uses an algorithm to compute the correlation rate between the collected data and the topic data, distinguishes a main relationship chain and a special relationship chain according to the correlation rate, sets up a plurality of nodes on the main relationship chain and a plurality of child nodes on the special relationship chain, and links the plurality of nodes and the plurality of child nodes in series, thereby building a knowledge tree domain.

2. The system for automatically building a domain knowledge base according to claim 1, wherein the data collection main unit further includes a domain creation subunit, a concept selection subunit, a web crawler subunit, and a keyword extraction subunit.

3. The system for automatically building a domain knowledge base according to claim 1, wherein the data exploration main unit further includes a topic analysis subunit and a data processing subunit.

4. The system for automatically building a domain knowledge base according to claim 1, wherein the server device is electrically connected to a chatbot, a search engine system, a key recommendation system, and a document classification system.

5. A method for automatically building a domain knowledge base, comprising the steps of: crawling data through a server device, wherein a data collection subunit of a data collection main unit selects key data, and the key data is filtered and analyzed in a data classification subunit of the data collection main unit to produce collected data; and using a concept extraction subunit of a data exploration main unit to perform data analysis on the collected data and extract the collected data, wherein a topic analysis subunit of the data exploration main unit defines topic data from the collected data, and a knowledge tree subunit uses an algorithm to compute the correlation rate between the collected data and the topic data, distinguishes a main relationship chain and a special relationship chain according to the correlation rate, sets up a plurality of nodes on the main relationship chain and a plurality of child nodes on the special relationship chain, and links the plurality of nodes and the plurality of child nodes in series, thereby building a knowledge tree domain.

6. The method for automatically building a domain knowledge base according to claim 5, wherein the data collection main unit further includes a domain creation subunit, a concept selection subunit, a web crawler subunit, and a keyword extraction subunit.

7. The method for automatically building a domain knowledge base according to claim 5, wherein the data exploration main unit further includes a concept extraction subunit and a data processing subunit.

8. The method for automatically building a domain knowledge base according to claim 5, wherein the knowledge tree domain stores the plurality of nodes and the plurality of child nodes and feeds them back into another item of the key data.
TW107128203A 2018-08-13 2018-08-13 Domain knowledge base system and method in the same TWI679545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107128203A TWI679545B (en) 2018-08-13 2018-08-13 Domain knowledge base system and method in the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW107128203A TWI679545B (en) 2018-08-13 2018-08-13 Domain knowledge base system and method in the same

Publications (2)

Publication Number Publication Date
TWI679545B (en) 2019-12-11
TW202009734A (en) 2020-03-01

Family

ID=69582366

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107128203A TWI679545B (en) 2018-08-13 2018-08-13 Domain knowledge base system and method in the same

Country Status (1)

Country Link
TW (1) TWI679545B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI235935B (en) * 2000-06-07 2005-07-11 Insyst Ltd A knowledge-engineering protocol-suite system
TW200538963A (en) * 2004-05-18 2005-12-01 Taiwan Semiconductor Mfg System and method for storing and accessing information via smart knowledge agents
US7177852B2 (en) * 2000-07-06 2007-02-13 British Telecommunications Public Limited Company Method and apparatus for extracting knowledge from software code or other structured data
US20100063799A1 (en) * 2003-06-12 2010-03-11 Patrick William Jamieson Process for Constructing a Semantic Knowledge Base Using a Document Corpus

Also Published As

Publication number Publication date
TW202009734A (en) 2020-03-01

Similar Documents

Publication Publication Date Title
Szomszor et al. Semantic modelling of user interests based on cross-folksonomy analysis
Alani Position paper: ontology construction from online ontologies
CN102549571B (en) From the terrestrial reference of digital picture set
Liu et al. Finding media illustrating events
CN102750346B (en) Method, system and terminal device for recommending software
CN106980651B (en) Crawling seed list updating method and device based on knowledge graph
CN104778208A (en) Method and system for optimally grasping search engine SEO (search engine optimization) website data
CN101739407A (en) Method and system for automatically constructing information organization structure used for related information browse
CN105488211A (en) Method for determining user group based on feature analysis
CN104615627A (en) Event public sentiment information extracting method and system based on micro-blog platform
CN105183916A (en) Device and method for managing unstructured data
CN105512301A (en) User grouping method based on social content
Zhai et al. Random forest based traffic classification method in sdn
US9230210B2 (en) Information processing apparatus and method for obtaining a knowledge item based on relation information and an attribute of the relation
TWI679545B (en) Domain knowledge base system and method in the same
Kumar et al. Progressive machine learning approach with WebAstro for Web usage mining
Shavitt et al. Song clustering using peer-to-peer co-occurrences
Selvam et al. Social Event Detection–A Systematic Approach using Ontology and Linked Open Data with Significance to Semantic Links
Oramas Harvesting and structuring social data in music information retrieval
JP4745993B2 (en) Consciousness system construction device and consciousness system construction program
CN103177053B (en) Teaching plan editing dynamic resource recommendation method and teaching plan editing system thereof
Paniagua et al. Social events and social ties
JP3774145B2 (en) Web site internal structure estimation device, internal structure estimation method, program for the method, and recording medium recording the program
Crescenzi et al. Alfred: crowd assisted data extraction
CN104809148B (en) A kind of method and apparatus for determining mark post object

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees