TWI757957B - Automatic classification method and system of webpages - Google Patents
- Publication number
- TWI757957B (application number TW109138812A)
- Authority
- TW
- Taiwan
- Prior art keywords
- webpage
- keywords
- article
- matrix
- identifier
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Description
The present invention relates to an automatic classification method and system, and in particular to a method and system for automatically classifying web pages.
The Internet has become an indispensable part of daily life. People often browse web pages on a computer, and when they come across a page they like or consider important, they can save it with a browser feature, for example by adding it to "My Favorites", so that the next time the browser is opened the saved page can be reached quickly from "My Favorites".
However, when saving a web page, users often spend a lot of time thinking of a category name for it. If the category name is inaccurate, the user will have difficulty quickly finding the previously saved page the next time the browser is opened, which is inconvenient.
Therefore, how to provide accurate category names for web pages has become a goal the industry is working toward.
The present invention relates to an automatic classification method and system for web pages.
According to an embodiment of the present invention, an automatic classification method for web pages is provided. The method includes the following steps. An application programming interface (API) is used to retrieve a plurality of keywords contained in a web page of a website, and each keyword contained in the web page is assigned an identifier (ID). With all web pages in the website taken as the population, a TF-IDF value is calculated for each keyword contained in the web page. A matrix is generated from the identifier and the TF-IDF value of each keyword contained in the web page. The matrix is input to a web page classification model to generate a predicted category name. The web page is stored under the predicted category name.
According to another embodiment of the present invention, an automatic classification system for web pages is provided. The system includes a processor and a web page classification model. The processor uses an application programming interface (API) to retrieve a plurality of keywords contained in a web page of a website, and assigns an identifier (ID) to each keyword contained in the web page. The processor calculates a TF-IDF value for each keyword contained in the web page, taking all web pages in the website as the population. The processor generates a matrix from the identifier and the TF-IDF value of each keyword contained in the web page. The processor inputs the matrix to the web page classification model to generate a predicted category name. The processor stores the web page under the predicted category name.
To provide a better understanding of the above and other aspects of the present invention, embodiments are described in detail below with reference to the accompanying drawings:
100: automatic classification system
110: processor
120-1, 120-2, 120-10: web pages
120: website
130: web page classification model
140, 160: websites
140-1, 140-2, 140-8, 160-1, 160-2, 160-3: web pages
180: website
180-1, 180-2: web pages
180-11, 180-21, 180-22, 180-23: articles
API: application programming interface
CN120-1, CN140-1, CN140-8, CN160-1, CN160-3, CN180-11: category names
KW1201, KW1202, KW1205, KW1401, KW1402, KW1406, KW1801, KW1802, KW1806: keywords
PCN, PCN180-11: predicted category names
MX, MX140-1, MX140-8, MX160-1, MX160-3, MX180-11: matrices
S110, S120, S130, S140, S150, S210, S220, S230, S240, S310, S320, S330, S340, S350, S360, S370, S410, S420, S430, S440, S450, S460, S510, S520, S530, S540, S550, S560, S570: steps
FIG. 1 shows a block diagram of an automatic web page classification system and a website according to an embodiment of the present invention.
FIG. 2 shows a flowchart of an automatic web page classification method according to an embodiment of the present invention.
FIG. 3 shows a schematic diagram of a web page according to an embodiment of the present invention.
FIG. 4 shows a schematic diagram of a matrix according to an embodiment of the present invention.
FIG. 5 shows a block diagram of an automatic web page classification system and websites according to another embodiment of the present invention.
FIG. 6 shows a flowchart of a method for training the web page classification model 130 in an automatic web page classification method according to another embodiment of the present invention.
FIG. 7 shows a schematic diagram of a web page according to another embodiment of the present invention.
FIG. 8 shows a schematic diagram of a matrix according to another embodiment of the present invention.
FIG. 9 shows a flowchart of an automatic web page classification method according to another embodiment of the present invention.
FIG. 10 shows a block diagram of an automatic web page classification system and a website according to another embodiment of the present invention.
FIG. 11 shows a flowchart of an automatic web page classification method according to another embodiment of the present invention.
FIG. 12 shows a schematic diagram of an article according to an embodiment of the present invention.
FIG. 13 shows a flowchart of an automatic web page classification method according to another embodiment of the present invention.
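Please refer to FIG. 1, which shows a block diagram of an automatic web page classification system 100 and a website 120 according to an embodiment of the present invention. The automatic web page classification system 100 includes a processor 110 and a web page classification model 130. The system 100 is, for example, a smartphone, a tablet computer, a notebook computer, or a desktop computer. The website 120 includes a plurality of web pages, for example web pages 120-1, 120-2, ..., 120-10. The system 100 can browse the web pages 120-1, 120-2, ..., 120-10 in the website 120, and can also retrieve data from the web pages 120-1, 120-2, ..., 120-10 through the processor 110 using an application programming interface API.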
The operation of the above components is described in detail below with reference to flowcharts. Please refer to FIG. 2, which shows a flowchart of an automatic web page classification method according to an embodiment of the present invention.
Step S110: an application programming interface is used to retrieve a plurality of keywords contained in a web page of a website, and each keyword contained in the web page is assigned an identifier (ID). Please refer to FIG. 3, which shows a schematic diagram of web page 120-1 according to an embodiment of the present invention. Web page 120-1 contains a category name CN120-1 and keywords KW1201, KW1202, ..., KW1205. A category name is, for example, "sports news" or "political news". A keyword is, for example, "Chinese Taipei team", "kickoff", "home run", "president", or "mayor". The processor 110 uses the application programming interface to retrieve the keywords KW1201, KW1202, ..., KW1205 contained in web page 120-1 of website 120, and assigns an identifier to each of them. Each keyword KW1201, KW1202, ..., KW1205 is given a different identifier. In one embodiment, the application programming interface has a dictionary, and it assigns a different identifier to each keyword KW1201, KW1202, ..., KW1205 according to the dictionary.
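The dictionary-based identifier assignment of step S110 can be sketched as follows. This is a minimal illustration only: the function name `assign_ids`, the use of consecutive integers as identifiers, and the sample keywords are assumptions, since the patent does not specify how the dictionary is implemented.

```python
from typing import Dict, List


def assign_ids(keywords: List[str], dictionary: Dict[str, int]) -> Dict[str, int]:
    """Return {keyword: id} for one page, adding unseen keywords to the dictionary.

    Hypothetical helper: the dictionary maps each known keyword to a unique
    integer, so two different keywords never share an identifier.
    """
    ids: Dict[str, int] = {}
    for kw in keywords:
        if kw not in dictionary:
            # Assumed scheme: next integer above the largest ID issued so far.
            dictionary[kw] = max(dictionary.values(), default=0) + 1
        ids[kw] = dictionary[kw]
    return ids


dictionary: Dict[str, int] = {}
page_ids = assign_ids(["home run", "kickoff", "home run", "mayor"], dictionary)
print(page_ids)  # {'home run': 1, 'kickoff': 2, 'mayor': 3}
```

Keeping the dictionary outside the function means the same keyword receives the same identifier on every page, which the later matrix and model steps would presumably rely on.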
Step S120: based on the number of all web pages in the website, the TF-IDF value of each keyword contained in the web page is calculated. Calculating a TF-IDF value requires defining a population. In this embodiment, the population is all web pages 120-1, 120-2, ..., 120-10 in website 120. Based on the number (10) of all web pages 120-1, 120-2, ..., 120-10 in website 120, the processor 110 calculates the TF-IDF value of each keyword KW1201, KW1202, ..., KW1205 contained in web page 120-1.
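A small sketch of the TF-IDF computation in step S120 is given below. The patent does not state which TF-IDF variant it uses, so the common textbook form is assumed here: term frequency is the keyword's share of the page's keywords, and inverse document frequency is the natural logarithm of the population size divided by the number of pages containing the keyword. The function name `tf_idf` and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(page: List[str], population: List[List[str]]) -> Dict[str, float]:
    """TF-IDF of each keyword in `page`, with `population` as the corpus.

    `population` is the list of keyword lists of all pages in the website
    and is expected to include `page` itself.
    """
    n_pages = len(population)
    counts = Counter(page)
    total = sum(counts.values())
    scores: Dict[str, float] = {}
    for kw, count in counts.items():
        # Document frequency: number of pages in the population containing kw.
        df = sum(1 for other in population if kw in other)
        tf = count / total
        idf = math.log(n_pages / df) if df else 0.0
        scores[kw] = tf * idf
    return scores


site = [["home run", "kickoff"], ["home run"], ["mayor"], ["home run", "mayor"]]
print(tf_idf(site[0], site))
```

With this definition a keyword that appears on every page of the site scores zero, which is why the choice of population matters.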
Step S130: a matrix is generated from the identifier and the TF-IDF value of each keyword contained in the web page. Please refer to FIG. 4, which shows a schematic diagram of a matrix MX according to an embodiment of the present invention. The processor 110 generates the matrix MX from the identifiers of the keywords KW1201, KW1202, ..., KW1205 contained in web page 120-1 and the TF-IDF values of those keywords. In other words, one web page 120-1 corresponds to one matrix MX.
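One possible layout for the matrix of step S130 is sketched below. The layout of FIG. 4 is not reproduced in the text, so the two-column form used here (one row per keyword, holding the identifier and the TF-IDF value) is an assumption, as is the function name `build_matrix`.

```python
from typing import Dict, List


def build_matrix(ids: Dict[str, int], scores: Dict[str, float]) -> List[List[float]]:
    """Build one matrix per page: each row is [keyword_id, tf_idf].

    Rows are sorted by identifier so that the same page always yields the
    same matrix regardless of keyword order (an assumed convention).
    """
    rows = [[float(ids[kw]), scores[kw]] for kw in ids]
    rows.sort(key=lambda row: row[0])
    return rows


matrix = build_matrix({"kickoff": 2, "home run": 1}, {"home run": 0.5, "kickoff": 0.25})
print(matrix)  # [[1.0, 0.5], [2.0, 0.25]]
```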
Step S140: the matrix is input to the web page classification model to generate a predicted category name. The processor 110 inputs the matrix MX to the web page classification model 130 to generate a predicted category name PCN.
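Step S150: the web page is stored under the predicted category name. The processor 110 stores web page 120-1 under the predicted category name PCN. In one embodiment, before step S110 is performed, the processor 110 determines whether the web page has been stored previously; if it has not, steps S110 to S150 are performed. For example, the processor 110 creates a custom field in the browser's cookie to record whether web page 120-1 has been stored previously.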
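The "store only if not stored before" check around step S150 might look like the sketch below. A plain dictionary stands in for the custom cookie field mentioned above, and `save_page` and `predict` are hypothetical names; `predict` represents steps S110 to S140 taken together.

```python
from typing import Callable, Dict, Optional


def save_page(url: str,
              predict: Callable[[str], str],
              store: Dict[str, str]) -> Optional[str]:
    """Store `url` under its predicted category unless it was stored before.

    Returns the category used, or None when the page was already stored and
    the classification steps were skipped.
    """
    if url in store:
        return None  # previously stored: do not classify again
    category = predict(url)
    store[url] = category
    return category


favorites: Dict[str, str] = {}
print(save_page("https://example.com/game", lambda url: "sports news", favorites))
print(save_page("https://example.com/game", lambda url: "sports news", favorites))
```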
In this way, the proposed automatic web page classification method generates a matrix from the identifier and TF-IDF value of each keyword contained in a web page, and inputs the matrix to the trained web page classification model to accurately generate the category name of the web page.
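Please refer to FIGS. 5 and 6. FIG. 5 shows a block diagram of the automatic web page classification system 100 and websites 140 and 160 according to another embodiment of the present invention. FIG. 6 shows a flowchart of a method for training the web page classification model 130 in an automatic web page classification method according to another embodiment of the present invention. Website 140 includes web pages 140-1, 140-2, ..., 140-8. Website 160 includes web pages 160-1, 160-2, and 160-3. For ease of explanation, the following example uses the two websites 140 and 160 as training data for the web page classification model 130.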
Step S210: the application programming interface is used to retrieve a plurality of keywords and a category name contained in a web page of a website, and each keyword contained in the web page is assigned an identifier. Please refer to FIG. 7, which shows a schematic diagram of web page 140-1 according to another embodiment of the present invention. Web page 140-1 contains a category name CN140-1 and keywords KW1401, KW1402, ..., KW1406. The processor 110 uses the application programming interface to retrieve the keywords KW1401, KW1402, ..., KW1406 and the category name CN140-1 contained in web page 140-1 of website 140, and assigns an identifier to each keyword KW1401, KW1402, ..., KW1406 contained in web page 140-1.
Step S220: based on the number of all web pages in the plurality of websites, the TF-IDF value of each keyword contained in the web page is calculated. Calculating a TF-IDF value requires defining a population. In this embodiment, the population is all web pages 140-1, 140-2, ..., 140-8 in website 140 together with all web pages 160-1, 160-2, 160-3 in website 160. Based on the number (11) of these web pages, the processor 110 calculates the TF-IDF value of each keyword KW1401, KW1402, ..., KW1406 contained in web page 140-1.
Step S230: a matrix is generated from the identifier and the TF-IDF value of each keyword contained in the web page. Please refer to FIG. 8, which shows a schematic diagram of a matrix MX140-1 according to another embodiment of the present invention. The processor 110 generates the matrix MX140-1 from the identifiers of the keywords KW1401, KW1402, ..., KW1406 contained in web page 140-1 and the TF-IDF values of those keywords.
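Step S240: the web page classification model is trained according to the matrix and the category name. The processor 110 trains the web page classification model 130 according to the matrix MX140-1 and the category name CN140-1. Steps S210 to S240 are repeated in the same way until the matrices MX140-1, ..., MX140-8, MX160-1, ..., MX160-3 and the category names CN140-1, ..., CN140-8, CN160-1, ..., CN160-3 corresponding to every web page 140-1, ..., 140-8, 160-1, ..., 160-3 in websites 140 and 160 have been obtained and used to train the web page classification model 130.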
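The training loop of steps S210 to S240 can be illustrated with a toy model. The patent does not say what kind of model the web page classification model 130 is, so the keyword-profile classifier below is purely a stand-in chosen for brevity: it accumulates TF-IDF weight per keyword identifier for each category and predicts the category whose normalized profile best matches the input matrix. The class name `KeywordProfileModel` and the `[keyword_id, tf_idf]` row layout are assumptions.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # assumed layout: rows of [keyword_id, tf_idf]


class KeywordProfileModel:
    """Toy stand-in for model 130; not the model described in the patent."""

    def __init__(self) -> None:
        # category name -> {keyword_id: accumulated tf-idf weight}
        self.profiles: Dict[str, Dict[int, float]] = {}

    def train(self, matrix: Matrix, category: str) -> None:
        """Add one labeled page (matrix plus category name) to the model."""
        profile = self.profiles.setdefault(category, {})
        for keyword_id, weight in matrix:
            key = int(keyword_id)
            profile[key] = profile.get(key, 0.0) + weight

    def predict(self, matrix: Matrix) -> str:
        """Return the category whose profile is most similar to the matrix."""
        if not self.profiles:
            raise ValueError("model has not been trained")

        def score(category: str) -> float:
            profile = self.profiles[category]
            norm = math.sqrt(sum(w * w for w in profile.values())) or 1.0
            dot = sum(w * profile.get(int(kid), 0.0) for kid, w in matrix)
            return dot / norm

        return max(self.profiles, key=score)


model = KeywordProfileModel()
model.train([[1, 0.5], [2, 0.3]], "sports news")
model.train([[3, 0.6], [4, 0.2]], "political news")
print(model.predict([[1, 0.4], [5, 0.1]]))  # sports news
```

Any supervised classifier that accepts per-page features and a label could fill the same role; the point of the sketch is only the pairing of each matrix with its page's category name.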
In this way, the proposed automatic web page classification method trains a web page classification model so that it can accurately generate category names for web pages.
Please refer to FIGS. 1, 3, 4, and 9. FIG. 9 shows a flowchart of an automatic web page classification method according to another embodiment of the present invention. In the following example, web page 120-1 of website 120 is a web page that has been browsed but not yet stored.
Step S310: it is determined whether a browsed web page has already been stored. If so, the process ends; if not, step S320 is performed. The processor 110 determines that web page 120-1 has been browsed and has not been stored, and then performs step S320.
Step S320: when the browsed web page has not been stored, the application programming interface is used to retrieve a plurality of keywords contained in the browsed web page, and each keyword of the browsed web page is assigned an identifier. The processor 110 uses the application programming interface to retrieve the keywords KW1201, KW1202, ..., KW1205 contained in the browsed web page 120-1, and assigns an identifier to each of them.
Step S330: based on the number of all web pages in the website to which the browsed web page belongs, the TF-IDF value of each keyword of the browsed web page is calculated. Calculating a TF-IDF value requires defining a population. In this embodiment, the population is all web pages 120-1, 120-2, ..., 120-10 in website 120, to which the browsed web page 120-1 belongs. Based on the number (10) of all web pages 120-1, 120-2, ..., 120-10 in website 120, the processor 110 calculates the TF-IDF value of each keyword KW1201, KW1202, ..., KW1205 contained in the browsed web page 120-1.
Step S340: a matrix is generated from the identifier and the TF-IDF value of each keyword of the browsed web page. The processor 110 generates the matrix MX from the identifiers of the keywords KW1201, KW1202, ..., KW1205 contained in the browsed web page 120-1 and the TF-IDF values of those keywords.
Step S350: the matrix is input to the web page classification model to generate a predicted category name. The processor 110 inputs the matrix MX to the web page classification model 130 to generate a predicted category name PCN.
Step S360: the browsed web page is stored in a database under the predicted category name. The processor 110 stores the browsed web page 120-1 in a database (not shown) under the predicted category name PCN. The database holds the stored web pages and their category names.
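Step S370: preference information is identified according to the number of web pages under each category name in the database, and advertisements related to the preference information are recommended. The processor 110 selects the category name with the largest number of web pages as the preference information and recommends advertisements related to it. For example, if the category name "sports news" has the most web pages in the database, "sports news" is taken as the preference information, and advertisements related to "sports news" are recommended (for example, news about the opening game of the Chinese Professional Baseball League). In one embodiment, the database can separate the stored web pages and their category names by user.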
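Selecting the preference information in step S370 amounts to finding the most frequent category among a user's stored pages. A brief sketch follows, assuming the database can be viewed as a mapping from page URL to category name; `preferred_category` is a hypothetical name, and ties are resolved in favor of the category stored first.

```python
from collections import Counter
from typing import Dict, Optional


def preferred_category(stored_pages: Dict[str, str]) -> Optional[str]:
    """Return the category name with the most stored pages, or None if empty."""
    counts = Counter(stored_pages.values())
    if not counts:
        return None
    # most_common(1) yields [(category, count)] for the largest count.
    return counts.most_common(1)[0][0]


database = {
    "https://example.com/game-1": "sports news",
    "https://example.com/election": "political news",
    "https://example.com/game-2": "sports news",
}
print(preferred_category(database))  # sports news
```

To separate preferences by user, one such mapping could be kept per user and the function applied to each.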
In this way, the proposed automatic web page classification method can identify different preference information for different users.
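Please refer to FIGS. 10, 11, and 12. FIG. 10 shows a block diagram of the automatic web page classification system 100 and a website 180 according to another embodiment of the present invention. FIG. 11 shows a flowchart of an automatic web page classification method according to another embodiment of the present invention. FIG. 12 shows a schematic diagram of an article 180-11 according to an embodiment of the present invention. In this embodiment, the system 100 can determine whether an article having an article category name has been published on web pages 180-1 and 180-2 of website 180. In the following example, an article 180-11 having an article category name CN180-11 is published on web page 180-1 of website 180. Web page 180-2 contains a plurality of articles 180-21, 180-22, and 180-23.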
Step S410: it is determined whether an article having an article category name has been published. If so, step S420 is performed; if not, the process ends. The processor 110 determines that an article 180-11 having an article category name CN180-11 has been published, and then performs step S420.
Step S420: when an article having an article category name is published, the application programming interface is used to retrieve a plurality of keywords contained in the article, and each keyword contained in the article is assigned an identifier. When the article 180-11 having the article category name CN180-11 is published, the processor 110 uses the application programming interface to retrieve the keywords KW1801, KW1802, ..., KW1806 contained in article 180-11, and assigns an identifier to each of them.
Step S430: based on the number of all articles in the website to which the article belongs, the TF-IDF value of each keyword contained in the article is calculated. Calculating a TF-IDF value requires defining a population. In this embodiment, the population is all articles 180-11, 180-21, 180-22, 180-23 in website 180. Based on the number (4) of all articles 180-11, 180-21, 180-22, 180-23 in website 180, the processor 110 calculates the TF-IDF value of each keyword KW1801, KW1802, ..., KW1806 contained in article 180-11.
Step S440: a matrix is generated from the identifier and the TF-IDF value of each keyword contained in the article. The processor 110 generates the matrix MX180-11 from the identifiers of the keywords KW1801, KW1802, ..., KW1806 contained in article 180-11 and the TF-IDF values of those keywords.
Step S450: the matrix is input to the web page classification model to generate a predicted category name. The processor 110 inputs the matrix MX180-11 to the web page classification model 130 to generate a predicted category name PCN180-11.
Step S460: when the article category name differs from the predicted category name, the article is published under the predicted category name. The processor 110 determines whether the article category name CN180-11 and the predicted category name PCN180-11 are the same; when they differ, article 180-11 is published under the predicted category name PCN180-11.
In this way, the proposed automatic web page classification method generates a matrix from the identifier and TF-IDF value of each keyword contained in a published article, and inputs the matrix to the trained web page classification model to accurately generate the category name of the published article.
Please refer to FIGS. 1 and 13. FIG. 13 shows a flowchart of an automatic web page classification method according to another embodiment of the present invention. Steps S510 to S550 are similar to steps S110 to S150 of FIG. 2, respectively, and are not described again here. After the processor 110 stores web page 120-1 under the predicted category name PCN, step S560 is performed.
Step S560: it is determined whether the predicted category name of the stored web page has been changed. If so, step S570 is performed; if not, the process ends. When the processor 110 determines that the predicted category name PCN of the stored web page 120-1 has been changed, step S570 is performed.
Step S570: when the predicted category name of the stored web page has been changed, the web page classification model is trained according to the matrix and the changed category name. A change to the predicted category name PCN of the stored web page 120-1 indicates that the user is not satisfied with the category name predicted by the web page classification model 130, so the processor 110 trains the web page classification model 130 according to the matrix MX and the changed category name.
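The feedback loop of steps S560 and S570 can be sketched as a small hook that retrains only when the user has renamed the category. The names `on_category_check` and `train` are hypothetical; `train` stands for whatever training routine the web page classification model exposes, which the patent does not detail.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # assumed layout: rows of [keyword_id, tf_idf]


def on_category_check(matrix: Matrix,
                      predicted: str,
                      current: str,
                      train: Callable[[Matrix, str], None]) -> bool:
    """Retrain with the user's category if it differs from the prediction.

    Returns True when a training call was made, False when the predicted
    category name was left unchanged.
    """
    if current == predicted:
        return False  # S560: name not changed, end of process
    train(matrix, current)  # S570: learn from the corrected label
    return True


corrections: List[Tuple[Matrix, str]] = []
changed = on_category_check(
    [[1.0, 0.5]], "political news", "sports news",
    lambda m, name: corrections.append((m, name)),
)
print(changed, corrections)
```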
In this way, the proposed automatic web page classification method can determine whether the predicted category name has been changed, and use that to optimize the web page classification model.
In summary, although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Those of ordinary skill in the art to which the present invention pertains may make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be as defined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109138812A TWI757957B (en) | 2020-11-06 | 2020-11-06 | Automatic classification method and system of webpages |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI757957B true TWI757957B (en) | 2022-03-11 |
TW202219794A TW202219794A (en) | 2022-05-16 |
Family
ID=81710610
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI757957B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169020A (en) * | 2017-04-07 | 2017-09-15 | 南京邮电大学 | A kind of orientation web retrieval method based on keyword |
CN110516074A (en) * | 2019-10-23 | 2019-11-29 | 中国人民解放军国防科技大学 | Website theme classification method and device based on deep learning |
- 2020-11-06: TW application TW109138812A filed; patent TWI757957B active