TWI757957B - Automatic classification method and system of webpages - Google Patents

Publication number
TWI757957B
Authority
TW
Taiwan
Prior art keywords
webpage
keywords
article
matrix
identifier
Prior art date
Application number
TW109138812A
Other languages
Chinese (zh)
Other versions
TW202219794A (en)
Inventor
陳冠儒
陳良其
Original Assignee
宏碁股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 宏碁股份有限公司
Priority to TW109138812A
Application granted
Publication of TWI757957B
Publication of TW202219794A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An automatic classification method and system for webpages are provided. The automatic classification method includes the following steps. A plurality of keywords contained in a webpage of a website are extracted using an application programming interface (API), and an identifier (ID) is given to each keyword contained in the webpage. A TF-IDF value of each keyword contained in the webpage is calculated based on the number of all webpages in the website. A matrix is generated according to the identifier and the TF-IDF value of each keyword contained in the webpage. The matrix is input into a webpage classification model to generate a predicted category name. The webpage is saved with the predicted category name.

Description

Automatic classification method and system for webpages

The present invention relates to an automatic classification method and system, and in particular to an automatic classification method and system for webpages.

The Internet has become an indispensable part of daily life. People frequently browse webpages on their computers, and when they come across a favorite or important webpage they can save it with a browser feature, for example by adding it to "My Favorites", so that the next time the browser is opened the saved webpage can be reached quickly from "My Favorites".

However, when saving a webpage, users often spend a lot of time thinking of a category name for it. If the category name is inaccurate, the user will have difficulty quickly finding the previously saved webpage the next time the browser is opened, which is inconvenient.

Therefore, how to provide accurate category names for webpages has become a direction of effort in the industry.

The present invention relates to an automatic classification method and system for webpages.

According to an embodiment of the present invention, an automatic classification method for webpages is provided. The method includes the following steps. An application programming interface (API) is used to extract a plurality of keywords contained in a webpage of a website, and an identifier (ID) is given to each keyword contained in the webpage. Taking all webpages in the website as the population, a TF-IDF value of each keyword contained in the webpage is calculated. A matrix is generated according to the identifier and the TF-IDF value of each keyword contained in the webpage. The matrix is input into a webpage classification model to generate a predicted category name. The webpage is saved with the predicted category name.

According to another embodiment of the present invention, an automatic classification system for webpages is provided. The system includes a processor and a webpage classification model. The processor uses an application programming interface (API) to extract a plurality of keywords contained in a webpage of a website, and gives an identifier (ID) to each keyword contained in the webpage. The processor calculates a TF-IDF value of each keyword contained in the webpage, taking all webpages in the website as the population. The processor generates a matrix according to the identifier and the TF-IDF value of each keyword contained in the webpage. The processor inputs the matrix into the webpage classification model to generate a predicted category name. The processor saves the webpage with the predicted category name.

For a better understanding of the above and other aspects of the present invention, embodiments are described in detail below with reference to the accompanying drawings:

100: Automatic classification system

110: Processor

120-1, 120-2, 120-10: Webpages

120: Website

130: Webpage classification model

140, 160: Websites

140-1, 140-2, 140-8, 160-1, 160-2, 160-3: Webpages

180: Website

180-1, 180-2: Webpages

180-11, 180-21, 180-22, 180-23: Articles

API: Application programming interface

CN120-1, CN140-1, CN140-8, CN160-1, CN160-3, CN180-11: Category names

KW1201, KW1202, KW1205, KW1401, KW1402, KW1406, KW1801, KW1802, KW1806: Keywords

PCN, PCN180-11: Predicted category names

MX, MX140-1, MX140-8, MX160-1, MX160-3, MX180-11: Matrices

S110, S120, S130, S140, S150, S210, S220, S230, S240, S310, S320, S330, S340, S350, S360, S370, S410, S420, S430, S440, S450, S460, S510, S520, S530, S540, S550, S560, S570: Steps

FIG. 1 is a block diagram of an automatic classification system for webpages and a website according to an embodiment of the present invention.

FIG. 2 is a flowchart of an automatic classification method for webpages according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of a webpage according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of a matrix according to an embodiment of the present invention.

FIG. 5 is a block diagram of an automatic classification system for webpages and websites according to another embodiment of the present invention.

FIG. 6 is a flowchart of a training method of the webpage classification model 130 in an automatic classification method for webpages according to another embodiment of the present invention.

FIG. 7 is a schematic diagram of a webpage according to another embodiment of the present invention.

FIG. 8 is a schematic diagram of a matrix according to another embodiment of the present invention.

FIG. 9 is a flowchart of an automatic classification method for webpages according to another embodiment of the present invention.

FIG. 10 is a block diagram of an automatic classification system for webpages and a website according to another embodiment of the present invention.

FIG. 11 is a flowchart of an automatic classification method for webpages according to another embodiment of the present invention.

FIG. 12 is a schematic diagram of an article according to an embodiment of the present invention.

FIG. 13 is a flowchart of an automatic classification method for webpages according to another embodiment of the present invention.

Please refer to FIG. 1, which is a block diagram of an automatic classification system 100 for webpages and a website 120 according to an embodiment of the present invention. The automatic classification system 100 includes a processor 110 and a webpage classification model 130. The automatic classification system 100 is, for example, a smartphone, a tablet computer, a notebook computer, or a desktop computer. The website 120 includes a plurality of webpages, such as webpages 120-1, 120-2, ..., 120-10. The automatic classification system 100 can browse the webpages 120-1, 120-2, ..., 120-10 in the website 120, and the processor 110 can also use an application programming interface (API) to extract data from the webpages 120-1, 120-2, ..., 120-10.

The operation of the above components is described in detail below with reference to a flowchart. Please refer to FIG. 2, which is a flowchart of an automatic classification method for webpages according to an embodiment of the present invention.

Step S110: an application programming interface is used to extract a plurality of keywords contained in a webpage of a website, and an identifier (ID) is given to each keyword contained in the webpage. Please refer to FIG. 3, which is a schematic diagram of the webpage 120-1 according to an embodiment of the present invention. The webpage 120-1 contains a category name CN120-1 and keywords KW1201, KW1202, ..., KW1205. The category name is, for example, "sports news" or "political news". The keywords are, for example, "Chinese Team", "first pitch", "home run", "President", or "Mayor". The processor 110 uses the application programming interface to extract the keywords KW1201, KW1202, ..., KW1205 contained in the webpage 120-1 of the website 120, and gives an identifier to each of them. Each keyword KW1201, KW1202, ..., KW1205 is given a different identifier. In one embodiment, the application programming interface has a dictionary, and the application programming interface gives each keyword KW1201, KW1202, ..., KW1205 a different identifier according to the dictionary.
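The dictionary-based ID assignment of step S110 can be sketched as follows. This is a minimal illustration, not the API's actual behavior: the function name `assign_keyword_ids`, the use of consecutive integers as identifiers, and the in-memory dictionary are all assumptions, since the text only says that each keyword receives a different identifier according to a dictionary.

```python
from typing import Dict, List


def assign_keyword_ids(keywords: List[str], dictionary: Dict[str, int]) -> Dict[str, int]:
    """Map each keyword to an integer ID, adding unseen keywords to the dictionary."""
    ids: Dict[str, int] = {}
    for keyword in keywords:
        if keyword not in dictionary:
            # Hypothetical policy: the next unused integer becomes the new ID.
            dictionary[keyword] = len(dictionary)
        ids[keyword] = dictionary[keyword]
    return ids


# Hypothetical shared dictionary; the real API's dictionary is not described.
dictionary: Dict[str, int] = {}
keyword_ids = assign_keyword_ids(["Chinese Team", "Home Run", "Mayor"], dictionary)
print(keyword_ids)  # {'Chinese Team': 0, 'Home Run': 1, 'Mayor': 2}
```

Because the dictionary is shared across calls, a keyword that appears on several webpages keeps the same identifier everywhere, which is what lets the downstream model compare pages.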

Step S120: the TF-IDF value of each keyword contained in the webpage is calculated based on the number of all webpages in the website. Calculating a TF-IDF value requires a population (that is, a corpus) to be defined. In this embodiment, the population is all webpages 120-1, 120-2, ..., 120-10 in the website 120. The processor 110 calculates the TF-IDF value of each keyword KW1201, KW1202, ..., KW1205 contained in the webpage 120-1 based on the number (10) of all webpages 120-1, 120-2, ..., 120-10 in the website 120.
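The text does not state which TF-IDF formula is used. The sketch below assumes the common definition (term frequency within the page multiplied by the natural logarithm of the number of pages divided by the number of pages containing the keyword); the function name `tf_idf` and the sample pages are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(population: List[List[str]], index: int) -> Dict[str, float]:
    """TF-IDF of each keyword on population[index], using all pages as the population."""
    page = population[index]
    n_pages = len(population)
    scores: Dict[str, float] = {}
    for keyword, count in Counter(page).items():
        tf = count / len(page)
        # Number of pages containing the keyword; at least 1 because the page itself counts.
        df = sum(1 for other in population if keyword in other)
        scores[keyword] = tf * math.log(n_pages / df)
    return scores


# Hypothetical 10-page website: page 0 plus nine pages that only mention "Mayor".
pages = [["Home Run", "Home Run", "Chinese Team", "Mayor"]] + [["Mayor"] for _ in range(9)]
scores = tf_idf(pages, 0)
```

Under these assumptions "Home Run" receives the largest weight because it is frequent on the page and rare in the population, while "Mayor" appears on every page and therefore receives a weight of zero.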

Step S130: a matrix is generated according to the identifier and the TF-IDF value of each keyword contained in the webpage. Please refer to FIG. 4, which is a schematic diagram of a matrix MX according to an embodiment of the present invention. The processor 110 generates the matrix MX according to the identifier and the TF-IDF value of each keyword KW1201, KW1202, ..., KW1205 contained in the webpage 120-1. In other words, one webpage 120-1 corresponds to one matrix MX.
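FIG. 4 is not reproduced in this text, so the exact layout of the matrix is unknown. One plausible layout, assumed here purely for illustration, is one row per keyword holding the pair (identifier, TF-IDF value); the function name `build_matrix` and the sample values are hypothetical.

```python
from typing import Dict, List


def build_matrix(keyword_ids: Dict[str, int], tfidf: Dict[str, float]) -> List[List[float]]:
    """Return one [ID, TF-IDF] row per keyword, ordered by ID."""
    ordered = sorted(keyword_ids, key=lambda keyword: keyword_ids[keyword])
    return [[float(keyword_ids[keyword]), tfidf[keyword]] for keyword in ordered]


# Hypothetical identifiers and TF-IDF values for one webpage.
keyword_ids = {"Home Run": 1, "Chinese Team": 0, "Mayor": 2}
tfidf = {"Chinese Team": 0.58, "Home Run": 1.15, "Mayor": 0.0}
matrix = build_matrix(keyword_ids, tfidf)
print(matrix)  # [[0.0, 0.58], [1.0, 1.15], [2.0, 0.0]]
```

Sorting by identifier makes the matrix independent of the order in which the keywords were extracted from the page.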

Step S140: the matrix is input into the webpage classification model to generate a predicted category name. The processor 110 inputs the matrix MX into the webpage classification model 130 to generate a predicted category name PCN.

Step S150: the webpage is saved with the predicted category name. The processor 110 saves the webpage 120-1 with the predicted category name PCN. In one embodiment, before step S110 is executed, the processor 110 determines whether the webpage has been previously saved, and steps S110 to S150 are executed when the webpage has not been previously saved. For example, the processor 110 creates a custom field in the browser's cookie to record whether the webpage 120-1 has been previously saved.
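The "classify only if not previously saved" check can be sketched as below. The text records the saved state in a custom cookie field; here a plain dictionary keyed by URL stands in for that record, and `save_if_new`, the `classify` callback, and the example URL are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional


def save_if_new(url: str, saved: Dict[str, str], classify: Callable[[str], str]) -> Optional[str]:
    """Classify and save a page only if it has not been saved before."""
    if url in saved:
        return None  # already saved: skip steps S110 to S150
    category = classify(url)  # stands in for steps S110 to S140
    saved[url] = category     # step S150
    return category


saved: Dict[str, str] = {}
first = save_if_new("https://example.com/news/1", saved, lambda url: "sports news")
second = save_if_new("https://example.com/news/1", saved, lambda url: "political news")
```

The second call returns `None` and leaves the stored category untouched, so a page that was already saved is never reclassified.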

In this way, the automatic classification method for webpages proposed herein generates a matrix from the identifier and the TF-IDF value of each keyword contained in a webpage, and inputs the matrix into the trained webpage classification model to accurately generate a category name for the webpage.

Please refer to FIGS. 5 and 6. FIG. 5 is a block diagram of the automatic classification system 100 for webpages and websites 140, 160 according to another embodiment of the present invention. FIG. 6 is a flowchart of a training method of the webpage classification model 130 in an automatic classification method for webpages according to another embodiment of the present invention. The website 140 includes webpages 140-1, 140-2, ..., 140-8. The website 160 includes webpages 160-1, 160-2, 160-3. For ease of description, the following example trains the webpage classification model 130 with the two websites 140, 160 as training data.

Step S210: the application programming interface is used to extract a plurality of keywords and a category name contained in a webpage of a website, and an identifier is given to each keyword contained in the webpage. Please refer to FIG. 7, which is a schematic diagram of the webpage 140-1 according to another embodiment of the present invention. The webpage 140-1 contains a category name CN140-1 and keywords KW1401, KW1402, ..., KW1406. The processor 110 uses the application programming interface to extract the keywords KW1401, KW1402, ..., KW1406 and the category name CN140-1 contained in the webpage 140-1 of the website 140, and gives an identifier to each keyword KW1401, KW1402, ..., KW1406 contained in the webpage 140-1.

Step S220: the TF-IDF value of each keyword contained in the webpage is calculated based on the number of all webpages in the plurality of websites. Calculating a TF-IDF value requires a population (that is, a corpus) to be defined. In this embodiment, the population is all webpages 140-1, 140-2, ..., 140-8 in the website 140 together with all webpages 160-1, 160-2, 160-3 in the website 160. The processor 110 calculates the TF-IDF value of each keyword KW1401, KW1402, ..., KW1406 contained in the webpage 140-1 based on the number (11) of all webpages 140-1, 140-2, ..., 140-8 in the website 140 and all webpages 160-1, 160-2, 160-3 in the website 160.
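The text does not spell out the TF-IDF formula. Assuming the common definition with a natural logarithm, the value for a keyword k on a webpage d in this training embodiment would be computed with a population of N = 8 + 3 = 11 webpages:

```latex
\mathrm{tfidf}(k, d)
  = \frac{n_{k,d}}{\sum_{j} n_{j,d}}
    \cdot \ln\frac{N}{\lvert \{\, d' : k \in d' \,\} \rvert},
\qquad N = 8 + 3 = 11
```

Here n_{k,d} is the number of occurrences of keyword k on webpage d, the denominator of the first factor is the total number of keyword occurrences on d, and the denominator inside the logarithm is the number of webpages in the population that contain k. These symbols are introduced for illustration and do not appear in the original text. Under this assumption, a keyword found on only one of the 11 webpages has an inverse document frequency of ln 11 ≈ 2.40, while a keyword found on all 11 has an inverse document frequency of 0.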

Step S230: a matrix is generated according to the identifier and the TF-IDF value of each keyword contained in the webpage. Please refer to FIG. 8, which is a schematic diagram of a matrix MX140-1 according to another embodiment of the present invention. The processor 110 generates the matrix MX140-1 according to the identifier and the TF-IDF value of each keyword KW1401, KW1402, ..., KW1406 contained in the webpage 140-1.

Step S240: the webpage classification model is trained according to the matrix and the category name. The processor 110 trains the webpage classification model 130 according to the matrix MX140-1 and the category name CN140-1. Steps S210 to S240 are repeated in the same manner until the matrices MX140-1, ..., MX140-8, MX160-1, ..., MX160-3 and the category names CN140-1, ..., CN140-8, CN160-1, ..., CN160-3 corresponding to each of the webpages 140-1, ..., 140-8, 160-1, ..., 160-3 in the websites 140 and 160 have been obtained and used to train the webpage classification model 130.
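The text does not say what kind of model the webpage classification model 130 is. Purely as an illustration of the train-then-predict loop, the sketch below swaps in a very simple nearest-centroid classifier over (keyword ID, TF-IDF) rows; the class name `CentroidClassifier`, its methods, and the sample matrices are all assumptions and not the patented model.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Matrix = List[Tuple[int, float]]  # rows of (keyword ID, TF-IDF value)


class CentroidClassifier:
    """Toy stand-in for the webpage classification model 130."""

    def __init__(self) -> None:
        self.sums: DefaultDict[str, Dict[int, float]] = defaultdict(dict)
        self.counts: DefaultDict[str, int] = defaultdict(int)

    def train(self, matrix: Matrix, category: str) -> None:
        """Add one labelled page (step S240) to the running per-category sums."""
        self.counts[category] += 1
        for keyword_id, weight in matrix:
            self.sums[category][keyword_id] = self.sums[category].get(keyword_id, 0.0) + weight

    def predict(self, matrix: Matrix) -> str:
        """Return the category whose mean vector has the largest dot product with the page."""
        if not self.counts:
            raise ValueError("model has not been trained")

        def score(category: str) -> float:
            mean = self.sums[category]
            return sum(w * mean.get(k, 0.0) for k, w in matrix) / self.counts[category]

        return max(self.counts, key=score)


model = CentroidClassifier()
model.train([(0, 0.9), (1, 0.5)], "sports news")      # hypothetical training page
model.train([(2, 0.8), (3, 0.7)], "political news")   # hypothetical training page
predicted = model.predict([(0, 0.4), (3, 0.1)])
print(predicted)  # sports news
```

Any classifier that accepts the matrix and a category name for training, and returns a category name for prediction, would fit the same interface.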

In this way, the automatic classification method for webpages proposed herein can train a webpage classification model to accurately generate category names for webpages.

Please refer to FIGS. 1, 3, 4 and 9. FIG. 9 is a flowchart of an automatic classification method for webpages according to another embodiment of the present invention. The following takes as an example the case where the webpage 120-1 of the website 120 has been browsed and has not been saved.

Step S310: it is determined whether a browsed webpage has been saved. If so, the flow ends; if not, step S320 is executed. The processor 110 determines that the webpage 120-1 has been browsed and has not been saved, and then executes step S320.

Step S320: when the browsed webpage has not been saved, the application programming interface is used to extract a plurality of keywords contained in the browsed webpage, and an identifier is given to each keyword of the browsed webpage. The processor 110 uses the application programming interface to extract the keywords KW1201, KW1202, ..., KW1205 contained in the browsed webpage 120-1, and gives an identifier to each keyword KW1201, KW1202, ..., KW1205 contained in the browsed webpage 120-1.

Step S330: the TF-IDF value of each keyword of the browsed webpage is calculated based on the number of all webpages in the website to which the browsed webpage belongs. Calculating a TF-IDF value requires a population (that is, a corpus) to be defined. In this embodiment, the population is all webpages 120-1, 120-2, ..., 120-10 in the website 120 to which the browsed webpage 120-1 belongs. The processor 110 calculates the TF-IDF value of each keyword KW1201, KW1202, ..., KW1205 contained in the browsed webpage 120-1 based on the number (10) of all webpages 120-1, 120-2, ..., 120-10 in the website 120.

Step S340: a matrix is generated according to the identifier and the TF-IDF value of each keyword of the browsed webpage. The processor 110 generates the matrix MX according to the identifier and the TF-IDF value of each keyword KW1201, KW1202, ..., KW1205 contained in the browsed webpage 120-1.

Step S350: the matrix is input into the webpage classification model to generate the predicted category name. The processor 110 inputs the matrix MX into the webpage classification model 130 to generate a predicted category name PCN.

Step S360: the browsed webpage is stored in a database with the predicted category name. The processor 110 stores the browsed webpage 120-1 in a database (not shown) with the predicted category name PCN. The database stores the saved webpages and their category names.

Step S370: preference information is identified according to the number of webpages under each category name in the database, and advertisements related to the preference information are recommended. The processor 110 selects the category name with the largest number of webpages as the preference information, and recommends advertisements related to the preference information. For example, if the category name "sports news" has the largest number of webpages in the database, "sports news" is taken as the preference information, and advertisements related to "sports news" are recommended (for example, news about the opening game of the Chinese Professional Baseball League). In one embodiment, the database can distinguish the saved webpages and their category names by user.
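Selecting the category name with the most saved webpages, per user, can be sketched as follows. The database layout (a mapping from user to the category names of that user's saved webpages), the function name `preferred_category`, and the sample users are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def preferred_category(categories: List[str]) -> Optional[str]:
    """Return the category name with the most saved webpages, or None if nothing is saved."""
    if not categories:
        return None
    return Counter(categories).most_common(1)[0][0]


# Hypothetical database contents: user -> category names of that user's saved webpages.
database: Dict[str, List[str]] = {
    "user_a": ["sports news", "political news", "sports news"],
    "user_b": ["political news"],
}
preferences = {user: preferred_category(cats) for user, cats in database.items()}
print(preferences)  # {'user_a': 'sports news', 'user_b': 'political news'}
```

The resulting preference could then be used as the key for looking up related advertisements; how advertisements are matched to a category is not described in the text and is left out here.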

In this way, the automatic classification method for webpages proposed herein can identify different preference information for different users.

Please refer to FIGS. 10, 11 and 12. FIG. 10 is a block diagram of the automatic classification system 100 for webpages and a website 180 according to another embodiment of the present invention. FIG. 11 is a flowchart of an automatic classification method for webpages according to another embodiment of the present invention. FIG. 12 is a schematic diagram of an article 180-11 according to an embodiment of the present invention. In this embodiment, the automatic classification system 100 can determine whether an article having an article category name has been published in the webpages 180-1, 180-2 of the website 180. The following takes as an example the case where an article 180-11 having an article category name CN180-11 is published in the webpage 180-1 of the website 180. The webpage 180-2 contains a plurality of articles 180-21, 180-22, 180-23.

Step S410: it is determined whether an article having an article category name has been published. If so, step S420 is executed; if not, the flow ends. The processor 110 determines that the article 180-11 having the article category name CN180-11 has been published, and then executes step S420.

Step S420: when an article having an article category name is published, the application programming interface is used to extract a plurality of keywords contained in the article, and an identifier is given to each keyword contained in the article. When the article 180-11 having the article category name CN180-11 is published, the processor 110 uses the application programming interface to extract the keywords KW1801, KW1802, ..., KW1806 contained in the article 180-11, and gives an identifier to each keyword KW1801, KW1802, ..., KW1806 contained in the article 180-11.

Step S430: the TF-IDF value of each keyword contained in the article is calculated based on the number of all articles in the website to which the article belongs. Calculating a TF-IDF value requires a population (that is, a corpus) to be defined. In this embodiment, the population is all articles 180-11, 180-21, 180-22, 180-23 in the website 180. The processor 110 calculates the TF-IDF value of each keyword KW1801, KW1802, ..., KW1806 contained in the article 180-11 based on the number (4) of all articles 180-11, 180-21, 180-22, 180-23 in the website 180.

Step S440: a matrix is generated according to the identifier and the TF-IDF value of each keyword contained in the article. The processor 110 generates the matrix MX180-11 according to the identifier and the TF-IDF value of each keyword KW1801, KW1802, ..., KW1806 contained in the article 180-11.

Step S450: the matrix is input into the webpage classification model to generate the predicted category name. The processor 110 inputs the matrix MX180-11 into the webpage classification model 130 to generate a predicted category name PCN180-11.

Step S460: when the article category name differs from the predicted category name, the article is published with the predicted category name. The processor 110 determines whether the article category name CN180-11 is the same as the predicted category name PCN180-11, and when the article category name CN180-11 differs from the predicted category name PCN180-11, publishes the article 180-11 with the predicted category name PCN180-11.
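The comparison in step S460 can be sketched as below. Representing an article as a dictionary with a `category` field, and the function name `reconcile_category`, are assumptions made for illustration; how the article is actually republished is not described in the text.

```python
from typing import Dict


def reconcile_category(article: Dict[str, str], predicted: str) -> Dict[str, str]:
    """Return the article as it should be published (step S460)."""
    if article["category"] == predicted:
        return article  # the author's category already matches the prediction
    # Copy the article and replace only the category, leaving the input untouched.
    return {**article, "category": predicted}


# Hypothetical article record with an author-supplied category.
article = {"id": "180-11", "category": "political news"}
published = reconcile_category(article, "sports news")
```

Returning a copy rather than mutating the input keeps the author-supplied category available, for example for logging how often the model overrides it.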

In this way, the automatic classification method for webpages proposed herein generates a matrix from the identifier and the TF-IDF value of each keyword contained in a published article, and inputs the matrix into the trained webpage classification model to accurately generate a category name for the published article.

Please refer to FIGS. 1 and 13. FIG. 13 is a flowchart of an automatic classification method for webpages according to another embodiment of the present invention. Steps S510 to S550 are similar to steps S110 to S150 of FIG. 2, respectively, and are not described again here. After the processor 110 saves the webpage 120-1 with the predicted category name PCN, step S560 is executed.

Step S560: it is determined whether the predicted category name of the saved webpage has been changed. If so, step S570 is executed; if not, the flow ends. When the processor 110 determines that the predicted category name PCN of the saved webpage 120-1 has been changed, step S570 is executed.

Step S570: when the predicted category name of the saved webpage has been changed, the webpage classification model is trained according to the matrix and the changed category name. A change to the predicted category name PCN of the saved webpage 120-1 indicates that the user is not satisfied with the category name predicted by the webpage classification model 130, so the processor 110 trains the webpage classification model 130 according to the matrix MX and the changed category name.
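The feedback loop of steps S560 and S570 can be sketched as follows. The `RecordingModel` class is a hypothetical stand-in that only records its training examples, since the text does not describe the model's training interface; `apply_feedback` and the sample category names are likewise illustrative.

```python
from typing import List, Tuple

Matrix = List[Tuple[int, float]]  # rows of (keyword ID, TF-IDF value)


class RecordingModel:
    """Hypothetical stand-in for model 130 that only records its training examples."""

    def __init__(self) -> None:
        self.examples: List[Tuple[Matrix, str]] = []

    def train(self, matrix: Matrix, category: str) -> None:
        self.examples.append((matrix, category))


def apply_feedback(model: RecordingModel, matrix: Matrix, predicted: str, current: str) -> bool:
    """Retrain when the user has renamed the predicted category (steps S560 and S570)."""
    if current == predicted:
        return False  # category unchanged: nothing to learn
    model.train(matrix, current)
    return True


model = RecordingModel()
matrix = [(0, 0.58), (1, 1.15)]
unchanged = apply_feedback(model, matrix, "sports news", "sports news")
changed = apply_feedback(model, matrix, "sports news", "baseball news")
```

Only the renamed case produces a new training example, so the model is updated exactly when the user signals disagreement with the prediction.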

In this way, the automatic classification method for webpages proposed herein can determine whether the predicted category name has been changed and use that information to optimize the webpage classification model.

In summary, although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Those with ordinary knowledge in the technical field to which the present invention pertains may make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be defined by the appended claims.

S110, S120, S130, S140, S150: Steps

Claims (10)

1. An automatic classification method for webpages, comprising:
extracting, using an application programming interface (API), a plurality of keywords contained in a webpage of a website, and assigning an identifier (ID) to each of the keywords contained in the webpage;
calculating a TF-IDF value of each of the keywords contained in the webpage based on the number of all webpages in the website;
generating a matrix according to the identifier of each of the keywords contained in the webpage and the TF-IDF value of each of the keywords contained in the webpage;
inputting the matrix into a webpage classification model to generate a predicted category name; and
saving the webpage with the predicted category name;
wherein a training method of the webpage classification model comprises:
extracting, using the API, the keywords and a webpage category name contained in the webpage of the website, and assigning an identifier to each of the keywords contained in the webpage;
calculating the TF-IDF value of each of the keywords contained in the webpage based on the number of all webpages in a plurality of websites;
generating the matrix according to the identifier of each of the keywords contained in the webpage and the TF-IDF value of each of the keywords contained in the webpage; and
training the webpage classification model according to the matrix and the webpage category name.

2. The automatic classification method of claim 1, further comprising, before extracting the keywords contained in the webpage of the website using the API:
determining whether the webpage has been previously saved; and
performing the automatic classification method when the webpage has not been previously saved.

3. The automatic classification method of claim 1, further comprising:
determining whether a browsed webpage has been saved;
when the browsed webpage has not been saved, extracting, using the API, a plurality of keywords contained in the browsed webpage, and assigning an identifier to each of the keywords of the browsed webpage;
calculating a TF-IDF value of each of the keywords of the browsed webpage based on the number of all webpages in the website to which the browsed webpage belongs;
generating the matrix according to the identifier of each of the keywords of the browsed webpage and the TF-IDF value of each of the keywords of the browsed webpage;
inputting the matrix into the webpage classification model to generate the predicted category name;
saving the browsed webpage to a database with the predicted category name; and
identifying preference information according to the number of webpages under each category name in the database, and recommending advertisements related to the preference information.

4. The automatic classification method of claim 1, further comprising:
determining whether an article having an article category name is published;
when the article having the article category name is published, extracting, using the API, a plurality of keywords contained in the article, and assigning an identifier to each of the keywords contained in the article;
calculating a TF-IDF value of each of the keywords contained in the article based on the number of all articles in the website to which the article belongs;
generating the matrix according to the identifier (ID) of each of the keywords contained in the article and the TF-IDF value of each of the keywords contained in the article;
inputting the matrix into the webpage classification model to generate the predicted category name; and
when the article category name differs from the predicted category name, publishing the article with the predicted category name.

5. The automatic classification method of claim 1, further comprising:
determining whether the predicted category name of the saved webpage has been changed; and
when the predicted category name of the saved webpage has been changed, training the webpage classification model according to the matrix and the changed category name.

6. An automatic classification system for webpages, comprising:
a processor configured to extract, using an application programming interface (API), a plurality of keywords contained in a webpage of a website, assign an identifier (ID) to each of the keywords contained in the webpage, calculate a TF-IDF value of each of the keywords contained in the webpage based on the number of all webpages in the website, and generate a matrix according to the identifier of each of the keywords contained in the webpage and the TF-IDF value of each of the keywords contained in the webpage; and
a webpage classification model configured to generate a predicted category name according to the matrix;
wherein the processor saves the webpage with the predicted category name; and
wherein the processor is further configured to:
extract, using the API, the keywords and a webpage category name contained in the webpage of the website, and assign an identifier to each of the keywords contained in the webpage;
calculate the TF-IDF value of each of the keywords contained in the webpage based on the number of all webpages in a plurality of websites;
generate the matrix according to the identifier of each of the keywords contained in the webpage and the TF-IDF value of each of the keywords contained in the webpage; and
train the webpage classification model according to the matrix and the webpage category name.

7. The automatic classification system of claim 6, wherein the processor is further configured to determine whether the webpage has been previously saved.

8. The automatic classification system of claim 6, wherein the processor is further configured to:
determine whether a browsed webpage has been saved;
when the browsed webpage has not been saved, extract, using the API, a plurality of keywords contained in the browsed webpage, and assign an identifier to each of the keywords of the browsed webpage;
calculate a TF-IDF value of each of the keywords of the browsed webpage based on the number of all webpages in the website to which the browsed webpage belongs;
generate the matrix according to the identifier of each of the keywords of the browsed webpage and the TF-IDF value of each of the keywords of the browsed webpage;
input the matrix into the webpage classification model to generate the predicted category name;
save the browsed webpage to a database with the predicted category name; and
identify preference information according to the number of webpages under each category name in the database, and recommend advertisements related to the preference information.

9. The automatic classification system of claim 6, wherein the processor is further configured to:
determine whether an article having an article category name is published;
when the article having the article category name is published, extract, using the API, a plurality of keywords contained in the article, and assign an identifier to each of the keywords contained in the article;
calculate a TF-IDF value of each of the keywords contained in the article based on the number of all articles in the website to which the article belongs;
generate the matrix according to the identifier (ID) of each of the keywords contained in the article and the TF-IDF value of each of the keywords contained in the article;
input the matrix into the webpage classification model to generate the predicted category name; and
when the article category name differs from the predicted category name, publish the article with the predicted category name.

10. The automatic classification system of claim 6, wherein the processor is further configured to:
determine whether the predicted category name of the saved webpage has been changed; and
when the predicted category name of the saved webpage has been changed, train the webpage classification model according to the matrix and the changed category name.
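
The pipeline recited in claims 1 and 6 (keyword identifiers, TF-IDF values, a matrix, and a classification model) can be illustrated with a minimal Python sketch. This is not the patented implementation. The keyword lists stand in for whatever the API would return, the IDF term uses the textbook form log(N / df), and a simple centroid-style scorer stands in for the webpage classification model, whose architecture the claims do not specify. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def assign_ids(pages):
    """Give every distinct keyword an integer identifier (ID)."""
    vocab = {}
    for keywords in pages:
        for kw in keywords:
            vocab.setdefault(kw, len(vocab))
    return vocab


def tfidf_matrix(keywords, pages, vocab):
    """Return sorted rows of (keyword ID, TF-IDF value) for one page.

    `pages` is the collection whose size supplies N in IDF = log(N / df).
    Keywords without an ID are skipped (an assumption of this sketch).
    """
    n_pages = len(pages)
    rows = []
    for kw, count in Counter(keywords).items():
        if kw not in vocab:
            continue
        tf = count / len(keywords)
        df = sum(1 for page in pages if kw in page)
        idf = math.log(n_pages / df) if df else 0.0
        rows.append((vocab[kw], tf * idf))
    return sorted(rows)


def train(labeled_pages, pages, vocab):
    """Stand-in model: accumulate TF-IDF weights per category name."""
    centroids = {}
    for keywords, category in labeled_pages:
        weights = centroids.setdefault(category, {})
        for kw_id, value in tfidf_matrix(keywords, pages, vocab):
            weights[kw_id] = weights.get(kw_id, 0.0) + value
    return centroids


def predict(matrix, centroids):
    """Return the category whose accumulated weights best match the matrix."""
    def score(category):
        weights = centroids[category]
        return sum(value * weights.get(kw_id, 0.0) for kw_id, value in matrix)
    return max(centroids, key=score)


# Hypothetical training data: (keywords, webpage category name).
labeled = [
    (["laptop", "gpu", "laptop", "benchmark"], "tech"),
    (["cpu", "gpu", "benchmark"], "tech"),
    (["recipe", "pasta", "sauce"], "food"),
    (["pasta", "recipe", "oven"], "food"),
]
corpus = [keywords for keywords, _ in labeled]
vocab = assign_ids(corpus)
model = train(labeled, corpus, vocab)

# Classify a new page and obtain its predicted category name.
new_page = ["gpu", "laptop", "review"]
predicted = predict(tfidf_matrix(new_page, corpus, vocab), model)
print(predicted)
```

In this sketch the matrix is a list of (ID, TF-IDF) rows rather than a dense array, which keeps the example dependency-free. A trained neural network or any other classifier could replace `train` and `predict` without changing the earlier steps.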

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109138812A TWI757957B (en) 2020-11-06 2020-11-06 Automatic classification method and system of webpages

Publications (2)

Publication Number Publication Date
TWI757957B (en) 2022-03-11
TW202219794A (en) 2022-05-16

Family

ID=81710610

Country Status (1)

Country Link
TW (1) TWI757957B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169020A (en) * 2017-04-07 2017-09-15 南京邮电大学 A kind of orientation web retrieval method based on keyword
CN110516074A (en) * 2019-10-23 2019-11-29 中国人民解放军国防科技大学 Website theme classification method and device based on deep learning
