Printed by the Employees' Consumer Cooperative of the Central Standards Bureau, Ministry of Economic Affairs

V. Description of the Invention

[Technical Field of the Invention]
The present invention relates to a method for searching multiple search engines at the same time, and in particular to a method, applied to a network data system, by which information from many websites can be obtained simultaneously after keywords are entered and/or search parameters are set.

[Background of the Invention]
With the rapid development of the Internet, the information available on the network has become ever richer, and new websites are added every day. Because the volume of information is so large, finding information on the Internet is like looking for a needle in a haystack: the user knows that the required information must exist, but has almost no idea where to begin looking for it. Users therefore usually search one by one through the websites of Internet search engines (Search Engine).
An Internet search engine is a website that actively searches for information on the Internet and indexes it automatically. It traverses the Internet day and night, continually looking for new web pages, and indexes the words in every page it retrieves.
The index is stored in a large database that can be queried. There are now many such search engines, for example Yahoo, Infoseek, and Altavista.
Users turn to a search engine in order to find the information they need in the vast ocean of information on the Internet. They usually do not care how many results a search engine returns; what matters is whether the search engine can lead them where they want to go so that they obtain valuable information. Users therefore want a search engine to return the results they actually need.
However, each search engine searches the Internet independently, and each uses its own retrieval method.
As a result, the range of site data each search engine covers also differs. Their indexing methods vary as well: some build a classified index similar to a library catalogue, such as Yahoo, while others index every word that appears in a web page, such as OpenText. There are even search engines dedicated to one kind of information, some for patent information and others for shareware.
When we use these search engines to look for information, the traditional approach is first to open a browser (Browser), such as Microsoft's Internet
Explorer or Netscape's Netscape Navigator, go to the home page (Home Page) of a search engine, and type the keyword (Keyword) to be queried into that page, setting options (parameters) where necessary, such as the number of results to return and the scope of the search. The search engine then queries its database and returns the results to the browser. If the user is not satisfied with the results, the user can move to the home page of another search engine and repeat these steps until satisfied.
For example, suppose we want to use the Yahoo search engine to find sites about computer magazines. We first open Netscape Navigator, type the keyword "computer magazine" into the Yahoo home page, and press the search button, as shown in Figure 1. Yahoo then returns a large amount of data related to computer magazines, as shown in Figure 2. If the same query, "computer magazine", is submitted to the Infoseek search engine instead, the results returned differ from Yahoo's, as shown in Figure 3.
Precisely because of this, we use the same keywords in different search engines to carry out ...
... are compared with the words in the web pages it has indexed, and the matching results are then returned. Because the results it can return depend entirely on its database, it must keep retrieving new web pages in order to keep expanding that database. Search engines of this kind are therefore usually equipped with several large servers (Server) that search the Internet for new web pages around the clock while simultaneously accepting user queries. This approach not only places high demands on hardware but is also costly, and ordinary users cannot implement it themselves.
[Summary of the Invention]
An object of the present invention is to provide a method in which a system on the network automatically controls the querying, in sequence, of one or more search engines designated by the user. The user need not enter each search engine separately; the query conditions are set only once, and the results returned by multiple search engines are obtained.
Another object of the present invention is to save the user's time and simplify the workflow. Because the results come from multiple search engines, they not only cover a wider range of information but are also more reliable.

[Brief Description of the Drawings]
Figure 1 is an operation screen of a conventional search engine, showing the keyword "computer magazine" being entered on the home page of the search engine Yahoo.
Figure 2 is an operation screen of a conventional search engine, showing the search engine Yahoo returning the results of the query "computer magazine" to Netscape.
Figure 3 is an operation screen of a conventional search engine, showing the search engine Infoseek returning the results of the query "computer magazine".
Figure 4 is a hardware block diagram of the present invention.
Figure 5 is an operation flowchart of the present invention.
Figure 6 is a screen of an embodiment of the present invention, showing the options with which the user can select multiple search engines.
Figure 7 is a screen of an embodiment of the present invention, showing the user entering search conditions and the search agent of the invention returning to the user, at the same time, the results of multiple search engines searching for "computer magazine".
Figure 8 is a flowchart of how the search agent of the invention designs search engine query formats into a feature format table.
Figure 9 is a flowchart of how the search agent of the invention parses the returned results.
Figure 10 is a flowchart of how the search agent of the invention deletes duplicate site data.

[Description of the Embodiments]
The technique proposed by the present invention is a practical method that can search multiple search engines at the same time on the user's behalf on the Internet. The technical content of the invention is not limited to the Internet, however; it applies equally to a wide area network (WAN; Wide Area Network), a metropolitan area network (MAN; Metropolitan Area Network),
a local area network (LAN; Local Area Network), or an intranet.
Please refer to Figure 4, a hardware block diagram of the present invention. In the hardware embodiment disclosed in the figure, the invention uses a computer 40 comprising a central processing unit 41, an arithmetic logic unit 42, a storage device 43, an input unit 44, an output unit 45, and other components.
The computer 40 is connected to a modem 46, the modem 46 is connected to the Internet 47, and the Internet 47 is connected to the specified plurality of search engines 48. The central processing unit 41 may use a scrolling facility built into the operating system, or execute an application program, to provide scrolling. The input unit 44 may include computer peripherals such as a keyboard and a mouse. The output unit 45 may be any of a monitor, a video monitor, or a video input/output device. The storage device 43 may be a hard disk, a floppy disk, memory, an optical disc, or the like, and stores the feature format table, the parsing table, the search agent (program module), and so on.
For the operation flow of the present invention, please refer to Figure 5. The invention provides a search agent (program module) (Search Agent) for the Internet that is built on a browser. The operation flow is, in order:
a. open a browser, for example Netscape's Netscape Navigator, and start the search agent (program module);
b. select the search engines to be used and enter keywords or, where necessary, set search conditions and other options (the screen is shown in Figure 6);
c. obtain the query format of the search engine according to the conditions set by the user;
d. send the keywords and search conditions to the search engine for retrieval, in the query format obtained for that search engine;
e. determine whether the search results have been returned to the search agent; if not, continue waiting; if so, proceed to the next step;
f. dynamically analyze the returned search results using a parsing table;
g. check whether every returned site exists;
if a site does not exist, display a prompt on the output unit (45) reminding the user that the site address no longer exists; if it exists, proceed to the next step;
h. store the returned site information in a storage device (43) as a unified index table (list) for the user to consult, so that the user can find the desired content more easily;
i. determine whether all the search engines have returned their search results; if so, jump to step (k); otherwise continue to the next step;
j. make the next search engine the current search engine and jump back to step (c);
k. display the search results in the index table on an output unit (45); and
l. end the search agent (program module).
The function and content of each step of this operation flow are explained further below.
First, the invention provides the user with a screen such as that shown in Figure 7, in whose query field a keyword such as "computer magazine" can be entered. This is step (b) above, in which the user selects some search engines and enters keywords or, where necessary, sets search conditions.
Next, step (c), in which the search agent obtains the query format of the search engine according to the conditions set by the user, is explained as follows.
When a search engine sends a user's query request to its database, it uses its own query format, which usually includes the query keywords, the logical relationships among the keywords,
the number of results returned each time, the scope of the search, and other parameters, and the search engine controls the results it returns through these parameters. So that every search engine can accept the user's query under any combination of settings, the invention records the query format that each search engine uses under different parameter settings in a feature format table (Feature Table). The procedure is shown in Figure 8 and includes the following steps:
c1. open a browser;
c2. go to the website of the search engine;
c3. enter a keyword on the search engine's website;
c4. set each of the search engine's query parameters on its website;
c5. execute the search engine's query;
c6. obtain the query format and query parameters present in the search results returned by the search engine;
c7. create a feature format table; and
c8. replace the keyword and the content of each search engine option with the corresponding query parameters in the feature format table.
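Steps c6 to c8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the specification: the function names `derive_template` and `fill_template`, the captured URL in the usage note, and the rule of matching parameter values against the keyword and result count are all hypothetical. The `%s` and `%d` placeholders follow the convention used in the feature format table.

```python
from urllib.parse import urlsplit, parse_qsl, quote_plus


def derive_template(captured_url: str, keyword: str, result_count: int) -> str:
    """Turn a captured query URL into a reusable format (steps c6 to c8).

    Assumption: the keyword and the result count each appear as the value
    of exactly one query-string parameter in the captured URL.
    """
    parts = urlsplit(captured_url)
    fields = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if value == keyword:
            value = "%s"              # placeholder for the keyword
        elif value == str(result_count):
            value = "%d"              # placeholder for the result count
        else:
            value = quote_plus(value)  # fixed option, kept as captured
        fields.append(f"{name}={value}")
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{'&'.join(fields)}"


def fill_template(template: str, keyword: str, result_count: int) -> str:
    """Substitute the user's settings back into a stored query format."""
    filled = template.replace("%s", quote_plus(keyword), 1)
    return filled.replace("%d", str(result_count), 1)
```

For instance, deriving a format from a hypothetical captured URL such as `http://search.yahoo.com/search?p=computer+magazine&d=y&za=and&h=s&g=0&n=20` and then filling it with a different keyword and count reproduces a query the engine would accept, provided the engine's URL scheme really is as captured.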
The data structure of this feature format table is as follows and broadly includes:
(1) the name of the search engine, such as Yahoo;
(2) the URL format with which the search engine accepts a first query, such as http://search.yahoo.com/search?p=%s&d=y&za=and&h=s&g=0&n=%d;
(3) the URL format used for the next query on the same keyword after the search engine has accepted the first query, such as http://search.yahoo.com/bin/search?p=%s&h=s&n=%d; and
(4) other related information, such as the number of results returned by each checksum (checksum) of the search engine.
Symbols such as %s and %d in the URL formats are query parameters.
In the example above, the URL format is illustrated using the current URL (Universal Resource Locators) format, but the invention is not limited to that format.
Once the query format of each search engine is known, the search agent software of the invention can use these parameters to search the databases of the different search engines directly on the user's behalf. The user can set these parameters just as in a search engine; according to them, the invention looks up the query format corresponding to the search engine in the feature format table and automatically converts the parameters into a format the search engine accepts. For example, in the Yahoo URL query format above, %s is the keyword and %d is the number of results Yahoo is asked to return. If the user sets the query keyword to computer and asks for 20 results each time,
the parameters in the URL query format are replaced with the user's settings, which in this example gives http://search.yahoo.com/bin/search?p=computer&h=s&n=20. This is the query format that Yahoo ultimately accepts. Once the formats accepted by the search engines have been obtained, the invention sends the query request to multiple search engines at the same time.
Step (f), in which the parsing table is used to analyze the returned search results dynamically, further includes the following steps (as shown in Figure 9):
(f1) read the parsing table of the search engine;
(f2) obtain from the parsing table, for information such as the site title, URL address, and summary,
the corresponding syntax tags (Syntax Tag);
(f3) look for the syntax tags in the document returned by the search engine; and
(f4) extract (filter) the corresponding site information from the returned document.
The query results returned by a search engine must be parsed (Parse) because all search engines currently return their results as HTML (Hyper Text Markup Language) documents, and each search engine formats the returned document differently. These formats must therefore be understood first, and a corresponding parsing table (Parsing Table) prepared for each search engine, before the HTML returned by the search engine can be parsed dynamically. The parsing table for each search engine usually contains the syntax tags (Syntax Tag) that correspond, in the document returned by that search engine, to information such as the site title, URL address, and site summary.
The returned document is analyzed according to the parsing table corresponding to the search engine. By looking for these syntax tags in the document, the title, URL address, site summary, and other information for each site can be extracted, and other useless information, such as the pictures and advertisements contained in the document, is filtered (Filter) out.
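Steps (f1) to (f4) can be sketched in Python as below. This is an illustrative sketch only: the `ParsingTable` class, its field names, the `parse_results` function, and the marker strings used in the usage note are assumptions introduced for the example, and a real result page would need markers taken from that engine's actual HTML.

```python
from dataclasses import dataclass


@dataclass
class ParsingTable:
    """Hypothetical parsing-table entry: the marker strings for one engine."""
    url_start: str    # text that immediately precedes a site URL
    url_end: str      # text that closes the URL and opens the title
    title_end: str    # text that closes the title and opens the summary
    summary_end: str  # text that closes the summary


def parse_results(html: str, table: ParsingTable) -> list[dict[str, str]]:
    """Scan the returned document for the markers and collect site records.

    Anything outside the markers (pictures, advertisements) is ignored,
    which corresponds to the filtering described in step (f4).
    """
    sites = []
    pos = 0
    while True:
        start = html.find(table.url_start, pos)
        if start == -1:
            break  # no further result entries
        start += len(table.url_start)
        url_stop = html.find(table.url_end, start)
        if url_stop == -1:
            break  # malformed entry, stop scanning
        title_start = url_stop + len(table.url_end)
        title_stop = html.find(table.title_end, title_start)
        if title_stop == -1:
            break
        summary_start = title_stop + len(table.title_end)
        summary_stop = html.find(table.summary_end, summary_start)
        if summary_stop == -1:
            summary_stop = len(html)  # summary runs to the end of the document
        sites.append({
            "url": html[start:url_stop],
            "title": html[title_start:title_stop],
            "summary": html[summary_start:summary_stop].strip(),
        })
        pos = summary_stop
    return sites
```

With a table such as `ParsingTable('<li><a href="', '">', '</a>', '\n')`, each list item in a page laid out that way yields one record holding the URL, title, and summary. Plain string searching is used here to mirror the marker-based description; a general HTML parser would be a different technique.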
For example, in Yahoo's parsing table, there are the following syntax tags:
<li><A HREF = URL>TITLE</a>SUMMARY
After the search engine Yahoo returns its search results, the returned document is analyzed according to this parsing table by looking for these syntax tags in it. For example, when <li><A HREF = is found in the document, the content that follows it, up to the next >, is the URL address of the site; the content between that > and </a> is the title of the site; and the content after </a> is the site summary.
In this way, the site information in the document returned by Yahoo can be extracted.
In step (h) above, different search engines may return the same site data, and sometimes a site address returned by a search engine no longer actually exists. To improve the reliability of the search, the search agent checks every item of site data returned by the search engines. Each time the user performs a search, the search agent stores every URL address found in an index table. Whenever a new URL is parsed out, the agent checks whether that URL already exists in the index table; if it does, the URL is a duplicate site and is deleted; otherwise it is added to the index table. The process of deleting duplicate site data is shown in Figure 10.
It includes the following steps:
(h1) determine whether the newly obtained URL address already exists in the index table; if it does, proceed to step (h2); if not, proceed to step (h3);
(h2) delete this URL address from the search results, then end;
(h3) add the newly obtained URL address to the index table, then end.
Besides removing duplicate site information, the search agent also checks, on the user's behalf, whether each site address still exists. After each URL address is parsed out, the search agent tries to connect to the site; if the information returned by the site indicates that the URL address no longer exists, the search agent deletes that URL address from the index table. The final results are then presented as a unified list, making it easier for the user to find the desired content.

[Effects of the Invention]
In summary, it is readily seen that, compared with conventional search methods, the method provided by the invention not only saves the user's time but also greatly simplifies the query process. The user no longer has to query each search engine one by one; by setting the query conditions once, the user obtains the search results of multiple search engines at the same time. Because this method gathers the search results of many different search engines, the information covered is broader and the query results are more reliable.
The technical features of the invention have been described in detail above. It should be emphasized, however, that these embodiments serve only to illustrate preferred embodiments of the invention
and are not intended to limit the scope of the invention; any modification or change made without departing from the spirit of the invention falls within the protection the invention intends.

[Description of Reference Numerals]
40 ... computer
41 ... central processing unit
42 ... arithmetic logic unit
43 ... storage device
44 ... input unit
45 ... output unit
46 ... modem
47 ... Internet
48 ... search engine
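The duplicate-removal flow of steps (h1) to (h3), together with the site-existence check described in the embodiment, can be sketched in Python as follows. This is a hedged illustration: the function name `merge_results`, the record layout, and the injectable `site_exists` callback are assumptions made for the example. The specification has the agent try to connect to each site; the callback stands in for that check so the sketch runs without network access, and sites that fail it are simply left out of the index table.

```python
from typing import Callable, Iterable


def merge_results(
    result_lists: Iterable[list[dict[str, str]]],
    site_exists: Callable[[str], bool] = lambda url: True,
) -> list[dict[str, str]]:
    """Merge per-engine results into one index table keyed by URL."""
    index_table: dict[str, dict[str, str]] = {}  # dicts keep insertion order
    for results in result_lists:
        for site in results:
            url = site["url"]
            if url in index_table:
                continue  # (h1) already indexed, so (h2) drop the duplicate
            if not site_exists(url):
                continue  # address no longer exists, keep it out of the table
            index_table[url] = site  # (h3) add the new URL to the index table
    return list(index_table.values())
```

Passing the record lists from two engines returns a single list in which each URL appears once, in the order first seen. Matching here is on the exact URL string; treating variants of the same address as duplicates would need a normalization step that the specification does not describe.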