1237779 IX. Description of the Invention: [Technical field to which the
invention belongs] The present invention relates to a search system and method, and in particular to a computer file content search system and method with fast search performance.

[Prior Art] Operating systems commonly used on personal computers, such as Microsoft WINDOWS, can search not only by file name but also by the text contained in files. However, after receiving a keyword entered by the user, such a search compares the content of every candidate file in the computer, file by file and word by word, and is therefore obviously time-consuming. As hard disk capacities continue to grow rapidly, the number and size of the files of various formats stored on them become ever larger, which makes the time cost and inefficiency of this conventional search all the more apparent.

[Summary of the Invention] Therefore, the primary object of the present invention is to provide a computer file content search system and method with fast search performance.

A second object of the present invention is to provide a computer file content search system and method that can search files of multiple formats.

Accordingly, the computer file content search system of the present invention is used to retrieve, based on search data, the corresponding files containing that data from a plurality of original files stored in the computer. The system includes: an extraction module that extracts at least one meaningful piece of to-be-retrieved data from each original file according to a predetermined rule; a markup language conversion module that converts each piece of to-be-retrieved data into a to-be-retrieved file in markup language format; a storage module that stores the to-be-retrieved files in the computer; a detection module; and a link module.
Based on the search data, the detection module detects all corresponding files among the to-be-retrieved files, and the link module links each corresponding to-be-retrieved file to its original file.

In a preferred embodiment, the system further includes a plain text conversion module which, for to-be-retrieved data that is not in plain text (TXT) format, converts the extracted to-be-retrieved data into plain text format.

In a preferred embodiment, the system further includes an update monitoring module which, after any original file is updated, causes the extraction module to extract the to-be-retrieved data again from the updated original file.

In a preferred embodiment, each to-be-retrieved file is an XML file.

In a preferred embodiment, the detection module further uses a search time range to detect, among the corresponding to-be-retrieved files, those that fall within that time range.

In a preferred embodiment, the link module further marks the location of the search data in each linked original file.

The invention further discloses a computer file content search method for retrieving, based on search data, the corresponding files containing that data from a plurality of original files stored in the computer. The method includes the following steps: (1) extracting at least one meaningful piece of to-be-retrieved data from each original file according to a predetermined rule; (2) converting each piece of to-be-retrieved data into a to-be-retrieved file in markup language format; (3) storing the to-be-retrieved files; (4) detecting all corresponding files among the to-be-retrieved files based on the search data; and (5) linking each corresponding to-be-retrieved file to its original file.
[Embodiment] The foregoing and other technical contents, features, and effects of the present invention will become clear from the following detailed description of a preferred embodiment with reference to the drawings.

As shown in FIG. 1, a preferred embodiment of the computer file content search system 1 of the present invention takes the form of program software stored on a storage medium readable by a computer 100 (such as a disk or an optical disc). It enables the computer 100 to execute the steps shown in the flowchart of FIG. 2, so that, based on search data 20 entered by a user (shown in FIG. 3), all corresponding files containing the search data 20 are retrieved from the plurality of original files 3 stored in the computer 100. In other variations, the invention may also be implemented as a chip, or as other firmware or hardware, with equivalent functions.

In this embodiment the content search system 1 mainly includes an extraction module 11, a plain text conversion module 12, a markup language conversion module 13, a storage module 14, an update monitoring module 15, a detection module 16, and a link module 17. The function of each module is described below in order, with reference to the steps of FIG. 2 and the relationship diagram of FIG. 3.

First, as shown in steps 400 and 402, after the user installs the system 1 on a storage device of the computer 100 (such as a hard disk, not shown), the extraction module 11 applies a preset extraction rule 50 to all original files 3 already present in the computer 100 and extracts all meaningful to-be-retrieved data 51 that satisfies the extraction rule 50.
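The extraction of steps 400 and 402 can be sketched roughly as follows. This is only an illustration under stated assumptions: the extraction rule 50 is taken to be a simple list of excluded words, and the names ENGLISH_STOP, CJK_STOP_CHARS, TOKEN, and extract_terms, as well as the tokenizing regular expression, are hypothetical and do not come from the specification.

```python
import re

# Hypothetical stand-ins for extraction rule 50: words and characters that
# are assumed to be too common to serve as search keywords.
ENGLISH_STOP = {"of", "the", "is"}
CJK_STOP_CHARS = "的啊"

# A token is a run of Latin letters or a run of CJK characters.
# Digits match neither alternative, so numerals are dropped.
TOKEN = re.compile(r"[A-Za-z]+|[\u4e00-\u9fff]+")

def extract_terms(text):
    """Return the distinct meaningful terms of text, in first-seen order."""
    terms = []
    for token in TOKEN.findall(text):
        if token[0].isascii():
            pieces = [token.lower()]
        else:
            # Crude assumption: split a CJK run at the excluded characters.
            pieces = re.split("[" + CJK_STOP_CHARS + "]", token)
        for piece in pieces:
            if piece and piece not in ENGLISH_STOP and piece not in terms:
                terms.append(piece)
    return terms

print(extract_terms("The patent search of 2004 is fast"))
```

Splitting Chinese text at the excluded characters is a deliberate simplification; a practical system would more likely need proper word segmentation.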
In this embodiment, the preset extraction rule 50 excludes from the to-be-retrieved data 51 a number of preset words that carry no significant meaning and are rarely used as search keywords, such as the Chinese preposition "的", the Chinese particle "啊", the English words "of", "the", and "is", and numerals; these are not extracted. By contrast, the to-be-retrieved data 51 consists of meaningful strings, such as "軟體" (software), "檢索" (search), "patent", and "search", or vendor names such as "Microsoft" and "infoacer".

Next, as shown in steps 404 and 406, so that all to-be-retrieved data 51 has a uniform format, this embodiment presents all to-be-retrieved data 51 in plain text (TXT) format. In step 404, the plain text conversion module 12 therefore checks whether each original file 3 is a plain text file. If the original file 3 is judged to be in a non-plain-text format such as DOC (i.e., edited with "Microsoft Word"), XLS (i.e., edited with
Microsoft Excel), PPT (i.e., edited with Microsoft PowerPoint), EML (i.e., edited with "Outlook Express"), or PDF, the plain text conversion module 12 further converts the to-be-retrieved data 51 in that non-plain-text format, by any conventional means, into plain-text to-be-retrieved data 52.

Next, as shown in step 408, the markup language conversion module 13 takes each piece of plain-text to-be-retrieved data 52, together with the absolute or relative storage address 53 (including directory and file name) on the storage device of the computer 100 (such as the aforementioned hard disk) of the original file 3 corresponding to that data (that is, the original file 3 from which the to-be-retrieved data 52 was extracted), and the creation or update time 54 of that original file 3, and forms a corresponding to-be-retrieved file 55 in markup language format. In other words, each to-be-retrieved file 55 contains not only the to-be-retrieved data 52 itself but also the storage address 53 of the original file 3 containing that data and the creation or update time 54 of that original file 3. In this embodiment the to-be-retrieved file 55 in markup language format is written in XML, a language commonly used today for editing web pages, but the invention is not limited to this.

Then, as shown in step 410, once a to-be-retrieved file 55 has been generated through the foregoing steps for the to-be-retrieved data 52 extracted from each original file 3, the storage module 14 stores all to-be-retrieved files 55 in a retrieval database 56 on the storage device of the computer 100 (such as the aforementioned hard disk).
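As a rough illustration of step 408, one to-be-retrieved file 55 might be assembled as below. The specification does not define an XML layout, so the element names (record, path, updated, data, term), the function name build_index_record, and the sample path and timestamp are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

def build_index_record(terms, source_path, updated_at):
    """Build one XML record holding the extracted terms and their origin."""
    record = ET.Element("record")
    # Storage address 53 of the original file (hypothetical element name).
    ET.SubElement(record, "path").text = source_path
    # Creation or update time 54, kept here as ISO 8601 text.
    ET.SubElement(record, "updated").text = updated_at
    # Plain-text to-be-retrieved data 52, one element per term.
    data = ET.SubElement(record, "data")
    for term in terms:
        ET.SubElement(data, "term").text = term
    return ET.tostring(record, encoding="unicode")

# Hypothetical sample values, for illustration only.
xml_text = build_index_record(
    ["patent", "search"], "docs/report.doc", "2004-03-01T09:30:00"
)
print(xml_text)
```

Storing the path and time next to the terms is what lets a later search report the original file, and filter by time, without opening the original file itself.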
Next, as shown in step 412, the update monitoring module 15 can be selectively enabled by the user to monitor, either immediately or periodically at a predetermined interval, whether any existing original file 3 has been changed or updated or whether a new original file 3 has been created. If an updated original file 3 (including a newly created file) is found, the extraction module 11, the plain text conversion module 12, the markup language conversion module 13, and the storage module 14 are activated in the order of the foregoing steps to generate a to-be-retrieved file 55 for the updated original file 3 and store it in the retrieval database 56.

After steps 400 to 412, the preparatory work of the system 1 is complete and the system can accept search input from the user. As shown in steps 414, 416, and 418, when the user enters, through a suitable input interface built on Microsoft WINDOWS or another suitable operating system, search data 20 that may be present in some of the original files 3, the detection module 16 receives the search data 20 and, using existing matching and search techniques, detects in the retrieval database 56 all to-be-retrieved files 55 that match the search data 20. The file name and storage address 53 (directory and path) of the corresponding original file 3 recorded in each detected to-be-retrieved file 55 are then shown to the user through a suitable display interface. In this embodiment the search data 20 may be a single keyword (such as "patent"); using conventional techniques of other existing search systems, it may also be a plurality of keywords combined into a search condition by Boolean operations such as union and intersection.
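The keyword matching of steps 414 to 418, including the union and intersection conditions, can be sketched as follows. This is a minimal sketch, assuming the retrieval database 56 has already been loaded into a dictionary that maps each original file's storage address to its set of extracted terms; the function name search, the mode parameter, and the sample file names are hypothetical.

```python
def search(index, keywords, mode="and"):
    """Return the sorted storage addresses whose terms match the keywords.

    mode="and" requires every keyword (intersection);
    mode="or" requires at least one keyword (union).
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    wanted = {keyword.lower() for keyword in keywords}
    if not wanted:
        return []
    hits = []
    for path, terms in index.items():
        if mode == "and":
            matched = wanted <= terms        # all keywords present
        else:
            matched = bool(wanted & terms)   # any keyword present
        if matched:
            hits.append(path)
    return sorted(hits)

# Hypothetical index contents, for illustration only.
index = {
    "a.doc": {"patent", "search"},
    "b.txt": {"patent"},
    "c.xls": {"budget"},
}
print(search(index, ["patent", "search"]))
print(search(index, ["patent", "search"], mode="or"))
```

Because only the small term sets are compared, no original file has to be opened during the search, which is the source of the speed-up the embodiment describes.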
At the same time, as shown in step 415, the detection module 16 in this embodiment can also receive a search time limit 22 optionally entered by the user, which specifies the time range (for example, within the past week) in which the original files 3 to be searched were created or updated. The detection module 16 then uses both the search time limit 22 and the search data 20 to detect in the retrieval database 56 only those to-be-retrieved files 55 that satisfy both search conditions 20 and 22, which avoids unnecessary search time. The search time limit 22 is compared against the creation or update time 54 of each original file 3 recorded in each to-be-retrieved file 55.

Next, as shown in steps 420, 422, and 424, the link module 17 can be selectively enabled by the user. For each to-be-retrieved file 55 detected by the detection module 16 in step 416, executing that file 55 connects to the storage address 53 of the corresponding original file 3 recorded in it and opens that original file 3; the location of the keywords of the search data 20 is then marked by inverse highlighting or in another conspicuous manner, so that the user can quickly see where the keywords are.

In summary, the present invention discloses a computer file content search system and method that extracts meaningful to-be-retrieved data in advance from existing original files of various formats in a computer and, for each piece of to-be-retrieved data, generates a to-be-retrieved file in markup language format that includes the storage address of its original file.
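The time-limit check of step 415 and the keyword marking of steps 420 to 424 described above might look like the sketch below. It is an assumption-laden illustration: the time limit 22 is modelled as a maximum age relative to the current time, the marking uses double square brackets in place of on-screen highlighting, and the names filter_by_time and mark_keyword and the sample data are hypothetical.

```python
import re
from datetime import datetime, timedelta

def filter_by_time(records, now, max_age):
    """Keep the paths whose recorded time lies within max_age before now."""
    earliest = now - max_age
    return [path for path, updated_at in records if earliest <= updated_at <= now]

def mark_keyword(text, keyword):
    """Wrap each case-insensitive occurrence of keyword in [[ ]] markers."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda match: "[[" + match.group(0) + "]]", text)

# Hypothetical records: (storage address, creation or update time).
records = [
    ("a.doc", datetime(2004, 3, 5)),
    ("b.txt", datetime(2004, 1, 1)),
]
now = datetime(2004, 3, 8)
print(filter_by_time(records, now, timedelta(days=7)))
print(mark_keyword("A Patent search", "patent"))
```

The time filter reads only the time recorded in each to-be-retrieved file, so files outside the range are discarded before any keyword comparison is attempted.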
With this design, when the user enters the keywords to be searched, the system can search the to-be-retrieved files directly, without comparing the content of each original file one by one and word by word. The search time is therefore significantly reduced, and the fast search performance of the system is especially valuable when the amount of stored data is large.

The above is merely a preferred embodiment of the present invention and does not limit the scope in which the invention may be practiced; simple equivalent changes and modifications made according to the claims and the description of the invention remain within the scope covered by this patent.

[Brief Description of the Drawings]
FIG. 1 is a main system architecture diagram of a preferred embodiment of the computer file content search system of the present invention;
FIG. 2 is a main flowchart of the steps of the preferred embodiment; and
FIG. 3 is a schematic diagram of the related operations of the preferred embodiment.

[Description of Main Reference Numerals]
1 computer file content search system
100 computer
20 search data
11 extraction module
12 plain text conversion module
13 markup language conversion module
14 storage module
15 update monitoring module
16 detection module
17 link module
50 extraction rule
51 to-be-retrieved data
52 plain-text to-be-retrieved data
3 original file
53 storage address
54 update time
55 to-be-retrieved file
56 retrieval database
22 search time limit