TW201019142A

TW201019142A - Dynamic webpage content capturing method

Info

Publication number: TW201019142A
Application number: TW97142784A
Authority: TW
Inventors: guo-ren Zhao; yi-chang Cai; qing-chang Li
Original assignee: guo-ren Zhao; yi-chang Cai; qing-chang Li
Priority date: 2008-11-06
Filing date: 2008-11-06
Publication date: 2010-05-16
Also published as: TWI399653B

Abstract

The present invention provides a dynamic webpage content capturing method in which a plurality of webpage parsers are written with an interpreter. Each webpage parser is defined to capture specific data and includes any number of webpage parser calling conditions. When the parsers capture the content of a source webpage, each parser dynamically calls the appropriate other parsers into the parsing process, based on the data the webpage contains. A called parser can in turn call further parsers based on the content it processes, and there is no limit on the number of calls. As a result, all the required information in the source webpage can be extracted for later use. Webpage content is updated and revised quickly and often. To keep up, users or the system administrator can modify, insert, or delete parsers directly in the supported syntax, whether or not the parsers are running. The parsers then adapt to the new content and take effect immediately, and there is no need to ask the original developer to carry out complicated modification and compilation.

Description

201019142
IX. Description of the Invention

[Technical Field]
The present invention relates to a dynamic webpage content capturing method. More particularly, it relates to a method that uses webpage parsers built with an interpreter to analyze webpage content dynamically and capture its information in fine detail.
[Prior Art]
Fig. 6 shows Taiwan Patent Publication No. I292104, "Method for Parsing Webpage Files with Webpage Templates to Capture Data". Its main technique is to create a webpage template first. A webpage parser then parses the content of a retrieved webpage file according to the template's settings, captures the parsed data, and records it in a database, so that the information in the webpage file is extracted automatically. However, the method only works if a webpage template has been created in advance, because the parser can only process webpage data that conforms to the template's definitions. There are countless sites on the Internet, and each may have its own page format. Their information is also updated more and more often: new data is uploaded to servers at any time, and sometimes the whole layout is substantially reorganized. With such rapid updates and increasingly varied content, capturing webpage content with templates as in that patent becomes impractical. A template's settings are a fixed format and cannot follow source webpages that change this quickly, so the parser cannot identify and capture their content. Even a simple reordering of the data in a source webpage is enough to make the template stop matching the page.
Modifying an existing template to match the format of the source webpage may work in the short term. However, every modification or re-creation of a template costs considerable time and effort for limited benefit. When webpages are updated this quickly and the data is this large and varied, modifying templates has even less practical value.

[Summary of the Invention]
Template-based capture of webpage content is too rigid to cope with webpages that are updated more and more often, so the captured content fails to meet expectations. The main object of the present invention is therefore to provide a dynamic webpage content capturing method that uses webpage parsers written with an interpreter. While a webpage is being parsed, any parser can automatically interpret the content of the page being processed and call other parsers into the parsing operation. In this way every detail of the page is extracted and stored by category. As webpages are updated, users or system administrators can simply modify, add, or delete parsers through the interpreter to adapt to the new content. They do not need to return the parsers to the original developer for the traditional, tedious compilation process. The parsers also support live editing: users may add, delete, or modify any parser even while a parser is running, so the running program responds immediately to changes in page format. Conditions written into the program control whether a parser must be paused for such an edit.

To achieve the above object, the dynamic webpage content capturing method of the present invention comprises:
establishing a plurality of webpage parsers with an interpreter program, each webpage parser defining its data capturing conditions and a varying number of webpage parser calling conditions;
specifying a source webpage to be analyzed as a pending webpage;
executing the webpage parsers, wherein a parser examines the entire pending webpage for data that meets its data capturing conditions or webpage parser calling conditions; data that meets a data capturing condition is captured and temporarily stored, and data that meets a calling condition causes the corresponding parser to be called into the parsing operation; the called parser performs the same actions and also determines whether other parsers need to be called;
determining whether all pending webpages have been parsed: if webpages are still pending or parsers are still waiting to execute, the parsing operation continues; when all parsers and all pending webpages have finished, the parsing operation ends; and
outputting the parsing result, wherein each parser that took part in the parsing operation captures the required information from the source webpage according to its function, and the captured information is classified and output.

[Embodiments]
The present invention provides a dynamic webpage content capturing method that uses webpage parsers written with an interpreter to interpret webpage content in full. Each detail in the webpage is captured, classified, and stored in turn for use by other applications. The interpreter may be any feasible language, including but not limited to PHP, Javascript, Perl, Basic, and ASP. Because the parsers are built with an interpreter program, all related parsers can be added, deleted, or modified dynamically, even while a parser is parsing a webpage and capturing data. The running program can therefore respond immediately to changes in the format of the webpage content. Conditions written into the program control whether a parser must be temporarily suspended.

Fig. 1 is a schematic view of a webpage to be parsed. The webpage is only an example to help explain the invention, and the method is not limited to webpages of this format.
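Live editing, as described at the start of the embodiments, means adding, deleting, or modifying parsers while the program runs, with no compile step. A minimal Python sketch of this follows. It is only an illustration under stated assumptions. Python stands in as an interpreted language and is not one of the languages the patent names. The parser source strings, the `parse` function they must define, and the `ParserRegistry` class are all hypothetical. The sketch relies on the built-in `compile` and `exec` functions: each edited parser is recompiled from its source text and rebound under its name, and later calls use the new version.

```python
from typing import Callable, Dict, List

# Hypothetical parser sources, as a user or administrator might edit them.
# Each source must define a function `parse(text) -> list`.
PRICE_PARSER_V1 = """
def parse(text):
    return [word for word in text.split() if word.startswith("$")]
"""

PRICE_PARSER_V2 = """
def parse(text):
    # Revised rule: also accept prices written with an "NT$" prefix.
    return [word for word in text.split() if word.startswith(("$", "NT$"))]
"""


class ParserRegistry:
    """Named parsers compiled from source text at run time (no build step)."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Callable[[str], List[str]]] = {}

    def load(self, name: str, source: str) -> None:
        # Compile the edited source and (re)bind it under `name`.
        # A call already in progress keeps the old function; later calls use the new one.
        namespace: dict = {}
        exec(compile(source, f"<parser {name}>", "exec"), namespace)
        self._parsers[name] = namespace["parse"]

    def remove(self, name: str) -> None:
        self._parsers.pop(name, None)

    def run(self, name: str, text: str) -> List[str]:
        return self._parsers[name](text)


registry = ParserRegistry()
page_text = "Price: $100 or NT$3000"

registry.load("price", PRICE_PARSER_V1)
before = registry.run("price", page_text)  # ['$100']

registry.load("price", PRICE_PARSER_V2)    # edited while the program keeps running
after = registry.run("price", page_text)   # ['$100', 'NT$3000']
print(before, after)
```

`exec` runs arbitrary code, so a real deployment would load only parser source from trusted users.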
Anyone with a browser that supports the format of the Fig. 1 webpage can open and read it directly. The webpage contains URL information (11) and body information (12). Besides plain text data (120), the body information (12) may also contain image files, sound files, video files, hyperlinks, and other types of data. The invention is not limited to capturing text data (120); data of any type can be captured with its technique. The detailed description below uses text data (120) as the example.

Referring to Fig. 2, the main steps of the present invention include the following.

Establishing webpage parsers (201): a plurality of webpage parsers are written with an interpreter program. Each parser defines its data capturing conditions and a varying number of webpage parser calling conditions. When a calling condition is met, the parser can call other parsers into the parsing operation. The parsers can be sorted into groups according to user-defined needs or according to their functions. They need not have a master-slave relationship and may instead be parallel. For example, Fig. 3 groups the parsers by function: a URL parser that decomposes URLs, a webpage component parser that decomposes webpage components, and first-order, second-order, ..., N-order component parsers. These parsers can be designed to be shared, so that they still apply to different source URLs.
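Step (201) and the functional groups of Fig. 3 can be sketched in Python. This is a minimal sketch under stated assumptions. Regular expressions stand in for the data capturing and calling conditions, and a regex over a `<pre>` block stands in for a real HTML component parser. The parser names, group labels, and sample page are hypothetical. Only the sample values (2G DDRII 667MHz, 250G SATA) come from the Fig. 5 example. A calling condition passes the text it matched to another parser, and that parser may call others in turn.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WebpageParser:
    """One parser: a data capturing condition plus any number of calling conditions."""

    name: str
    group: str  # functional group, e.g. "component", "first-order", "second-order"
    capture: Optional[re.Pattern] = None  # named groups of every match are captured
    calls: List[Tuple[re.Pattern, str]] = field(default_factory=list)  # (condition, parser to call)


Captured = Tuple[str, str, Dict[str, str]]  # (group, parser name, captured fields)


def run_parser(name: str, text: str, registry: Dict[str, WebpageParser], results: List[Captured]) -> None:
    parser = registry[name]
    # Data capturing condition: keep every match, wherever it appears in the text.
    if parser.capture is not None:
        for m in parser.capture.finditer(text):
            results.append((parser.group, parser.name, m.groupdict()))
    # Calling conditions: pass the matched text (group 1 if the condition has one)
    # to another parser, which may call further parsers in turn.
    for condition, target in parser.calls:
        for m in condition.finditer(text):
            fragment = m.group(1) if condition.groups else m.group(0)
            run_parser(target, fragment, registry, results)


REGISTRY: Dict[str, WebpageParser] = {
    p.name: p
    for p in [
        # Stand-in for an HTML component parser: pass only the spec block onward.
        WebpageParser("component", "component",
                      calls=[(re.compile(r"<pre>(.*?)</pre>", re.S), "spec_items")]),
        # First-order parser: capture item names, then send each "Item: value" line down a level.
        WebpageParser("spec_items", "first-order",
                      capture=re.compile(r"^(?P<item>[A-Za-z ]+):", re.M),
                      calls=[(re.compile(r"^[A-Za-z ]+:.*$", re.M), "capacity_format")]),
        # Second-order parser: split a value such as "2G DDRII 667MHz" into capacity and format.
        WebpageParser("capacity_format", "second-order",
                      capture=re.compile(r"(?P<capacity>\d+G)\s+(?P<format>.+)$")),
    ]
}

page = "<h1>Notebook</h1><pre>Memory: 2G DDRII 667MHz\nHard disk: 250G SATA</pre>"
results: List[Captured] = []
run_parser("component", page, REGISTRY, results)
for row in results:
    print(row)
```

The sketch recurses directly. Because the description sets no limit on call depth, a production version would also need a guard against parsers that call each other in a cycle.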
Fig. 4 shows another way of grouping. A parser group is built for each kind of source URL according to its characteristics (for example, different websites or different sub-paths), and dedicated subgroups can be built under each group. In practice, however, parsers in the same group or at the same level need not run one after another. Depending on the content of the webpage, a parser in one group may call one or more parsers in other groups, giving an irregular, jumping execution order. The dashed lines in the figure show the calling order between different parsers.

Reading the webpages specified by the user for analysis (202): the user can specify one or more URLs to select one or more source webpages to analyze. For a single source website, the user can also set how many sub-path levels of the site to include. For example, if the user specifies http://aaa/ and allows one level of sub-paths, the processed content extends to the webpages one sub-path level below http://aaa/.

Executing the webpage parsers (203): once the webpages to analyze have been selected, the pre-established parsers are executed. Each parser captures data in the webpage according to its assigned function. It can also dynamically call other parsers, which then do the work they are responsible for. A called parser may call further parsers if it determines that this is necessary. In other words, each parser can call one or more other parsers into the parsing operation, and there is no limit on the number or depth of calls. Each parser examines the whole webpage for data that meets its data capturing conditions or calling conditions, and this determines whether data is captured or other parsers are called. Whenever such data exists, the corresponding operation is performed, regardless of where the data sits or in what order it appears in the webpage.

Determining whether the webpages have been processed (204): besides the pending webpages specified by the user, a webpage may contain links to other webpages. A linked webpage may also contain content worth capturing, so the parser can add it to the parsing operation automatically, and some webpages may have to wait their turn. Likewise, a single parser may call several parsers into the parsing operation, but parsers run one at a time. Several parsers may therefore be waiting for the previous one to finish before they run in turn. If webpages are still pending or parsers are still waiting, execution of the parsers (203) continues. The operation ends when the parsers have reached the most detailed data and no longer call other parsers, and all webpages have been processed.

Outputting the parsing result (205): while the parsers run, they dynamically call other suitable parsers and also capture the required information according to their designed functions. The captured information is classified and output to a database or to a file (for example, in XML or Excel format). The storage form is not specifically limited.

Fig. 5 shows one embodiment of the invention in operation, using the webpage of Fig. 1 as the example. Suppose the required data is the product's specification, that is, the items listed in the text data (120). The method of the invention can capture every detail of each item. First, the user specifies the webpage source, which in this embodiment is a PChome shopping page. A webpage component parser (HTML Parser) then splits the content of the page into parts. Besides the text data (120), for example, the page contains a product title frame at the top, a product list frame on the left, and a product image in the center. The component parser can extract only the text data (120) for further processing. It then determines whether the text data contains first-order data, second-order data, ..., or N-order data and, if so, calls the corresponding parsers to process it. Items such as processor, memory, hard disk, optical drive, screen/weight, network, and others count as first-order information, which the first-order component parser recognizes and captures. The data within each item, such as the processor's brand and model, counts as second-order (or deeper) information. Pre-designed parsers can capture the data within every item. For example, the memory capacity is 2G and its format is DDRII 667MHz; the hard disk capacity is 250G and its format is SATA.

The flow of Fig. 5 is only one example. In actual execution, the parsers need not run strictly in the order first-order, second-order, ..., N-order. For instance, the parser responsible for the fifth order may find, while running, information that belongs to the third-order parser, and it then calls the third-order parser at that point.

In the webpage of Fig. 1, the analysis may also find a link (130) to another product. For example, a "next product" link (130) opens a new webpage introducing a different product. In this case the invention automatically enters the new webpage, analyzes it, and captures its content.

The webpage parsers of the invention are built with an interpreter program. Users who use or manage them can therefore modify them directly in the supported syntax, so that they immediately adapt to different webpages and capture information from them. The parsers need not be returned to the original developer, and no particular compiler is needed to compile them, which makes the method highly flexible to apply. Moreover, each parser dynamically calls the appropriate parsers based on the actual content it encounters, so all the information in a webpage can be extracted.

Specific embodiments of the invention have been described above. Those of ordinary skill in the art may make various modifications and changes without departing from the spirit of the invention. Although the invention has been disclosed through these specific embodiments, it should not be unduly limited to them or to the order in which the steps are performed. Modifications readily apparent to those of ordinary skill in the art are also covered by the scope of the following claims.
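Steps (202) to (205), together with the link-following behavior described for link (130), can be combined into one work-queue loop. The Python sketch below illustrates this under stated assumptions and is not the patented implementation. The in-memory `PAGES` site, its `" | next: "` link notation, the `page`/`spec` parsers, and the `within_depth` rule for counting sub-path levels are all hypothetical. A first-in, first-out queue holds both pending pages and parsers waiting to run. The loop ends only when the queue is empty, as in the termination test of step (204).

```python
import json
from collections import defaultdict, deque
from typing import Callable, Dict, List, Tuple
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory site so the sketch runs without network access.
# " | next: " separates a page's spec text from its comma-separated links.
PAGES: Dict[str, str] = {
    "http://aaa/": "Catalog | next: /p1/",
    "http://aaa/p1/": "Memory: 2G DDRII 667MHz | next: /p2/",
    "http://aaa/p2/": "Hard disk: 250G SATA | next: /p1/,/p2/deep/",
}

# A parser returns (captures, calls, links):
#   captures: (category, value) pairs to store
#   calls:    (parser name, text fragment) pairs to run later
#   links:    URLs (possibly relative) found in the text
ParserResult = Tuple[List[Tuple[str, str]], List[Tuple[str, str]], List[str]]


def page_parser(text: str) -> ParserResult:
    spec, _, link_part = text.partition(" | next: ")
    links = link_part.split(",") if link_part else []
    return [], [("spec", spec)], links


def spec_parser(text: str) -> ParserResult:
    item, sep, value = text.partition(": ")
    return ([(item, value)] if sep else []), [], []


PARSERS: Dict[str, Callable[[str], ParserResult]] = {"page": page_parser, "spec": spec_parser}


def within_depth(seed: str, url: str, max_depth: int) -> bool:
    """True if `url` lies under `seed` and at most `max_depth` sub-path levels below it."""
    s, u = urlparse(seed), urlparse(url)
    if s.netloc != u.netloc or not u.path.startswith(s.path):
        return False
    extra = [seg for seg in u.path[len(s.path):].split("/") if seg]
    return len(extra) <= max_depth


def capture_site(seeds: List[str], max_depth: int, fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    results: Dict[str, List[str]] = defaultdict(list)
    jobs: deque = deque()  # pending (parser name, page URL, text, seed) jobs
    seen = set()

    def enqueue_page(seed: str, url: str) -> None:
        if url not in seen and within_depth(seed, url, max_depth):
            seen.add(url)
            jobs.append(("page", url, fetch(url), seed))

    for seed in seeds:  # step 202: user-specified source pages
        enqueue_page(seed, seed)

    while jobs:  # step 204: continue while any page or called parser is waiting
        name, url, text, seed = jobs.popleft()
        captures, calls, links = PARSERS[name](text)  # step 203: run one parser
        for category, value in captures:
            results[category].append(value)
        for target, fragment in calls:  # called parsers wait their turn in the queue
            jobs.append((target, url, fragment, seed))
        for link in links:  # linked pages (cf. link 130) join the parsing operation
            enqueue_page(seed, urljoin(url, link))
    return dict(results)


result = capture_site(["http://aaa/"], max_depth=1, fetch=PAGES.__getitem__)
print(json.dumps(result, indent=2))  # step 205: classified output (JSON here, not XML/Excel)
```

The queue processes one job at a time, which matches the description's note that waiting parsers run in turn. The `seen` set stops pages that link to each other (here p1 and p2) from being parsed twice. The depth check also keeps `/p2/deep/`, two levels below the seed, from being fetched.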
[Brief Description of the Drawings]
Fig. 1 is a schematic view of a webpage whose content is to be captured with the method of the invention.
Fig. 2 is a flowchart of the method of the invention.
Fig. 3 is a schematic view of one embodiment of building webpage parser groups according to the invention.
Fig. 4 is a schematic view of another embodiment of building webpage parser groups according to the invention.
Fig. 5 is a flowchart of an embodiment of the invention.
Fig. 6 is a flowchart of the method of Taiwan Patent Publication No. I292104, "Method for Parsing Webpage Files with Webpage Templates to Capture Data".

[Description of Main Reference Signs]
(11) URL information
(12) Body information
(120) Text data
(130) Link


Claims

X. Claims:
1. A dynamic webpage content capturing method, comprising:
establishing a plurality of webpage parsers with an interpreter program, each webpage parser defining data capturing conditions and a varying number of webpage parser calling conditions;
specifying a source webpage to be analyzed as a pending webpage;
executing the webpage parsers, wherein a webpage parser examines the entire pending webpage for data meeting the data capturing conditions or the webpage parser calling conditions; when data meets a data capturing condition, the data is captured and temporarily stored; when data meets a webpage parser calling condition, the corresponding webpage parser is called into the parsing operation, and the called webpage parser performs the same actions and further determines whether other webpage parsers need to be called into the parsing operation;
determining whether all pending webpages have been parsed, wherein if there are still webpages waiting to be processed or webpage parsers waiting to be executed, the parsing operation continues, and if all webpage parsers and all pending webpages have finished, the parsing operation ends; and
outputting the parsing result, wherein each webpage parser participating in the parsing operation captures the required information from the source webpage according to its function, and the captured information is classified and output.
2. …
3. The dynamic webpage content capturing method as described in claim 1, wherein the plurality of webpage parsers are divided into different groups.
4. The dynamic webpage content capturing method as described in claim 1, wherein the webpage parser calling conditions defined in the webpage parsers include: when a webpage in the parsing operation contains a link to another webpage, the other webpage is included in the parsing operation.
5. The dynamic webpage content capturing method as described in any one of claims 1 to 4, wherein the data captured by the webpage parsers includes text, image files, sound files, video files, or hyperlinks.
6. The dynamic webpage content capturing method as described in claim 5, wherein in the step of outputting the parsing result, the captured information is output to a database.
7. The dynamic webpage content capturing method as described in claim 5, wherein in the step of outputting the parsing result, the captured information is output as a file.
8. The dynamic webpage content capturing method as described in claim 6, wherein the interpreter program is PHP.
9. The dynamic webpage content capturing method as described in claim 6, wherein the interpreter program is Javascript.
10. The dynamic webpage content capturing method as described in claim 6, wherein the interpreter program is Perl.
11. The dynamic webpage content capturing method as described in claim 6, wherein the interpreter program is Basic.
12. The dynamic webpage content capturing method as described in claim 6, wherein the interpreter program is ASP.
13. The dynamic webpage content capturing method as described in claim 7, wherein the interpreter program is PHP.
14. The dynamic webpage content capturing method as described in claim 7, wherein the interpreter program is Javascript.
15. The dynamic webpage content capturing method as described in claim 7, wherein the interpreter program is Perl.
16. The dynamic webpage content capturing method as described in claim 7, wherein the interpreter program is Basic.
17. The dynamic webpage content capturing method as described in claim 7, wherein the interpreter program is ASP.

XI. Drawings: see the following pages.
