CN101984429A - Method and device for acquiring destination page, search engine and browser - Google Patents

Method and device for acquiring destination page, search engine and browser

Info

Publication number
CN101984429A
Authority
CN
China
Prior art keywords
target pages
path
dom
state path
pages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010531460
Other languages
Chinese (zh)
Other versions
CN101984429B (en)
Inventor
潘云泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN2010105314609A
Publication of CN101984429A
Application granted
Publication of CN101984429B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method and a device for acquiring a destination page, a search engine and a browser. The method comprises the following steps: the search engine captures a foundation page corresponding to a received uniform resource locator (URL) and a script of the foundation page; the captured foundation page and script are analyzed to generate one or more state paths that contain dynamic information and correspond to the foundation page; and the destination page is captured by using the generated state paths. Each state path comprises the URL of the foundation page, position information of a document object model (DOM) event that generates the dynamic information in the foundation page, and a callback function index corresponding to the DOM event. The search engine can thereby capture dynamic content in the page when searching for the destination page.

Description

Method, device, search engine and browser for obtaining a target page
Technical field
The present invention relates to Internet technology, and in particular to a method, device, search engine and browser for obtaining a target page.
Background art
With the rapid development of networks, the Internet has become the carrier of a vast amount of information, and extracting and using that information effectively has become a major challenge. As tools that help people retrieve information, search engines have become the entry point and guide for users accessing the Internet. A web crawler (spider) is a program that fetches web pages automatically and is an important component of a search engine.
A traditional web crawler starts from the uniform resource locators (URLs) of one or several initial pages, crawls the basic page at each URL, parses the content of the current basic page to obtain the URLs of target pages, and then performs data processing, including building and storing page summaries, snapshots and indexes, the results of which are returned to the browser for the user to choose from.
However, when obtaining the URLs of target pages, a traditional web crawler can only crawl static pages. As Internet technology has developed, page content has shifted from being generated statically to being generated dynamically. Traditional crawler technology cannot meet this change; that is, it cannot crawl the dynamic content of a page.
Summary of the invention
The invention provides a method, device, search engine and browser for obtaining a target page, so that a search engine can crawl the dynamic content in a page when searching for target pages.
The specific technical solution is as follows:
A method for obtaining a target page, comprising the following steps:
A. Crawl the basic page corresponding to a received uniform resource locator (URL) and the script corresponding to that basic page;
B. Analyze the crawled basic page and script, produce one or more state paths that correspond to the basic page and contain dynamic information, and use the produced state paths to crawl target pages; wherein each state path comprises: the URL of the basic page, the position information of the Document Object Model (DOM) event in the basic page that produces the dynamic information, and the callback function index corresponding to that DOM event.
Wherein step B specifically comprises:
While crawling the basic page and script, download each DOM node and perform steps B11 to B13 on each downloaded DOM node in turn; after all DOM nodes have been downloaded, perform step B14;
B11. Judge whether the currently downloaded DOM node is a script tag; if so, go to step B11 for the next downloaded DOM node; otherwise, perform step B12;
B12. Judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, go to step B11 for the next downloaded DOM node; if so, perform step B13;
B13. Use the DOM event contained in the currently downloaded DOM node to produce a state path, save the produced state path in a state path queue, and go to step B11 for the next downloaded DOM node;
B14. Obtain, one by one, the target page corresponding to each state path in the state path queue, judge whether new page content is produced or a page jump occurs, and determine the state paths that produce new page content or cause a page jump to be the state paths corresponding to the basic page.
Alternatively, step B specifically comprises:
While crawling the basic page and script, download each DOM node and perform steps B21 to B23 on each downloaded DOM node in turn, until all DOM nodes have been downloaded;
B21. Judge whether the currently downloaded DOM node is a script tag; if so, go to step B21 for the next downloaded DOM node; otherwise, perform step B22;
B22. Judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, go to step B21 for the next downloaded DOM node; if so, perform step B23;
B23. Use the DOM event contained in the currently downloaded DOM node to produce a state path;
B24. Obtain the target page corresponding to this state path and judge whether new page content is produced or a page jump occurs; if so, determine this state path to be a state path corresponding to the basic page and go to step B21 for the next downloaded DOM node; otherwise, go to step B21 for the next downloaded DOM node.
In the above modes, judging whether a page jump occurs comprises: if the URL of the obtained target page differs from the URL of the basic page, determining that a page jump occurs.
Specifically, judging whether new page content is produced comprises: comparing the obtained target page with the basic page by sentence signature or by character string; if the comparison shows that the target page and the basic page have different page content, determining that new page content is produced; or,
Calculating the similarity between the obtained target page and the basic page; if the calculation shows that the target page and the basic page have different page content, determining that new page content is produced.
Wherein the position information of the DOM event comprises: the DOM node identifier, the path (XPath) of the DOM node, and the DOM event identifier.
Further, after step B, the method also comprises:
C. Store the state paths corresponding to the basic page produced in step B and the snapshots of the crawled target pages, and build and store the indexes of the target pages.
A method for obtaining a target page, performed after the above method, comprising:
After a search request is received from a browser, the keywords contained in the search request are matched against the stored indexes of the target pages, and the state paths corresponding to the matching target pages are included in the search results returned to the browser, so that the browser can use the state path selected by the user to obtain the corresponding target page.
In addition, the search results may further comprise snapshot information of the matching target pages;
After the snapshot information of the target page selected by the user is received from the browser, the snapshot of the corresponding target page is returned to the browser.
Further, after the state paths corresponding to the matching target pages are included in the search results and returned to the browser, the method also comprises:
After the state path selected by the user is received from the browser, a target page request is sent to the target page website according to the selected state path, so that the target page website pushes the target page to the browser.
A method for obtaining a target page, comprising:
After sending a search request to a search engine, a browser receives search results containing state paths returned by the search engine;
The browser sends a target page request to the target page website according to the state path selected by the user;
The browser receives the target page pushed by the target page website;
Wherein the search results containing state paths are returned by the search engine using the method of claim 8.
A device for obtaining a target page, comprising:
A first crawling unit, configured to crawl the basic page corresponding to a received uniform resource locator (URL) and the script corresponding to that basic page;
An analysis unit, configured to analyze the basic page and script crawled by the first crawling unit and to produce one or more state paths that correspond to the basic page and contain dynamic information; wherein each state path comprises: the URL of the basic page, the position information of the Document Object Model (DOM) event in the basic page that produces the dynamic information, and the callback function index corresponding to that DOM event;
A second crawling unit, configured to use the state paths produced by the analysis unit to crawl target pages.
Wherein the analysis unit specifically comprises: a first judging module, a second judging module, a first path generation module and a first path determination module;
The first crawling unit downloads each DOM node while crawling the basic page and its script, sends the currently downloaded DOM node to the first judging module, and, after all DOM nodes have been downloaded, sends a determination notification to the first path determination module;
The first judging module is configured to judge whether the currently downloaded DOM node is a script tag; if so, it triggers the first crawling unit to download the next DOM node; otherwise, it sends a judgment notification to the second judging module;
The second judging module is configured to judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, it triggers the first crawling unit to download the next DOM node; if so, it sends an execution notification to the first path generation module;
The first path generation module is configured to, after receiving the execution notification, use the currently downloaded DOM node to produce a state path, save the produced state path in a state path queue, and trigger the first crawling unit to download the next DOM node;
The first path determination module is configured to, on receiving the determination notification, trigger the second crawling unit to obtain, one by one, the target page corresponding to each state path in the state path queue, judge from the results obtained by the second crawling unit whether new page content is produced or a page jump occurs, and determine the state paths that produce new page content or cause a page jump to be the state paths corresponding to the basic page.
Alternatively, the analysis unit may comprise: a third judging module, a fourth judging module, a second path generation module and a second path determination module;
The first crawling unit downloads each DOM node while crawling the basic page and its script and sends the currently downloaded DOM node to the third judging module, until all DOM nodes have been downloaded;
The third judging module is configured to judge whether the currently downloaded DOM node is a script tag; if so, it triggers the first crawling unit to download the next DOM node; otherwise, it sends a judgment notification to the fourth judging module;
The fourth judging module is configured to judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, it triggers the first crawling unit to download the next DOM node; if so, it sends an execution notification to the second path generation module;
The second path generation module is configured to, on receiving the execution notification, use the DOM event contained in the currently downloaded DOM node to produce a state path and send the produced state path to the second path determination module;
The second path determination module is configured to, on receiving a state path, trigger the second crawling unit to obtain the target page corresponding to that state path and judge, from the result obtained by the second crawling unit, whether new page content is produced or a page jump occurs; if so, it determines that state path to be a state path corresponding to the basic page and triggers the first crawling unit to download the next DOM node; otherwise, it triggers the first crawling unit to download the next DOM node.
Wherein judging whether a page jump occurs comprises: if the URL of the obtained target page differs from the URL of the basic page, determining that a page jump occurs.
Judging whether new page content is produced comprises: comparing the obtained target page with the basic page by sentence signature or by character string; if the comparison shows that the target page and the basic page have different page content, determining that new page content is produced; or,
Calculating the similarity between the obtained target page and the basic page; if the calculation shows that the target page and the basic page have different page content, determining that new page content is produced.
Specifically, the position information of the DOM event comprises: the DOM node identifier, the path (XPath) of the DOM node, and the DOM event identifier.
Further, the device also comprises:
A storage unit, configured to store the state paths corresponding to the basic page produced by the analysis unit and the snapshots of the target pages crawled by the second crawling unit, and to build and store the indexes of the target pages.
A search engine, comprising: the above device for obtaining a target page, a user interface unit and a search processing unit;
The user interface unit is configured to receive a search request from a browser and send the keywords contained in the search request to the search processing unit, and to return the search results sent by the search processing unit to the browser, so that the browser can use the state path selected by the user to obtain the corresponding target page;
The search processing unit is configured to match the keywords against the target page indexes stored in the storage unit of the device, include the state paths corresponding to the matching target pages in the search results, and send the search results to the user interface unit.
Further, the search results also comprise snapshot information of the matching target pages;
The user interface unit is also configured to send the snapshot information of the target page selected by the user, as returned by the browser, to the search processing unit, and to return the target page snapshot sent by the search processing unit to the browser;
The search processing unit is also configured to obtain the snapshot of the corresponding target page from the storage unit according to the snapshot information of the target page selected by the user and send it to the user interface unit.
Further, the search engine also comprises: a path resolution unit and a network interface unit;
The user interface unit is also configured to, after receiving the state path selected by the user from the browser, send that state path to the path resolution unit;
The path resolution unit is configured to generate a target page request according to the received state path;
The network interface unit is configured to send the target page request generated by the path resolution unit to the target page website.
A browser, comprising: a network-side interface unit, a path resolution unit and a user-side interface unit;
The network-side interface unit is configured to receive the search results containing state paths sent by the search engine as described in claim 19, and to send the target page request from the path resolution unit to the target page website;
The user-side interface unit is configured to display the search results received by the network-side interface unit to the user, and to send the state path selected by the user to the path resolution unit;
The path resolution unit is configured to generate a target page request according to the state path selected by the user and send it to the network-side interface unit.
As can be seen from the above technical solutions, the invention, based on analysis of the basic page and its script, introduces the concept of a state path: it produces state paths that correspond to the basic page and contain dynamic information. The target page that a state path points to contains the dynamic content of the page, so that the search engine can subsequently crawl the dynamic content in a page when searching for target pages.
Brief description of the drawings
Fig. 1 is a flowchart of the main method provided by the invention;
Fig. 2 is a detailed method flowchart provided by Embodiment one of the invention;
Fig. 3 is a flowchart of producing state paths provided by Embodiment two of the invention;
Fig. 4 is a flowchart of producing state paths provided by Embodiment three of the invention;
Fig. 5 is a flowchart of a browser obtaining a target page, provided by Embodiment four of the invention;
Fig. 6 is a flowchart of a browser obtaining a target page, provided by Embodiment five of the invention;
Fig. 7 is a flowchart of a browser obtaining a target snapshot, provided by Embodiment six of the invention;
Fig. 8 is a schematic structural diagram of the device provided by the invention;
Fig. 9 is a schematic diagram of one structure of the analysis unit in Fig. 8;
Fig. 10 is a schematic diagram of another structure of the analysis unit in Fig. 8;
Fig. 11 is a schematic structural diagram of the search engine provided by the invention;
Fig. 12 is a schematic structural diagram of the browser provided by the invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the invention clearer, the invention is described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the main method provided by the invention may comprise the following steps:
Step 101: Crawl the basic page corresponding to a received URL and the script corresponding to that basic page.
Step 102: Analyze the crawled basic page and script, and produce one or more state paths that correspond to the basic page and contain dynamic information; wherein each state path comprises: the URL of the basic page, the position information of the Document Object Model (DOM) event in the basic page that produces the dynamic information, and the callback function index corresponding to that DOM event.
Step 103: Use the produced state paths to crawl target pages.
The method flow shown in Fig. 1 is performed by the search engine. Further, the search engine may also store the produced state paths so that, after receiving a search request from a browser, it returns search results containing state paths to the browser, and the browser can use the state path selected by the user to obtain the corresponding target page.
The above method is described in detail below through specific embodiments.
Embodiment one
Fig. 2 is a detailed method flowchart provided by Embodiment one of the invention. As shown in Fig. 2, the method may specifically comprise the following steps:
Step 201: The search engine receives a URL.
The search engine may crawl URLs automatically in batches in the background.
Step 202: Crawl the basic page corresponding to the received URL and the script corresponding to that basic page.
The basic page and its script may be related in one of two ways. First, the script itself is contained in an HTML tag of the basic page's source code. Second, an HTML tag of the basic page's source code contains a link that points to a script document; that is, the basic page and the script document are two different documents with a reference relationship between them.
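The two relationships described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ScriptCollector` and the helper `collect_scripts` are hypothetical, and the sketch assumes a script tag with a `src` attribute is a linked script document while one without is an inline script.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Hypothetical helper: separates inline scripts from linked script documents."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []    # case one: script text inside the basic page
        self.external_scripts = []  # case two: links to separate script documents
        self._in_inline = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.external_scripts.append(src)
        else:
            self._in_inline = True
            self.inline_scripts.append("")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_inline:
            self.inline_scripts[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline = False


def collect_scripts(html):
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.inline_scripts, collector.external_scripts
```

A crawler following this sketch would still need to download each linked script document separately before analysis.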
Step 203: Analyze the DOM nodes downloaded from the crawled basic page, judge whether the script corresponding to a DOM event in a DOM node produces dynamic information, and, according to the analysis result, produce one or more state paths that correspond to the basic page and contain dynamic information, and use these state paths to obtain target pages; wherein each state path comprises: the URL of the basic page, the position information of the DOM event in the basic page that produces the dynamic information, and the callback function index corresponding to that DOM event.
The scripts involved in the invention include, but are not limited to: JavaScript, VBScript, Perl or Python.
The position information of the DOM event may comprise: the DOM node identifier, the path (XPath) of the DOM node, and the DOM event identifier. The DOM node identifier may be the ID of the DOM node or the name of the DOM node.
The callback function index in a state path is used to reference the callback function corresponding to the DOM event. Every callback function in the script has an index, and the correspondence between indexes and concrete callback functions can be stored in a data structure such as a global function table or a function mapping. Looking up the callback function index from the state path in that data structure yields the callback function corresponding to the DOM event. The callback functions here may include anonymous and non-anonymous callback functions.
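The index lookup described above can be sketched as a small table in Python. The class `CallbackTable`, its method names, and the `anon_N` naming scheme for anonymous callbacks are all assumptions made for illustration; the source does not specify how indexes are assigned.

```python
class CallbackTable:
    """Hypothetical global function table: callback index -> callback function."""

    def __init__(self):
        self._table = {}

    def register(self, func, name=None):
        # Named callbacks keep their name as the index; anonymous ones get
        # a generated index (an assumed scheme, not taken from the source).
        index = name if name is not None else f"anon_{len(self._table)}"
        self._table[index] = func
        return index

    def resolve(self, index):
        # Look up the callback that a state path refers to by its index.
        return self._table[index]
```

With such a table, a state path only needs to carry the short index string, and the function itself is recovered at crawl time.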
For a given state path, once the callback function corresponding to the DOM event has been compiled and executed, the corresponding target page can be obtained.
The specific implementation of this step is described in detail in Embodiments two and three.
A basic page may correspond to N state paths and thus to N target pages, where N is an integer of one or more.
For example, for the basic page whose URL is www.baidu.com, the two state paths produced may be:
{base_url:http://www.baidu.com,id:idsample1,xpath:html/body/a/,event:click,type:new_content,callback:fun1}
{base_url:http://www.baidu.com,id:idsample2,xpath:html/body/li/a/,event:click,type:new_link,callback:fun2}
It should be noted that the invention does not limit the concrete form of the state path; the above is only one example.
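As a sketch of how the two example state paths could be held in memory, the following Python record mirrors their fields. The class name `StatePath` and the field names `node_id` and `path_type` (standing in for `id` and `type`) are assumptions; the source explicitly leaves the concrete form open.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StatePath:
    """Hypothetical record with the fields shown in the example state paths."""
    base_url: str   # URL of the basic page
    node_id: str    # DOM node identifier ("id" in the example)
    xpath: str      # path of the DOM node
    event: str      # DOM event identifier
    path_type: str  # "type" in the example: new_content or new_link
    callback: str   # callback function index


paths = [
    StatePath("http://www.baidu.com", "idsample1", "html/body/a/",
              "click", "new_content", "fun1"),
    StatePath("http://www.baidu.com", "idsample2", "html/body/li/a/",
              "click", "new_link", "fun2"),
]
```

Making the record frozen keeps state paths hashable, which is convenient if they are later used as keys in an index.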
Step 204: Store the state paths corresponding to the basic page and the target page snapshots corresponding to those state paths, and build and store the indexes of the target pages, so that they can later be found by the search engine and returned to a browser as search results.
In this embodiment, the basic page and its script crawled in step 202, the state paths produced in step 203, and the target page snapshots obtained in step 203 may all be stored. Storage of the basic page may specifically include the basic page URL, the basic page snapshot, and so on.
The flow by which the search engine obtains target pages may be executed periodically or triggered manually. Each time state paths are produced for a basic page, if stored state paths already exist, the newly produced state paths may be compared with the stored state paths for that basic page; if they differ, the stored state paths are updated promptly.
In addition, the search engine may periodically check, according to the index of a target page, whether the target page has been updated, and update the stored index promptly. Likewise, whenever a newly obtained target page snapshot differs from the stored one, the stored snapshot may be replaced with the new one.
The three kinds of stored content above may be stored separately or merged.
Steps 201 to 204 are all performed by the search engine in the background. If the search engine receives a search request from a browser, it goes on to perform the following step in the foreground.
Step 205: After a search request is received from a browser, match the keywords contained in the search request against the index of each target page, and include the state paths corresponding to the matching target pages in the search results returned to the browser, so that the browser can use the state path selected by the user to obtain the corresponding target page.
When the search engine receives a search request containing keywords, the indexes of basic pages may take part in the matching as well as the indexes of target pages; that is, basic pages may also be included in the search results. This part is the same as the prior art and is not described further.
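The matching in step 205 can be sketched with a toy inverted index in Python. This is only an illustration under simplifying assumptions: state paths are represented by plain strings, snapshots are whitespace-tokenized text, and a page matches only when it contains every query term. The function names `build_index` and `search` are hypothetical.

```python
def build_index(pages):
    """Build a term -> set of state paths index.

    `pages` maps a state path (here just a string key) to the snapshot
    text of its target page.
    """
    index = {}
    for path, text in pages.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(path)
    return index


def search(index, query):
    """Return the state paths whose target pages contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)
```

A real search engine would rank results and tokenize Chinese text properly; the sketch only shows that the result of a match is a state path rather than a plain URL.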
Further, the search results may also include the snapshot information of the target pages, or the indexes of the target pages.
For how the browser uses the state path selected by the user to obtain the corresponding target page in this step, see Embodiments four and five.
State paths may be produced in step 203 in either of the two ways described in Embodiment two and Embodiment three.
Embodiment two
Fig. 3 is a flowchart of producing state paths provided by Embodiment two of the invention. As shown in Fig. 3, the flow may specifically comprise the following steps:
Step 301: While crawling the basic page and its script, download each DOM node.
Step 302: Judge whether downloading of the DOM nodes has finished; if so, end the crawling flow for the basic page and go to step 306; otherwise, perform step 303 on the currently downloaded DOM node.
Step 303: Judge whether the currently downloaded DOM node is a script tag; if so, go to step 302 for the next downloaded DOM node; otherwise, perform step 304.
For a script tag node, the script corresponding to the script tag may be sent to the script parsing engine to be compiled and executed.
Step 304: Judge whether this DOM node contains a DOM event and a callback function; if not, leave the analysis of this DOM node and go to step 302 for the next downloaded DOM node; if so, perform step 305.
If the DOM node contains no DOM event and callback function, the node cannot cause a page jump or new page content; that is, it cannot produce dynamic page information, so the node can be skipped. If there is a next DOM node, analysis of that node begins.
Step 305: Use the DOM event contained in this DOM node to produce a state path, and save the produced state path in the state path queue; go to step 302 for the next downloaded DOM node.
Step 306: Obtain, one by one, the target page corresponding to each state path in the state path queue, judge whether new page content is produced or a page jump occurs, and determine the state paths that produce new page content or cause a page jump to be the state paths corresponding to the basic page.
The state paths that produce new page content or cause a page jump, together with their corresponding target pages, can then be stored.
A page jump may be judged as follows: if the URL of the target page differs from the URL of the basic page, a page jump is determined to have occurred. New page content may be judged as follows: compare the target page with the basic page by sentence signature or by character string, or calculate the similarity between the target page and the basic page; if the comparison or the similarity calculation shows that the target page and the basic page have different page content, new page content is determined to have been produced. When comparing sentence signatures, the signatures may be calculated with an existing method such as MD5; no specific restriction is imposed here.
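The judgments described above can be sketched in Python using the standard library. This is a sketch under stated assumptions, not the patented method: splitting sentences on punctuation, using `difflib.SequenceMatcher` as the similarity measure, and the `threshold` value of 0.9 are all illustrative choices that the source does not specify. Only the use of MD5 for sentence signatures comes from the text.

```python
import hashlib
import re
from difflib import SequenceMatcher


def is_page_jump(base_url, target_url):
    """A page jump is assumed whenever the two URLs differ."""
    return base_url != target_url


def sentence_signatures(text):
    """MD5 signature of each sentence; the splitting rule is an assumption."""
    sentences = [s.strip() for s in re.split(r"[.!?。！？\n]+", text) if s.strip()]
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in sentences}


def has_new_content_by_signature(base_text, target_text):
    """New content exists if the target has a sentence the basic page lacks."""
    return bool(sentence_signatures(target_text) - sentence_signatures(base_text))


def has_new_content_by_similarity(base_text, target_text, threshold=0.9):
    """New content exists if similarity falls below an assumed threshold."""
    ratio = SequenceMatcher(None, base_text, target_text).ratio()
    return ratio < threshold
```

The signature comparison is cheap and order-insensitive, while the similarity ratio tolerates small edits; which one suits a given crawler depends on how noisy its pages are.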
In Embodiment two, all state paths produced from DOM events are first saved in the state path queue. Because the state paths of DOM events do not necessarily all produce dynamic page information, some may be invalid; each state path in the queue is therefore judged one by one to determine whether its corresponding target page contains dynamic information. Steps 303 to 305 analyze each DOM node and preliminarily produce state paths; that is, steps 303 to 305 are performed on every downloaded DOM node, and after all DOM nodes have been downloaded, step 306 is performed to finally determine the state paths corresponding to the basic page.
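The two-phase flow of Embodiment two (queue first, validate afterwards) can be sketched in Python as follows. The representation of DOM nodes as dictionaries with `tag`, `xpath` and `events` keys, and the `fetch_target` callable that executes a state path and returns the resulting URL and text, are hypothetical stand-ins for a real DOM parser and script engine.

```python
from collections import deque


def collect_candidates(base_url, nodes):
    """Steps 303 to 305: queue one candidate state path per DOM event."""
    queue = deque()
    for node in nodes:
        if node["tag"] == "script":
            continue  # step 303: script tags produce no state path
        # step 304: nodes without events contribute nothing
        for event, callback in node.get("events", {}).items():
            # step 305: save the candidate without validating it yet
            queue.append({"base_url": base_url, "xpath": node["xpath"],
                          "event": event, "callback": callback})
    return queue


def confirm_paths(queue, base_url, base_text, fetch_target):
    """Step 306: keep paths whose target page differs from the basic page."""
    confirmed = []
    while queue:
        path = queue.popleft()
        target_url, target_text = fetch_target(path)
        if target_url != base_url or target_text != base_text:
            confirmed.append(path)
    return confirmed
```

Deferring validation until the whole page has been downloaded keeps the download loop simple, at the cost of holding every candidate in memory.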
Embodiment three,
Fig. 4 is a flowchart of state path generation provided by Embodiment three of the present invention. As shown in Fig. 4, the process can specifically include the following steps:
Step 401: While crawling the base page and its scripts, download each DOM node.
Step 402: Judge whether the download of DOM nodes has finished. If so, end the crawling flow for the base page; otherwise, execute step 403 for the currently downloaded DOM node.
Step 403: Judge whether the currently downloaded DOM node is a script tag. If so, go to step 402 for the next downloaded DOM node; otherwise, execute step 404.
For a script tag node, the script corresponding to the script tag can be sent to the script parsing engine to be compiled and executed.
Step 404: Judge whether this DOM node contains a DOM event and a callback function. If not, stop analyzing this DOM node and go to step 402 for the next downloaded DOM node; if so, execute step 405.
Step 405: Produce a state path from the DOM event in this DOM node.
In this step, state paths can be produced for all DOM events; preferably, state paths are produced only for DOM events in a preset DOM event list. The DOM events in the preset list can include onclick, ondblclick, onmouseover, onmousemove, onmouseout, onblur, onfocus, onchange, onsubmit, onselect, and so on, all of which are DOM events that may produce dynamic page information.
Step 406: Obtain the target page corresponding to this state path and judge whether new page content is produced or a page jump occurs. If so, execute step 407; otherwise, go to step 402 for the next downloaded DOM node.
Step 407: Determine that this state path is a state path corresponding to the base page. The state path and its corresponding target page can be stored. Go to step 402 for the next downloaded DOM node.
Unlike Embodiment two, Embodiment three judges each state path as soon as it is produced, determining whether the target page corresponding to the state path contains dynamic information (step 406), and if so, stores the state path and its corresponding target page. Steps 403 to 407 are the process of analyzing each downloaded DOM node and producing state paths; that is, steps 403 to 407 are executed for every downloaded DOM node until all DOM nodes have been downloaded.
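The per-node flow of steps 402 to 407 can be sketched with Python's standard `html.parser`. This is a simplified sketch under several assumptions: a `tag[n]` counter stands in for the real DOM node identifier and XPath, the raw attribute value stands in for the callback function index, and `produces_dynamic_info` is a hypothetical predicate replacing the real step 406 check.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List

# Preset DOM event list from step 405 (standard handler attribute names).
PRESET_EVENTS = {
    "onclick", "ondblclick", "onmouseover", "onmousemove", "onmouseout",
    "onblur", "onfocus", "onchange", "onsubmit", "onselect",
}


@dataclass(frozen=True)
class StatePath:
    base_url: str
    node_position: str   # assumed stand-in for node identifier and XPath
    event: str
    callback_index: str  # assumed stand-in for the callback function index


class StatePathCollector(HTMLParser):
    """Validates each state path as soon as it is produced (Embodiment three)."""

    def __init__(self, base_url: str,
                 produces_dynamic_info: Callable[[StatePath], bool]) -> None:
        super().__init__()
        self.base_url = base_url
        self.produces_dynamic_info = produces_dynamic_info
        self.node_count = 0
        self.valid_paths: List[StatePath] = []

    def handle_starttag(self, tag, attrs):
        self.node_count += 1
        if tag == "script":                      # step 403: skip script tags
            return
        for name, value in attrs:                # step 404: look for DOM events
            if name in PRESET_EVENTS and value:
                path = StatePath(self.base_url, f"{tag}[{self.node_count}]",
                                 name, value)    # step 405
                if self.produces_dynamic_info(path):   # step 406
                    self.valid_paths.append(path)      # step 407
```

Feeding a page with `collector.feed(html)` leaves the validated state paths in `collector.valid_paths`; nodes without event attributes simply fall through, which corresponds to moving on to the next downloaded DOM node.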
This concludes the flow of Embodiment three.
In Embodiments two and three above, when the step of obtaining the target page corresponding to a state path and judging whether new page content is produced or a page jump occurs is executed, the callback function index corresponding to the DOM event can be sent to the script parsing engine. The script parsing engine obtains the corresponding callback function according to this index, and the target page is obtained and the judgment is made according to the result of compiling and executing that callback function. For an anonymous function, the script parsing engine can compile and execute the callback function in real time after obtaining it; for a non-anonymous function, the script parsing engine can, after obtaining it, reuse the earlier compilation and execution result for that callback function.
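The different handling of anonymous and non-anonymous callbacks can be illustrated with a small stand-in for the script parsing engine. Everything here is hypothetical: the class name, the integer index, and the use of Python callables in place of compiled page scripts are assumptions made only to show the caching behavior.

```python
from typing import Callable, Dict, Optional, Tuple


class ScriptEngineStub:
    """Toy stand-in: resolves a callback index and runs the callback."""

    def __init__(self) -> None:
        self._callbacks: Dict[int, Tuple[Optional[str], Callable[[], str]]] = {}
        self._cache: Dict[int, str] = {}
        self.executions = 0  # counts real executions, for illustration

    def register(self, index: int, func: Callable[[], str],
                 name: Optional[str] = None) -> None:
        # name=None marks an anonymous function.
        self._callbacks[index] = (name, func)

    def run(self, index: int) -> str:
        name, func = self._callbacks[index]
        # Named function: reuse the earlier execution result if one exists.
        if name is not None and index in self._cache:
            return self._cache[index]
        # Anonymous function, or first run of a named one: execute now.
        self.executions += 1
        result = func()
        if name is not None:
            self._cache[index] = result
        return result
```

Running a named callback twice executes it once, while an anonymous callback is executed on every request.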
A browser can obtain the target page from a state path in one of two ways, depending on whether the browser is able to parse state paths. The two ways are described in Embodiments four and five respectively.
Embodiment four,
When the browser is able to parse state paths, the corresponding flowchart is shown in Fig. 5 and includes the following steps:
Step 501: The browser sends a search request (query) containing a keyword to the search engine.
Step 502: The search engine executes step 205 and returns search results containing state paths to the browser.
Step 503: The browser sends a target page request to the target page website according to the state path selected by the user.
When the user clicks the state path of a target page, the browser parses the clicked state path and sends a target page request to the target page website according to it.
Step 504: The target page website pushes the target page to the browser.
Embodiment five,
When the browser is not able to parse state paths, the corresponding flowchart is shown in Fig. 6 and includes the following steps:
Step 601: The browser sends a search request containing a keyword to the search engine.
Step 602: The search engine executes step 205 and returns search results containing state paths to the browser.
Step 603: The browser sends the state path selected by the user to the search engine.
Step 604: The search engine sends a target page request to the target page website according to the state path selected by the user.
Because the browser cannot parse state paths, it only sends the state path selected by the user to the search engine; the search engine parses the state path and sends a target page request to the target page website according to it.
Step 605: The target page website pushes the target page to the browser.
The target page request sent by the search engine can contain browser information, so that the target page website can push the target page to the browser.
This concludes the flow of Embodiment five.
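The difference between Embodiments four and five is only in who parses the state path and issues the target page request. The sketch below lists the message hops for each case; the party names and message labels are illustrative strings, not protocol fields defined by the source.

```python
from typing import List, Tuple

Hop = Tuple[str, str, str]  # (sender, receiver, message)


def target_page_flow(browser_parses_state_path: bool) -> List[Hop]:
    """Return the message sequence for Embodiment four (True) or five (False)."""
    hops: List[Hop] = [
        ("browser", "search engine", "search request with keyword"),
        ("search engine", "browser", "search results with state paths"),
    ]
    if browser_parses_state_path:
        # Step 503: the browser parses the state path itself.
        hops.append(("browser", "target site", "target page request"))
    else:
        # Steps 603 and 604: the search engine parses it on the browser's behalf.
        hops.append(("browser", "search engine", "selected state path"))
        hops.append(("search engine", "target site",
                     "target page request with browser information"))
    # Steps 504 and 605: the site pushes the page to the browser in both cases.
    hops.append(("target site", "browser", "target page"))
    return hops
```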
In another case, if the search results returned by the search engine in step 205 of Embodiment one contain target page snapshot information and the user clicks a target page snapshot, the interaction between the browser and the search engine can proceed according to Embodiment six.
Embodiment six,
Fig. 7 is a flowchart, provided by Embodiment six, of the browser obtaining a target page snapshot. As shown in Fig. 7, the process can include the following steps:
Step 701: The browser sends a search request containing a keyword to the search engine.
Step 702: The search engine executes step 205 and returns search results containing state paths and target page snapshot information to the browser.
Step 703: The browser sends the target page snapshot information selected by the user to the search engine.
Step 704: The search engine determines the corresponding target page snapshot and returns it to the browser.
Because the search engine has stored each target page snapshot locally, it does not need to interact with the target page website again; it obtains the corresponding target page snapshot locally and returns it directly to the browser.
The above is a detailed description of the method provided by the present invention. The device for obtaining a target page provided by the present invention is described in detail below. As shown in Fig. 8, the device can include a first crawling unit 800, an analysis unit 810, and a second crawling unit 820.
The first crawling unit 800 is used to crawl the base page corresponding to a received URL and the scripts of that base page.
The analysis unit 810 is used to analyze the base page and scripts crawled by the first crawling unit 800 and to produce one or more state paths, containing dynamic information, that correspond to the base page. A state path includes the URL of the base page, the position information of the DOM event in the base page that produces the dynamic information, and the callback function index corresponding to that DOM event.
The second crawling unit 820 is used to crawl target pages using the state paths produced by the analysis unit 810.
The analysis unit 810 can adopt either of two structures. The first structure, shown in Fig. 9, specifically includes a first judging module 811, a second judging module 812, a first path generation module 813, and a first path determination module 814.
The first crawling unit 800 downloads each DOM node while crawling the base page and its scripts, sends the currently downloaded DOM node to the first judging module 811, and, after all DOM nodes have been downloaded, sends a determination notification to the first path determination module 814.
The first judging module 811 is used to judge whether the currently downloaded DOM node is a script tag; if so, it triggers the first crawling unit 800 to download the next DOM node; otherwise, it sends a judgment notification to the second judging module 812.
The second judging module 812 is used to judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, it triggers the first crawling unit 800 to download the next DOM node; if so, it sends an execution notification to the first path generation module 813.
The first path generation module 813 is used, after receiving the execution notification, to produce a state path from the currently downloaded DOM node, save the produced state path in the state path queue, and trigger the first crawling unit 800 to download the next DOM node.
The first path determination module 814 is used, on receiving the determination notification, to trigger the second crawling unit 820 to obtain, one by one, the target page corresponding to each state path in the state path queue, to judge from the result obtained by the second crawling unit 820 whether new page content is produced or a page jump occurs, and to determine each state path that produces new page content or causes a page jump to be a state path corresponding to the base page.
The second structure of the analysis unit 810, shown in Fig. 10, can specifically include a third judging module 911, a fourth judging module 912, a second path generation module 913, and a second path determination module 914.
The first crawling unit 800 downloads each DOM node while crawling the base page and its scripts and sends the currently downloaded DOM node to the third judging module 911, until all DOM nodes have been downloaded.
The third judging module 911 is used to judge whether the currently downloaded DOM node is a script tag; if so, it triggers the first crawling unit 800 to download the next DOM node; otherwise, it sends a judgment notification to the fourth judging module 912.
The fourth judging module 912 is used to judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, it triggers the first crawling unit 800 to download the next DOM node; if so, it sends an execution notification to the second path generation module 913.
The second path generation module 913 is used, on receiving the execution notification, to produce a state path from the DOM event contained in the currently downloaded DOM node and to send the produced state path to the second path determination module 914.
The second path determination module 914 is used, on receiving a state path, to trigger the second crawling unit 820 to obtain the target page corresponding to this state path and to judge, from the result obtained by the second crawling unit 820, whether new page content is produced or a page jump occurs. If so, it determines that this state path is a state path corresponding to the base page and then triggers the first crawling unit 800 to download the next DOM node; otherwise, it directly triggers the first crawling unit 800 to download the next DOM node.
Specifically, in both structures, judging whether a page jump occurs can include: if the URL of the obtained target page differs from the URL of the base page, determining that a page jump has occurred.
Judging whether new page content is produced can include: comparing the obtained target page with the base page by sentence signature or by character string, and, if the comparison result shows that the target page has page content different from that of the base page, determining that new page content has been produced; or calculating the similarity between the obtained target page and the base page, and, if the calculation result shows that the target page has page content different from that of the base page, determining that new page content has been produced.
The position information of the DOM event in the state path includes: the DOM node identifier, the XPath of the DOM node, and the DOM event identifier.
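The components of a state path listed above map naturally onto a small record type. The sketch below is one possible representation; the field names, the integer callback index, and the JSON encoding are assumptions, since the source does not specify a concrete format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EventPosition:
    node_id: str   # DOM node identifier
    xpath: str     # XPath of the DOM node
    event_id: str  # DOM event identifier, e.g. "onclick"


@dataclass(frozen=True)
class StatePath:
    base_url: str            # URL of the base page
    position: EventPosition  # where the DOM event sits in the base page
    callback_index: int      # index of the callback bound to the event


def encode_state_path(path: StatePath) -> str:
    """Serialize a state path so it can be stored or placed in search results."""
    return json.dumps(asdict(path), sort_keys=True)


def decode_state_path(text: str) -> StatePath:
    """Rebuild a state path from its serialized form."""
    data = json.loads(text)
    return StatePath(
        base_url=data["base_url"],
        position=EventPosition(**data["position"]),
        callback_index=data["callback_index"],
    )
```

A serialized form of this kind is what a browser or search engine with state path parsing capability would decode before building the target page request.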
Further, the device can also include:
A storage unit 830, used to store the state paths corresponding to the base page produced by the analysis unit 810 and the snapshots of the target pages crawled by the second crawling unit 820, and to build and store an index of the target pages.
In addition, the storage unit 830 can also store the base page crawled by the first crawling unit 800. The base page, the state paths, and the target page snapshots can either be stored separately or be stored together.
Fig. 11 is a schematic structural diagram of the search engine provided by the present invention. As shown in Fig. 11, the search engine includes the device shown in Fig. 8, a user interface unit 1101, and a search processing unit 1102.
The user interface unit 1101 is used to receive a search request from the browser and send the keyword contained in the search request to the search processing unit 1102, and to return the search results sent by the search processing unit 1102 to the browser, so that the browser can obtain the corresponding target page using the state path selected by the user.
The search processing unit 1102 is used to match the keyword against the index of target pages stored in the storage unit 830, and to send search results containing the state paths of the matched target pages to the user interface unit 1101.
Preferably, the search results can also contain snapshot information of the matched target pages. In this case:
The user interface unit 1101 is also used to send the snapshot information of the target page selected by the user, as returned by the browser, to the search processing unit 1102, and to return the target page snapshot sent by the search processing unit 1102 to the browser.
The search processing unit 1102 is also used to obtain the corresponding target page snapshot from the storage unit 830 according to the snapshot information of the target page selected by the user, and to send it to the user interface unit 1101.
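The cooperation between the storage unit and the search processing unit can be sketched as a tiny in-memory store. This is a toy illustration only: the whitespace tokenizer, the use of an integer page id as the snapshot information, and all names are assumptions not found in the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class StoredTarget:
    state_path: str
    snapshot: str


class TargetPageStore:
    """Keeps state paths, snapshots, and a keyword index of target pages."""

    def __init__(self) -> None:
        self._targets: Dict[int, StoredTarget] = {}
        self._index: Dict[str, Set[int]] = defaultdict(set)

    def add(self, state_path: str, snapshot: str) -> int:
        page_id = len(self._targets)
        self._targets[page_id] = StoredTarget(state_path, snapshot)
        # Build the index from the snapshot text (naive whitespace tokens).
        for word in snapshot.lower().split():
            self._index[word].add(page_id)
        return page_id

    def search(self, keyword: str) -> List[dict]:
        """Match a keyword against the index and return result entries."""
        ids = sorted(self._index.get(keyword.lower(), set()))
        return [{"state_path": self._targets[i].state_path, "snapshot_info": i}
                for i in ids]

    def snapshot(self, snapshot_info: int) -> str:
        """Return the locally stored snapshot selected by the user."""
        return self._targets[snapshot_info].snapshot
```

Because the snapshot is served from local storage, the snapshot path never contacts the target page website, which matches the flow of Embodiment six.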
Further, when the browser is not able to parse state paths, the search engine needs to have this capability so that it can assist in pushing the target page to the browser. In this case, the search engine can further include a path parsing unit 1103 and a network interface unit 1104.
The user interface unit 1101 is also used, after receiving the state path selected by the user and sent by the browser, to send this state path to the path parsing unit 1103.
The path parsing unit 1103 is used to generate a target page request according to the received state path.
The network interface unit 1104 is used to send the target page request generated by the path parsing unit 1103 to the target page website.
Fig. 12 is a schematic structural diagram of the browser provided by the present invention. This browser is able to parse state paths. As shown in Fig. 12, the browser can include a network side interface unit 1201, a path parsing unit 1202, and a user side interface unit 1203.
The network side interface unit 1201 is used to receive the search results containing state paths sent by the search engine shown in Fig. 11, and to send the target page request sent by the path parsing unit 1202 to the target page website.
The user side interface unit 1203 is used to display the search results received by the network side interface unit 1201 to the user, and to send the state path selected by the user to the path parsing unit 1202.
The path parsing unit 1202 is used to generate a target page request according to the state path selected by the user and send it to the network side interface unit 1201.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (22)

1. A method for obtaining a target page, characterized in that the method comprises the following steps:
A. crawling the base page corresponding to a received uniform resource locator (URL) and the scripts of the base page;
B. analyzing the crawled base page and scripts to produce one or more state paths, containing dynamic information, that correspond to the base page, and crawling target pages using the produced state paths; wherein each state path comprises: the URL of the base page, position information of the Document Object Model (DOM) event in the base page that produces the dynamic information, and a callback function index corresponding to the DOM event.
2. The method according to claim 1, characterized in that step B specifically comprises:
downloading each DOM node while crawling the base page and scripts, executing steps B11 to B13 on each downloaded DOM node in turn, and executing step B14 after all DOM nodes have been downloaded;
B11. judging whether the currently downloaded DOM node is a script tag; if so, going to step B11 for the next downloaded DOM node; otherwise, executing step B12;
B12. judging whether the currently downloaded DOM node contains a DOM event and a callback function; if not, going to step B11 for the next downloaded DOM node; if so, executing step B13;
B13. producing a state path from the DOM event contained in the currently downloaded DOM node, saving the produced state path in a state path queue, and going to step B11 for the next downloaded DOM node;
B14. obtaining, one by one, the target page corresponding to each state path in the state path queue, judging whether new page content is produced or a page jump occurs, and determining each state path that produces new page content or causes a page jump to be a state path corresponding to the base page.
3. The method according to claim 1, characterized in that step B specifically comprises:
downloading each DOM node while crawling the base page and scripts, and executing steps B21 to B24 on each downloaded DOM node in turn until all DOM nodes have been downloaded;
B21. judging whether the currently downloaded DOM node is a script tag; if so, going to step B21 for the next downloaded DOM node; otherwise, executing step B22;
B22. judging whether the currently downloaded DOM node contains a DOM event and a callback function; if not, going to step B21 for the next downloaded DOM node; if so, executing step B23;
B23. producing a state path from the DOM event contained in the currently downloaded DOM node;
B24. obtaining the target page corresponding to this state path and judging whether new page content is produced or a page jump occurs; if so, determining that this state path is a state path corresponding to the base page and going to step B21 for the next downloaded DOM node; otherwise, going to step B21 for the next downloaded DOM node.
4. The method according to claim 2 or 3, characterized in that judging whether a page jump occurs comprises: if the URL of the obtained target page differs from the URL of the base page, determining that a page jump has occurred.
5. The method according to claim 2 or 3, characterized in that judging whether new page content is produced comprises: comparing the obtained target page with the base page by sentence signature or by character string, and, if the comparison result shows that the target page has page content different from that of the base page, determining that new page content has been produced; or,
calculating the similarity between the obtained target page and the base page, and, if the calculation result shows that the target page has page content different from that of the base page, determining that new page content has been produced.
6. The method according to any one of claims 1 to 3, characterized in that the position information of the DOM event comprises: the DOM node identifier, the path (XPath) of the DOM node, and the DOM event identifier.
7. The method according to any one of claims 1 to 3, characterized in that, after step B, the method further comprises:
C. storing the state paths corresponding to the base page produced in step B and the snapshots of the crawled target pages, and building and storing an index of the target pages.
8. A method for obtaining a target page, characterized in that, after the method of claim 7, the method comprises:
after receiving a search request from a browser, matching the keyword contained in the search request against the stored index of target pages, and returning to the browser search results containing the state paths of the matched target pages, so that the browser can obtain the corresponding target page using the state path selected by the user.
9. The method according to claim 8, characterized in that the search results further contain snapshot information of the matched target pages;
after the snapshot information of the target page selected by the user is received from the browser, the snapshot of the corresponding target page is returned to the browser.
10. The method according to claim 8, characterized in that, after the search results containing the state paths of the matched target pages are returned to the browser, the method further comprises:
after receiving the state path selected by the user and sent by the browser, sending a target page request to the target page website according to the state path selected by the user, so that the target page website pushes the target page to the browser.
11. A method for obtaining a target page, characterized in that the method comprises:
a browser, after sending a search request to a search engine, receiving search results containing state paths returned by the search engine;
sending a target page request to the target page website according to the state path selected by the user;
receiving the target page pushed by the target page website;
wherein the search results containing state paths are returned by the search engine using the method of claim 8.
12. A device for obtaining a target page, characterized in that the device comprises:
a first crawling unit, used to crawl the base page corresponding to a received uniform resource locator (URL) and the scripts of the base page;
an analysis unit, used to analyze the base page and scripts crawled by the first crawling unit and to produce one or more state paths, containing dynamic information, that correspond to the base page; wherein each state path comprises: the URL of the base page, position information of the Document Object Model (DOM) event in the base page that produces the dynamic information, and a callback function index corresponding to the DOM event;
a second crawling unit, used to crawl target pages using the state paths produced by the analysis unit.
13. The device according to claim 12, characterized in that the analysis unit specifically comprises: a first judging module, a second judging module, a first path generation module, and a first path determination module;
the first crawling unit downloads each DOM node while crawling the base page and its scripts, sends the currently downloaded DOM node to the first judging module, and, after all DOM nodes have been downloaded, sends a determination notification to the first path determination module;
the first judging module is used to judge whether the currently downloaded DOM node is a script tag; if so, it triggers the first crawling unit to download the next DOM node; otherwise, it sends a judgment notification to the second judging module;
the second judging module is used to judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, it triggers the first crawling unit to download the next DOM node; if so, it sends an execution notification to the first path generation module;
the first path generation module is used, after receiving the execution notification, to produce a state path from the currently downloaded DOM node, save the produced state path in a state path queue, and trigger the first crawling unit to download the next DOM node;
the first path determination module is used, on receiving the determination notification, to trigger the second crawling unit to obtain, one by one, the target page corresponding to each state path in the state path queue, to judge from the result obtained by the second crawling unit whether new page content is produced or a page jump occurs, and to determine each state path that produces new page content or causes a page jump to be a state path corresponding to the base page.
14. The device according to claim 12, characterized in that the analysis unit specifically comprises: a third judging module, a fourth judging module, a second path generation module, and a second path determination module;
the first crawling unit downloads each DOM node while crawling the base page and its scripts and sends the currently downloaded DOM node to the third judging module, until all DOM nodes have been downloaded;
the third judging module is used to judge whether the currently downloaded DOM node is a script tag; if so, it triggers the first crawling unit to download the next DOM node; otherwise, it sends a judgment notification to the fourth judging module;
the fourth judging module is used to judge whether the currently downloaded DOM node contains a DOM event and a callback function; if not, it triggers the first crawling unit to download the next DOM node; if so, it sends an execution notification to the second path generation module;
the second path generation module is used, on receiving the execution notification, to produce a state path from the DOM event contained in the currently downloaded DOM node and to send the produced state path to the second path determination module;
the second path determination module is used, on receiving a state path, to trigger the second crawling unit to obtain the target page corresponding to this state path and to judge, from the result obtained by the second crawling unit, whether new page content is produced or a page jump occurs; if so, it determines that this state path is a state path corresponding to the base page and triggers the first crawling unit to download the next DOM node; otherwise, it triggers the first crawling unit to download the next DOM node.
15. The device according to claim 13 or 14, characterized in that judging whether a page jump occurs comprises: if the URL of the obtained target page differs from the URL of the base page, determining that a page jump has occurred.
16. The device according to claim 13 or 14, characterized in that judging whether new page content is produced comprises: comparing the obtained target page with the base page by sentence signature or by character string, and, if the comparison result shows that the target page has page content different from that of the base page, determining that new page content has been produced; or,
calculating the similarity between the obtained target page and the base page, and, if the calculation result shows that the target page has page content different from that of the base page, determining that new page content has been produced.
17. The device according to any one of claims 12 to 14, characterized in that the position information of the DOM event comprises: the DOM node identifier, the path (XPath) of the DOM node, and the DOM event identifier.
18. The device according to any one of claims 12 to 14, characterized in that the device further comprises:
a storage unit, used to store the state paths corresponding to the base page produced by the analysis unit and the snapshots of the target pages crawled by the second crawling unit, and to build and store an index of the target pages.
19. A search engine, characterized in that the search engine comprises: the device of claim 18, a user interface unit, and a search processing unit;
the user interface unit is used to receive a search request from a browser and send the keyword contained in the search request to the search processing unit, and to return the search results sent by the search processing unit to the browser, so that the browser can obtain the corresponding target page using the state path selected by the user;
the search processing unit is used to match the keyword against the index of target pages stored by the storage unit of the device, and to send search results containing the state paths of the matched target pages to the user interface unit.
20. The search engine according to claim 19, characterized in that the search results further contain snapshot information of the matched target pages;
the user interface unit is also used to send the snapshot information of the target page selected by the user, as returned by the browser, to the search processing unit, and to return the target page snapshot sent by the search processing unit to the browser;
the search processing unit is also used to obtain the corresponding target page snapshot from the storage unit according to the snapshot information of the target page selected by the user, and to send it to the user interface unit.
21. The search engine according to claim 19, characterized in that the search engine further comprises: a path parsing unit and a network interface unit;
the user interface unit is also used, after receiving the state path selected by the user and sent by the browser, to send this state path to the path parsing unit;
the path parsing unit is used to generate a target page request according to the received state path;
the network interface unit is used to send the target page request generated by the path parsing unit to the target page website.
22. A browser, characterized in that the browser comprises a network-side interface unit, a path resolution unit and a user-side interface unit;
The network-side interface unit is configured to receive the search results, which comprise state paths, sent by the search engine according to claim 19, and to send the target page request issued by the path resolution unit to the target page website;
The user-side interface unit is configured to display the search results received by the network-side interface unit to the user, and to send the state path selected by the user to the path resolution unit;
The path resolution unit is configured to generate a target page request according to the state path selected by the user and to send the request to the network-side interface unit.
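The browser-side flow of claim 22 (user selects a state path, the path resolution unit turns it into a target page request, the network-side interface unit sends it) might look roughly like the sketch below. The claims do not specify the request format, so encoding the state path as query parameters `dom_pos` and `cb` is purely an assumption for illustration, as are the names `StatePath`, `build_target_page_request` and `Browser`.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlencode


@dataclass(frozen=True)
class StatePath:
    """State path fields per the abstract (names are assumed)."""
    base_url: str
    dom_event_position: str
    callback_index: int


def build_target_page_request(path: StatePath) -> str:
    """Path resolution unit: derive a request from a state path.

    Hypothetical wire format: the DOM event position and the callback
    index are appended to the base page URL as query parameters.
    """
    query = urlencode({"dom_pos": path.dom_event_position,
                       "cb": path.callback_index})
    separator = "&" if "?" in path.base_url else "?"
    return f"{path.base_url}{separator}{query}"


class Browser:
    def __init__(self, send: Callable[[str], None]) -> None:
        # `send` stands in for the network-side interface unit.
        self._send = send

    def on_user_select(self, path: StatePath) -> None:
        # User-side interface unit hands the selected state path to the
        # path resolution unit, whose output goes to the network side.
        self._send(build_target_page_request(path))
```

The same `build_target_page_request` function could equally sit in the search engine of claim 21; the only difference between the two claims is which component hosts the path resolution unit.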
CN2010105314609A 2010-11-04 2010-11-04 Method and device for acquiring destination page, search engine and browser Active CN101984429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105314609A CN101984429B (en) 2010-11-04 2010-11-04 Method and device for acquiring destination page, search engine and browser

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105314609A CN101984429B (en) 2010-11-04 2010-11-04 Method and device for acquiring destination page, search engine and browser

Publications (2)

Publication Number Publication Date
CN101984429A (en) 2011-03-09
CN101984429B (en) 2012-03-14

Family

ID=43641598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105314609A Active CN101984429B (en) 2010-11-04 2010-11-04 Method and device for acquiring destination page, search engine and browser

Country Status (1)

Country Link
CN (1) CN101984429B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150307A (en) * 2011-12-06 2013-06-12 株式会社理光 Method and equipment for searching name related to thematic word from network
CN103268361A (en) * 2013-06-07 2013-08-28 百度在线网络技术(北京)有限公司 Extracting method, device and system of hidden URL (Uniform Resource Locator) in webpage
CN103645968A (en) * 2013-12-02 2014-03-19 北京奇虎科技有限公司 Browser status restoration method and device
CN103955495A (en) * 2014-04-18 2014-07-30 百度在线网络技术(北京)有限公司 Downloading method and device for page sub-resource
CN104408198A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Method and device for acquiring webpage contents
CN105740417A (en) * 2016-01-29 2016-07-06 青岛海信移动通信技术股份有限公司 Webpage based target data search method and module, browser and terminal
CN105740290A (en) * 2014-12-11 2016-07-06 富士通株式会社 System and method for searching self-adaptive networks of mobile devices
CN105867897A (en) * 2015-12-07 2016-08-17 乐视网信息技术(北京)股份有限公司 Page redirection analysis method and apparatus
WO2017124692A1 (en) * 2016-01-20 2017-07-27 百度在线网络技术(北京)有限公司 Method and apparatus for searching for conversion relationship between form pages and target pages
CN107025111A (en) * 2017-03-17 2017-08-08 烽火通信科技股份有限公司 Method and system for full-screen switching display of browser target pages
CN107169011A (en) * 2017-03-31 2017-09-15 百度在线网络技术(北京)有限公司 Webpage originality identification method and device based on artificial intelligence and storage medium
CN110674427A (en) * 2019-09-20 2020-01-10 北京达佳互联信息技术有限公司 Method, device, equipment and storage medium for responding to webpage access request
CN110874446A (en) * 2018-08-31 2020-03-10 北京京东尚科信息技术有限公司 Page display method and system, computer system and computer readable medium
CN111177539A (en) * 2019-12-16 2020-05-19 北京百度网讯科技有限公司 Search result page generation method and device, electronic equipment and storage medium
WO2021218468A1 (en) * 2020-04-29 2021-11-04 百度在线网络技术(北京)有限公司 Data update method and device, search server, terminal, and storage medium
CN113657076A (en) * 2021-08-17 2021-11-16 中国平安财产保险股份有限公司 Page operation record table generation method and device, electronic equipment and storage medium
US11803597B2 (en) 2020-04-29 2023-10-31 Baidu Online Network Technology (Beijing) Co., Ltd. Data updating method, apparatus, search server, terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070294281A1 (en) * 2006-05-05 2007-12-20 Miles Ward Systems and methods for consumer-generated media reputation management
CN101587488A (en) * 2009-05-25 2009-11-25 深圳市腾讯计算机系统有限公司 Method and device for detecting re-orientation of page in search engine

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150307A (en) * 2011-12-06 2013-06-12 株式会社理光 Method and equipment for searching name related to thematic word from network
CN103268361A (en) * 2013-06-07 2013-08-28 百度在线网络技术(北京)有限公司 Extracting method, device and system of hidden URL (Uniform Resource Locator) in webpage
CN103268361B (en) * 2013-06-07 2019-05-31 百度在线网络技术(北京)有限公司 Extracting method, device and system of hidden URL (Uniform Resource Locator) in webpage
CN103645968A (en) * 2013-12-02 2014-03-19 北京奇虎科技有限公司 Browser status restoration method and device
CN103645968B (en) * 2013-12-02 2017-03-15 北京奇虎科技有限公司 Browser status restoration method and device
CN103955495A (en) * 2014-04-18 2014-07-30 百度在线网络技术(北京)有限公司 Downloading method and device for page sub-resource
CN105740290A (en) * 2014-12-11 2016-07-06 富士通株式会社 System and method for searching self-adaptive networks of mobile devices
CN104408198B (en) * 2014-12-15 2018-07-17 北京国双科技有限公司 Method and device for acquiring webpage contents
CN104408198A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Method and device for acquiring webpage contents
CN105867897A (en) * 2015-12-07 2016-08-17 乐视网信息技术(北京)股份有限公司 Page redirection analysis method and apparatus
WO2017124692A1 (en) * 2016-01-20 2017-07-27 百度在线网络技术(北京)有限公司 Method and apparatus for searching for conversion relationship between form pages and target pages
CN105740417A (en) * 2016-01-29 2016-07-06 青岛海信移动通信技术股份有限公司 Webpage based target data search method and module, browser and terminal
CN107025111A (en) * 2017-03-17 2017-08-08 烽火通信科技股份有限公司 Method and system for full-screen switching display of browser target pages
CN107169011A (en) * 2017-03-31 2017-09-15 百度在线网络技术(北京)有限公司 Webpage originality identification method and device based on artificial intelligence and storage medium
CN107169011B (en) * 2017-03-31 2021-06-11 百度在线网络技术(北京)有限公司 Webpage originality identification method and device based on artificial intelligence and storage medium
CN110874446A (en) * 2018-08-31 2020-03-10 北京京东尚科信息技术有限公司 Page display method and system, computer system and computer readable medium
CN110674427A (en) * 2019-09-20 2020-01-10 北京达佳互联信息技术有限公司 Method, device, equipment and storage medium for responding to webpage access request
CN110674427B (en) * 2019-09-20 2022-04-22 北京达佳互联信息技术有限公司 Method, device, equipment and storage medium for responding to webpage access request
CN111177539A (en) * 2019-12-16 2020-05-19 北京百度网讯科技有限公司 Search result page generation method and device, electronic equipment and storage medium
WO2021218468A1 (en) * 2020-04-29 2021-11-04 百度在线网络技术(北京)有限公司 Data update method and device, search server, terminal, and storage medium
US11803597B2 (en) 2020-04-29 2023-10-31 Baidu Online Network Technology (Beijing) Co., Ltd. Data updating method, apparatus, search server, terminal and storage medium
CN113657076A (en) * 2021-08-17 2021-11-16 中国平安财产保险股份有限公司 Page operation record table generation method and device, electronic equipment and storage medium
CN113657076B (en) * 2021-08-17 2023-08-22 中国平安财产保险股份有限公司 Page operation record table generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN101984429B (en) 2012-03-14

Similar Documents

Publication Publication Date Title
CN101984429B (en) Method and device for acquiring destination page, search engine and browser
CN101452453B (en) Method for input method web-side navigation and input method system
CN1320454C (en) Programming language extensions for processing XML objects and related applications
US7536389B1 (en) Techniques for crawling dynamic web content
US20200364033A1 (en) API Specification Generation
CN102479252B (en) Query expression conversion apparatus and query expression conversion method
CN101957844B (en) On-line application system and implementation method thereof
CN102073726B (en) Structured data import method and device for search engine system
CN102760151B (en) Implementation method of open source software acquisition and searching system
CN102063488A (en) Code searching method based on semantics
CN1601528A (en) Systems and methods for client-based web crawling
CN105122237A (en) Sharing application states
CN101344881A (en) Index generation method and device and search system for mass file type data
CN101971172A (en) Mobile sitemaps
EP3161678A1 (en) Deep links for native applications
CN102521232B (en) Distributed acquisition and processing system and method of internet metadata
CN102054028A (en) Web crawler system with page-rendering function and implementation method thereof
CN105138312A (en) Table generation method and apparatus
CN106294885A (en) Data collection and annotation method for heterogeneous webpages
CN104281619A (en) System and method for ordering search results
US9959305B2 (en) Annotating structured data for search
CN111159590A (en) Serial connection method and device based on front-end and back-end service call links
Sharma et al. A novel architecture for deep web crawler
CN105930385A (en) Data crawling method and system
KR102214990B1 (en) System for providing bookmark management and information searching service and method for providing bookmark management and information searching service using it

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: BEIJING BAIDU NETWORK INFORMATION TECHNOLOGY CO.,

Free format text: FORMER OWNER: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Effective date: 20111228

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20111228

Address after: 100085 Beijing, Haidian District, No. 10 Shangdi 10th Street, Baidu Building, 2nd floor

Applicant after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Address before: 100085 Beijing, Haidian District, No. 10 Shangdi 10th Street, Baidu Building

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

C14 Grant of patent or utility model
GR01 Patent grant