CN103246680B - Method and device for aggregating and displaying web page content in a browser - Google Patents

Method and device for aggregating and displaying web page content in a browser

Info

Publication number
CN103246680B
Authority
CN
China
Prior art keywords
web page
page contents
contents
user
browser
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210031482.8A
Other languages
Chinese (zh)
Other versions
CN103246680A (en)
Inventor
蒋进舟
滕跃龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201210031482.8A
Publication of CN103246680A
Application granted
Publication of CN103246680B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention provides a method for aggregating and displaying web page content in a browser, comprising: generating an information source identifier according to an information source selected by a user; analyzing, by a predetermined method, the web page content corresponding to the information source identifier, and extracting and saving the corresponding web page content; and, when the user opens the information aggregation page of the browser, reading and displaying the corresponding web page content. By analyzing web page content, extracting and saving the corresponding content, and displaying it to the user, the invention can aggregate the corresponding web page content in the browser even if the website does not provide an RSS or Atom subscription, so the user does not need to visit each website.

Description

Method and device for aggregating and displaying web page content in a browser
Technical field
The invention provides a method and device for aggregating and displaying web page content in a browser, and belongs to the technical field of web page content aggregation.
Background art
When browsing the Internet, a user often follows the content of several websites. Without web page information aggregation, a user who wants to check the information he or she follows can only visit each website in turn until all of them have been browsed. The whole process is shown in Fig. 1.
To solve this problem, current browsers have generally introduced an aggregation function: by subscribing to the RSS (Really Simple Syndication) or Atom (an XML-based document format and HTTP-based protocol used by websites and client tools to syndicate web content) feed that a website provides, the information the user follows is pulled to the local machine and combined. The aggregation process is shown in Fig. 2. With this approach, however, if a website does not provide an RSS or Atom subscription, its information cannot be aggregated in the browser, and the user must visit the website in order to browse the corresponding content.
Summary of the invention
The invention addresses a problem in the web page content aggregation technology of existing browsers: content that has not been aggregated in the browser can be browsed only by visiting the corresponding website. To this end, the invention provides a method and device for aggregating and displaying web page content in a browser.
A method for aggregating and displaying web page content in a browser comprises:
generating an information source identifier according to an information source selected by a user;
analyzing, by a predetermined method, the web page content corresponding to the information source identifier, extracting and saving the corresponding web page content, and, when the user opens the information aggregation page of the browser, reading and displaying the corresponding web page content.
A device for aggregating and displaying web page content in a browser comprises:
an identifier generation module, configured to generate an information source identifier according to an information source selected by a user;
an aggregation display module, configured to analyze, by a predetermined method, the web page content corresponding to the information source identifier, extract and save the corresponding web page content, and, when the user opens the information aggregation page of the browser, read and display the corresponding web page content.
As can be seen from the technical solution provided by the invention, by analyzing web page content, extracting and saving the corresponding content, and displaying it to the user, the corresponding web page content can be aggregated in the browser even if the website does not provide an RSS or Atom subscription, and the user does not need to visit each website.
Brief description of the drawings
Fig. 1 is a flow chart of the prior art in which a user browses each website in turn until browsing ends;
Fig. 2 is a flow chart of the prior art in which web page content is aggregated through content subscription;
Fig. 3 is a flow chart of the method for aggregating and displaying web page content in a browser provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of the identifiers of the regions of the Tencent home page provided by an embodiment of the invention;
Fig. 5 is a flow chart of generating the aggregation page after crawler analysis is added, provided by an embodiment of the invention;
Fig. 6 is a structural diagram of the device for aggregating and displaying web page content in a browser provided by an embodiment of the invention.
Detailed description of the invention
An embodiment of the invention provides a method for aggregating and displaying web page content in a browser, comprising: generating an information source identifier according to an information source selected by a user; analyzing, by a predetermined method, the web page content corresponding to the information source identifier, and extracting and saving the corresponding web page content; and, when the user opens the information aggregation page of the browser, reading and displaying the corresponding web page content. The embodiment is described below with reference to the accompanying drawings, taking as an example the aggregated display of content from a website that does not support content subscription. As shown in Fig. 3, the method for aggregating and displaying web page content in a browser comprises:
Step 31: generate an information source identifier according to the information source selected by the user.
Because some existing websites do not provide an RSS or Atom subscription, their information cannot be aggregated in the browser, and a user who wants to check the information he or she follows can only visit each website. One example is Today's News on the Tencent home page: because no subscription is provided for it, a user who wants to see it must visit the Tencent home page.
Specifically, most existing web pages are composed of multiple nested regions, each of which has its own name or identifier. The identifier may be the id or className of a web page element, or even the ordinal position of the element within its region. Taking www.qq.com as an example, as shown in Fig. 4, each small region of the www.qq.com page has an identifier, so once the user selects a region of interest in the page, that region can be uniquely represented by its identifier. Each region contains several information sources that include links or addresses. For example, the identifier of the first row of www.qq.com is #TextNav, that of the search bar in the second row is #SOSO, that of the news center in the lower left corner is #NewsInfo, and that of Today's Topics on the right is #txArea.
After the user selects an information source on www.qq.com, for example the news center in the lower left corner, the server of www.qq.com needs to generate, according to the selected news center, an identifier that uniquely identifies this information source on the network, namely the news center identifier #NewsInfo. The identifier may be expressed as a URL plus an element path, although this is only one example and the identifier is not limited to this form. For example, when the news region needs to be saved, the following correspondence can be established:
URL        Element path        Content
***.com    #txArea#NewsInfo    The HTML of this region, or the content extracted after analysis
Afterwards, whenever the user opens the browser, the server of www.qq.com can use this URL and element path to capture the page they point to again and update the saved content.
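The correspondence between a URL, an element path, and the saved content can be pictured as a small keyed store. The following Python sketch is illustrative only: the names `SourceIdentifier`, `ContentStore`, and `split_element_path` are hypothetical, and the patent does not specify how the identifier is stored or parsed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SourceIdentifier:
    """Hypothetical identifier: a page URL plus an element path within it."""
    url: str
    element_path: str  # for example "#txArea#NewsInfo"


class ContentStore:
    """In-memory stand-in for wherever the extracted content is saved."""

    def __init__(self) -> None:
        self._items: Dict[SourceIdentifier, str] = {}

    def save(self, ident: SourceIdentifier, content: str) -> None:
        # Saving again under the same identifier overwrites the old content,
        # which models "capture the page again and update its content".
        self._items[ident] = content

    def load(self, ident: SourceIdentifier) -> Optional[str]:
        return self._items.get(ident)


def split_element_path(path: str) -> List[str]:
    """Split "#txArea#NewsInfo" into ["txArea", "NewsInfo"]."""
    return [part for part in path.split("#") if part]
```

Because the dataclass is frozen, two identifiers with the same URL and element path compare equal and hash alike, so a later capture of the same region replaces the earlier one.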
Step 32: analyze, by a predetermined method, the web page content corresponding to the information source identifier, extract and save the corresponding web page content, and, when the user opens the information aggregation page of the browser, read and display the corresponding web page content.
For a website that does not provide an RSS or Atom subscription, the server must actively analyze the source code of the web page in order to extract the content the user cares about and ignore the parts of the site's code the user does not care about.
Specifically, taking www.qq.com as an example, when the structure of a web page is relatively simple or fairly common, for example when it consists of simple URL links, the server of www.qq.com can analyze the web page directly in the browser. After the user has selected the news center in the lower left corner in step 31, the server of www.qq.com can, according to the region represented by the information source identifier #NewsInfo, directly capture the HTML code of that region, analyze it, and save its content locally; when the user opens the information aggregation page of the browser, the content is read and displayed directly. HTML code is descriptive text made up of HTML commands, which can describe text, graphics, animation, sound, tables, links, and so on. The structure of HTML code comprises two main parts, the head (Head) and the body (Body): the head describes the information the browser needs, and the body contains the specific content to be presented. Extraction starts from the URL of an initial page and obtains the URLs on that page; while web pages are being extracted, new URLs are continually taken from the current page and put into a queue until a stop condition of the system is met. From the body of the HTML code, the links and text inside the code can be parsed out for the server of www.qq.com to extract. When the user has finished browsing one region and turns to another, the server of www.qq.com can use the same method to capture new content according to the information source identifier of the content the user selected, for the user to browse.
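As a rough illustration of capturing the HTML code of one identified region, the sketch below uses Python's standard `html.parser` module. It assumes that the identifier is an element id and that the markup is well nested (every non-void start tag has a matching end tag); `RegionExtractor` and `extract_region` are hypothetical names, and the patent does not prescribe this or any other parsing technique.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RegionExtractor(HTMLParser):
    """Collects the markup of the first element whose id matches region_id."""

    def __init__(self, region_id):
        super().__init__(convert_charrefs=False)
        self.region_id = region_id
        self.depth = 0      # 0 means "outside the region"
        self.done = False   # set once the region has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if (self.done or tag in VOID_TAGS
                    or dict(attrs).get("id") != self.region_id):
                return
            self.depth = 1
            self.parts.append(self.get_starttag_text())
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: copy it, depth is unchanged.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_region(html, region_id):
    """Return the markup of the region, or "" if the id is not found."""
    parser = RegionExtractor(region_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `extract_region(page_html, "NewsInfo")` would return only the markup of the element with id `NewsInfo`. HTML comments inside the region are dropped by this sketch, and real pages with unclosed tags would need a more tolerant parser.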
The analysis proceeds as follows: the links and text inside the HTML code are extracted. In the source code a link is enclosed in <a></a> tags, and text not enclosed in these tags can be treated as ordinary text; once the text content and links have been extracted, they can be displayed. For example, if a web page in HTML format contains links or lists, the links or lists can be extracted, and during analysis the <a>, <ul>, <ol> and <li> tags can be searched for to extract this information.
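A minimal sketch of this analysis, separating text enclosed in `<a>` tags from ordinary text, could look like the following. It uses Python's standard `html.parser`; the class and function names are hypothetical, and list tags (`<ul>`, `<ol>`, `<li>`) are not treated specially here, so the text of list items without links simply ends up as ordinary text.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Separates <a href=...> links from text that is outside any link."""

    def __init__(self):
        super().__init__()
        self.links = []        # list of (href, anchor text) pairs
        self.plain_text = []   # text not enclosed in a link
        self._href = None      # href of the link currently open, if any
        self._anchor_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_parts = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._anchor_parts).strip()
            self.links.append((self._href, text))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._anchor_parts.append(data)
        elif data.strip():
            self.plain_text.append(data.strip())


def extract_links_and_text(html):
    """Return (links, plain_text) for the given HTML fragment."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.plain_text
```

An `<a>` element without an `href` attribute is treated as ordinary text in this sketch, since it carries no address to aggregate.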
Taking www.qq.com as an example, for more complex web pages, such as pages that use a frame structure or dynamic links, the generated information source identifier can be submitted to the server of www.qq.com, on which capture methods for various websites are configured. The server performs a dedicated analysis for the information source identifier, for example a crawler analysis, and extracts the part the user cares about most according to predetermined rules, for example according to the identifier of the information source the user selected. The flow for generating the aggregation page after crawler analysis is added is shown in Fig. 5. Specifically, after the user selects the frame structure containing the news center in the lower left corner, the server of www.qq.com extracts the news content the user cares about most as follows: according to a predetermined web page analysis algorithm and the web page content represented by the information source identifier, links irrelevant to the topic are filtered out, and useful links are retained and put into the queue of URLs waiting to be captured; then, according to a certain search strategy, the URL of the next web page to capture is selected from the queue, and the process is repeated until a stop condition of the system is reached. When the user has finished browsing one region and turns to another, the server of www.qq.com can use the same method to capture new content according to the information source identifier of the content the user selected, for the user to browse.
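The queue-based crawl described here can be sketched as follows. This is a simplified illustration under stated assumptions: the search strategy is taken to be breadth-first (first in, first out), the stop condition is taken to be a page limit, and `fetch_links` and `is_relevant` are hypothetical callbacks supplied by the caller, standing in for the page download and for the patent's unspecified web page analysis algorithm.

```python
from collections import deque
from typing import Callable, Iterable, List


def focused_crawl(seed_url: str,
                  fetch_links: Callable[[str], Iterable[str]],
                  is_relevant: Callable[[str], bool],
                  max_pages: int = 50) -> List[str]:
    """Visit pages starting from seed_url, following only relevant links.

    fetch_links(url) returns the URLs found on that page.
    is_relevant(url) decides whether a link is kept or filtered out.
    max_pages is the assumed stop condition.
    """
    queue = deque([seed_url])   # URLs waiting to be captured
    seen = {seed_url}           # avoids queuing the same URL twice
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()   # FIFO order gives a breadth-first strategy
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen and is_relevant(link):
                seen.add(link)
                queue.append(link)
    return visited
```

Passing the download and relevance logic as callbacks keeps the sketch independent of any network library and lets a different per-site rule be plugged in, which is in the spirit of configuring a capture method per website.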
The corresponding analysis method is similar to capturing the HTML code and analyzing it, but because the analysis method can be customized on the server of www.qq.com, it can be relatively complex. For websites that a general-purpose algorithm cannot handle, rules can be specified manually in the back end, but what is finally extracted is still information or text content: the main links and the auxiliary information of those links. For example, a rule can specify that a block in the region whose id is "content" is marked as auxiliary information of a link, and that a block whose id is "title" and that has an <a> tag is a main link. When the user opens the information aggregation page, the server of www.qq.com can query the back end for the organized content of this specific region and display it in the browser.
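The two example rules can be written down as a small classification function. The sketch below is only an illustration: the `Block` type, its fields, and the label strings are hypothetical, and a real rule set would presumably be configured per website rather than hard-coded.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical summary of one block found inside the selected region."""
    element_id: str
    has_anchor: bool   # True if the block contains an <a> tag
    text: str


def classify(block: Block) -> Optional[str]:
    """Apply the two example rules; return None when no rule matches."""
    if block.element_id == "title" and block.has_anchor:
        return "main_link"
    if block.element_id == "content":
        return "auxiliary_info"
    return None
```

A block with id "title" but no `<a>` tag matches neither rule in this sketch and is left unclassified.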
Because methods that analyze a web page to extract content are generally complex, this embodiment may also simplify the analysis by downloading the whole web page and displaying only part of it. For example, the <iframe> tag of an HTML page can be used to embed the whole web page the user follows in the aggregation page, and CSS absolute positioning can then be used to adjust the position and size of the <iframe> so that everything except the content the user follows is hidden. Likewise, the structure of the web page can be actively modified: through the DOM interfaces exposed by the browser kernel, such as the IHtmlElement interface of IE, the web page elements are traversed, only the elements in the region the user cares about and their directly related parent elements are retained, and all other elements are deleted. The web page is thereby cropped, and the cropped pages are finally aggregated in the browser's aggregation page.
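The cropping approach, keeping the selected region and its ancestors and deleting everything else, can be sketched on a parsed tree. The example below uses Python's `xml.etree.ElementTree` instead of a browser DOM interface, so it assumes well-formed (XHTML-like) markup and a region identified by its id; `prune_to_region` is a hypothetical name, and the sketch also discards stray text that sits directly inside the retained ancestors.

```python
import xml.etree.ElementTree as ET


def prune_to_region(node: ET.Element, region_id: str) -> bool:
    """Delete everything except the region and its chain of ancestors.

    Returns True if the region was found in this subtree.
    """
    if node.get("id") == region_id:
        return True                 # keep the whole region untouched
    found = False
    for child in list(node):        # copy, because children are removed
        if prune_to_region(child, region_id):
            found = True
            child.tail = None       # drop text that followed the kept branch
        else:
            node.remove(child)
    if found:
        node.text = None            # drop text directly inside an ancestor
    return found
```

After `prune_to_region(root, "NewsInfo")` returns True, serializing `root` yields a cropped page that contains only the selected region wrapped in its original parent elements.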
With the technical solution provided by this embodiment, by analyzing web page content, extracting and saving the corresponding content, and displaying it to the user, the corresponding web page content can be aggregated in the browser even if the website does not provide an RSS or Atom subscription, and the user does not need to visit each website.
An embodiment of the invention also provides a device for aggregating and displaying web page content in a browser. Each module of the device may be deployed in the server of a website as a software module or a hardware entity. As shown in Fig. 6, the device comprises:
an identifier generation module 61, configured to generate an information source identifier according to an information source selected by a user;
an aggregation display module 62, configured to analyze, by a predetermined method, the web page content corresponding to the information source identifier, extract and save the corresponding web page content, and, when the user opens the information aggregation page of the browser, read and display the corresponding web page content.
Optionally, in the identifier generation module 61, the information source identifier is expressed as a combination of a URL and an element path.
Optionally, the aggregation display module 62 may comprise at least one of the following submodules:
a first content extraction submodule, configured to extract the corresponding web page content by searching the HTML web page for the tags that correspond to the links or lists of the corresponding web page content;
a second content extraction submodule, configured to configure a corresponding web page content capture method according to the information source identifier and to analyze the corresponding web page content by that capture method, so as to extract the corresponding web page content.
Optionally, the aggregation display module 62 may further comprise:
an information display submodule, configured to display the corresponding web page content, or to display the entire web page content with the web page content other than the corresponding web page content hidden or deleted.
The processing functions of the modules of the above device are implemented as described in the method embodiment above and are not repeated here.
With the technical solution provided by this embodiment, by analyzing web page content, extracting and saving the corresponding content, and displaying it to the user, the corresponding web page content can be aggregated in the browser even if the website does not provide an RSS or Atom subscription, and the user does not need to visit each website.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention.

Claims (8)

1. A method for aggregating and displaying web page content in a browser, characterized by comprising:
generating an information source identifier according to an information source in a web page region selected by a user in a web page, wherein the web page is composed of multiple nested regions, each region corresponds to a unique identifier, and each region contains several information sources;
extracting and saving the corresponding web page content by searching the HTML web page for the tags that correspond to the links or lists of the corresponding web page content; or configuring a corresponding web page content capture method according to the information source identifier, analyzing the corresponding web page content by the capture method, and extracting and saving the corresponding web page content;
when the user opens the information aggregation page of the browser, reading and displaying the corresponding web page content.
2. The method according to claim 1, characterized in that the information source identifier is expressed as a combination of a URL and an element path.
3. The method according to claim 1, characterized in that reading and displaying the corresponding web page content comprises:
displaying the corresponding web page content, or displaying the entire web page content with the web page content other than the corresponding web page content hidden or deleted.
4. A device for aggregating and displaying web page content in a browser, characterized by comprising:
an identifier generation module, configured to generate an information source identifier according to an information source in a web page region selected by a user in a web page, wherein the web page is composed of multiple nested regions, each region corresponds to a unique identifier, and each region contains several information sources;
an aggregation display module, configured to extract and save the corresponding web page content by searching the HTML web page for the tags that correspond to the links or lists of the corresponding web page content; or to configure a corresponding web page content capture method according to the information source identifier, analyze the corresponding web page content by the capture method, and extract and save the corresponding web page content; and further configured to read and display the corresponding web page content when the user opens the information aggregation page of the browser.
5. The device according to claim 4, characterized in that, in the identifier generation module, the information source identifier is expressed as a combination of a URL and an element path.
6. The device according to claim 4 or 5, characterized in that the aggregation display module comprises:
a first content extraction submodule, configured to extract the corresponding web page content by searching the HTML web page for the tags that correspond to the links or lists of the corresponding web page content.
7. The device according to claim 4 or 5, characterized in that the aggregation display module comprises:
a second content extraction submodule, configured to configure a corresponding web page content capture method according to the information source identifier and to analyze the corresponding web page content by the capture method, so as to extract the corresponding web page content.
8. The device according to claim 4, characterized in that the aggregation display module further comprises:
an information display submodule, configured to display the corresponding web page content, or to display the entire web page content with the web page content other than the corresponding web page content hidden or deleted.
CN201210031482.8A 2012-02-13 2012-02-13 Method and device for aggregating and displaying web page content in a browser Active CN103246680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210031482.8A CN103246680B (en) 2012-02-13 2012-02-13 Method and device for aggregating and displaying web page content in a browser

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210031482.8A CN103246680B (en) 2012-02-13 2012-02-13 Method and device for aggregating and displaying web page content in a browser

Publications (2)

Publication Number Publication Date
CN103246680A CN103246680A (en) 2013-08-14
CN103246680B true CN103246680B (en) 2016-05-18

Family

ID=48926204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210031482.8A Active CN103246680B (en) 2012-02-13 2012-02-13 Method and device for aggregating and displaying web page content in a browser

Country Status (1)

Country Link
CN (1) CN103246680B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446055B (en) * 2016-08-31 2020-10-30 陶德龙 Webpage generation method and system
CN110515691A (en) * 2019-08-29 2019-11-29 北京明略软件系统有限公司 The display methods and device of parameter information, storage medium, processor
CN110750748A (en) * 2019-10-24 2020-02-04 杭州网景汇网络科技有限公司 Webpage display method
CN116661653A (en) * 2022-02-18 2023-08-29 华为云计算技术有限公司 Method for realizing collection function of browser, readable medium and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894138A (en) * 2010-06-25 2010-11-24 优视科技有限公司 Visual page content subscription processing method and system thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298621B (en) * 2006-02-22 2013-11-06 王东 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree
US7917840B2 (en) * 2007-06-05 2011-03-29 Aol Inc. Dynamic aggregation and display of contextually relevant content
CN101739425B (en) * 2008-11-04 2012-07-04 北大方正集团有限公司 Webpage integration method

Also Published As

Publication number Publication date
CN103246680A (en) 2013-08-14

Similar Documents

Publication Publication Date Title
US9015144B2 (en) Configuring web crawler to extract web page information
US20150295942A1 (en) Method and server for performing cloud detection for malicious information
CN103714115A (en) Method and device for loading web page content
US20140214559A1 (en) Method, device and system for publishing merchandise information
US8739024B2 (en) Method and apparatus for processing world wide web page
CN102314516B (en) Webpage processing method and mobile terminal and electronic device thereof
US20150227276A1 (en) Method and system for providing an interactive user guide on a webpage
CN102065114A (en) Method and device for mobile terminal to access webpage
US20160173953A1 (en) Method, Device, Server, and Client Device for Video Processing
CN101894138B (en) Visual page content subscription processing method and system thereof
WO2013021391A4 (en) Automatic website accessibility and compatability
CN102750352A (en) Method and device for classified collection of historical access records in browser
CN103258058B (en) Page display method and system and browser
WO2017124692A1 (en) Method and apparatus for searching for conversion relationship between form pages and target pages
TW201437826A (en) Method and device for combining webpage style address
CN103246680B (en) Method and device for aggregating and displaying web page content in a browser
CN103577447A (en) Method and equipment used for determining page type information of target pages
CN103577526A (en) Method and system as well as browser for verifying page modification
CN104978373A (en) Webpage display method and webpage display device
CN104268282A (en) Web banner advertisement displaying method and system
CN106547749B (en) Webpage data acquisition method and device
CN103902571A (en) Method and system for saving webpage complete content and corresponding client end and server
CN103458065A (en) Method for extracting video address based on Webkit kernel under HTML5 standard
CN101763432A (en) Method for constructing lightweight webpage dynamic view
CN103440340A (en) Method and device for navigation webpage content display

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant