CN103577439A - Webpage pre-reading method and webpage pre-reading system - Google Patents


Info

Publication number
CN103577439A
Authority
CN
China
Prior art keywords
webpage
link
user
clicked
information
Legal status
Granted
Application number
CN201210265609.2A
Other languages
Chinese (zh)
Other versions
CN103577439B (en
Inventor
胡又欢
Current Assignee
Beijing Sogou Technology Development Co Ltd
Beijing Sogou Information Service Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Beijing Sogou Information Service Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd and Beijing Sogou Information Service Co Ltd
Priority to CN201210265609.2A
Publication of CN103577439A
Application granted
Publication of CN103577439B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage pre-reading method and a webpage pre-reading system. The method includes three steps. First, a user's browsing behavior information on a webpage is recorded. Second, that information is used to determine links whose anchor information the user has browsed but which the user has not clicked. Third, when the user next accesses the webpage, the target webpages corresponding to those links are excluded from the pre-reading range. The method and system improve the effectiveness of pre-reading and reduce the waste of system resources.

Description

Webpage pre-reading method and system
Technical field
The present invention relates to the field of browser technology, and in particular to a webpage pre-reading method and system.
Background
Users frequently visit websites through a browser, but network access speed is affected by many factors. The user's client connection may be slow, or the bandwidth of the website's server may be limited. As a result, users often have to wait some time before a webpage is fully displayed. Users, however, always want faster access and do not want to waste time waiting for pages to open.
To improve webpage access speed, the prior art provides pre-reading technology. Pre-reading means that, in the background, the target webpages pointed to by the links in the currently open webpage are read in advance and cached locally on the user's computer. When the user actually clicks a link, the corresponding page only needs to be read from the local cache and rendered, which speeds up access.
In practice, however, deciding which webpages to pre-read is a real problem. If the currently open webpage contains many links and the target webpage of every link is pre-read, a great deal of system resources, such as download bandwidth and storage, is consumed. In fact, a user does not click every link in the currently open webpage. As a result, many webpages are pre-read while the user actually visits only a few of them. The system resources spent pre-reading the unvisited webpages are wasted.
Therefore, how to improve the effectiveness of pre-reading and reduce the waste of system resources is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The invention provides a webpage pre-reading method and system that improve the effectiveness of pre-reading and reduce the waste of system resources.
The invention provides the following solutions:
A pre-reading method, comprising:
Recording a user's browsing behavior information on a webpage;
Determining, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user did not click;
When the user next visits the webpage, excluding the target webpages corresponding to those links from the pre-reading range.
Optionally, recording the user's browsing behavior information on the webpage comprises:
Recording the links the user clicks on the webpage;
Determining, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user did not click comprises:
Determining those links according to the links the user clicked on the webpage.
Optionally, determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user did not click comprises:
Obtaining the position of each link in the webpage;
Treating the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user did not click, where N1 and N2 are preset fixed values.
Optionally, recording the user's browsing behavior information on the webpage further comprises:
Recording the dwell time on the webpage and/or page scrolling information;
Determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user did not click comprises:
Obtaining the position of each link in the webpage;
Treating the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user did not click, where the values of N1 and N2 are determined from the dwell time on the webpage and/or the page scrolling information.
Optionally, before the links whose anchor information the user has browsed but which the user did not click are determined from the clicked links, the method further comprises:
Judging whether the link clicked on the webpage belongs to the main content list of the webpage. If so, the method proceeds to determine, from the links the user clicked, the links whose anchor information the user has browsed but which the user did not click.
Optionally, judging whether the link clicked on the webpage is a link in the main content list of the webpage comprises:
Finding the parent node of the clicked link from the link's position in the webpage's Document Object Model (DOM);
Judging whether the parent node has child nodes structurally similar to the clicked link;
If so, calculating the average length of the anchor text of the links corresponding to the child nodes under the parent node;
If the average length is greater than a preset threshold, determining that the link clicked on the webpage is a link in the main content list of the webpage.
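The membership test above could be sketched as follows.

- `Node` is a hypothetical stand-in for a DOM element, not a real browser API.
- "Structurally similar" is taken to mean an identical tag tree.
- The average is taken over the similar children only.
- The 8-character threshold is an arbitrary example.

An anchor is often wrapped in a list item (for example `<li><a>title</a></li>`). For that reason the sketch climbs up to a few levels when the direct parent has no similar siblings. That climbing step is an assumption beyond the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical minimal DOM element (not a real browser API)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def shape(node: Node) -> Tuple:
    """Structural signature: the tag plus the signatures of all children."""
    return (node.tag, tuple(shape(c) for c in node.children))


def anchor_text(node: Node) -> str:
    """Concatenated text of every <a> element in the subtree."""
    if node.tag == "a":
        return node.text
    return "".join(anchor_text(c) for c in node.children)


def in_main_content_list(link: Node, min_avg_len: float = 8.0,
                         max_levels: int = 3) -> bool:
    """True if the clicked link sits among similar items with long anchor texts."""
    item = link
    for _ in range(max_levels):
        parent = item.parent
        if parent is None:
            return False
        # Children of the parent with the same structure as the item (item included).
        similar = [c for c in parent.children if shape(c) == shape(item)]
        if len(similar) >= 2:
            avg = sum(len(anchor_text(c)) for c in similar) / len(similar)
            return avg > min_avg_len
        item = parent  # no similar siblings at this level: climb (assumption)
    return False


# Usage: a list of article titles versus a row of page-number links.
ul = Node("ul")
titles = ["Quarterly results beat expectations", "New rules for online lending"]
title_links = [ul.add(Node("li")).add(Node("a", text=t)) for t in titles]
pager = Node("div")
page_links = [pager.add(Node("a", text=n)) for n in ("1", "2", "3")]
print(in_main_content_list(title_links[0]))  # True
print(in_main_content_list(page_links[1]))   # False
```

The anchor-length check is what separates a list of article titles from a similarly structured row of short page-number links.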
Optionally, the method further comprises:
Obtaining the category information of the webpage;
If the webpage belongs to a preset category, then when the user next visits the webpage, excluding the target webpages of the links the user clicked on the webpage from the pre-reading range.
Optionally, if the user does not click anything on the webpage, recording the user's browsing behavior information on the webpage comprises:
Recording the dwell time on the webpage and/or page scrolling information;
Determining, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user did not click comprises:
If the dwell time on the webpage and/or the page scrolling information satisfies a preset condition, determining those links based on all links contained in the webpage.
Optionally, determining those links based on all links contained in the webpage comprises:
Finding the main content list of the webpage;
Treating all links in the main content list of the webpage as links whose anchor information the user has browsed but which the user did not click.
Optionally, finding the main content list of the webpage comprises:
Traversing all nodes in the webpage's Document Object Model. For each node, judge whether it has multiple child nodes and whether those child nodes are structurally similar; if so, the node is a main content list node.
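Below is a sketch of the main content list search and of its use in the no-click case. It again uses a hypothetical minimal `Node` type, not a real DOM API. Two values are illustrative only and do not come from the text:

- the minimum of three identically shaped children;
- the "preset condition" of at least 30 s of dwell time or at least one scroll.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical minimal DOM element (not a real browser API)."""
    tag: str
    href: str = ""
    children: List["Node"] = field(default_factory=list)


def shape(node: Node) -> Tuple:
    """Structural signature: the tag plus the signatures of all children."""
    return (node.tag, tuple(shape(c) for c in node.children))


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of every node in the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_main_content_lists(root: Node, min_children: int = 3) -> List[Node]:
    """Nodes that have several children which all share one structure."""
    return [n for n in walk(root)
            if len(n.children) >= min_children
            and len({shape(c) for c in n.children}) == 1]


def links_assumed_browsed(root: Node, dwell_s: float, scrolls: int,
                          min_dwell_s: float = 30.0, min_scrolls: int = 1) -> List[str]:
    """No-click case: the condition holds if dwell or scrolling reaches its
    (illustrative) minimum; then every main-list link counts as browsed."""
    if dwell_s < min_dwell_s and scrolls < min_scrolls:
        return []
    hrefs: List[str] = []
    for lst in find_main_content_lists(root):
        hrefs.extend(n.href for n in walk(lst) if n.tag == "a" and n.href)
    return list(dict.fromkeys(hrefs))  # drop duplicates, keep page order


page = Node("body", children=[
    Node("div", children=[Node("a", href="/nav/stock"), Node("a", href="/nav/reading")]),
    Node("ul", children=[Node("li", children=[Node("a", href=f"/t/{i}")]) for i in range(1, 5)]),
])
print(links_assumed_browsed(page, dwell_s=60, scrolls=0))
# ['/t/1', '/t/2', '/t/3', '/t/4']
```

The two-link navigation `div` is not reported because it has fewer than three children. The `ul` is reported because its four `li` children share one structure.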
Optionally, after determining from the browsing behavior information the links whose anchor information the user has browsed but which the user did not click, the method further comprises:
Saving the unique identification information of the target webpages corresponding to those links;
Excluding the target webpages corresponding to those links from the pre-reading range comprises:
Extracting the unique identification information of the target webpage of each link contained in the webpage and judging whether it appears in the saved unique identification information. If so, the pre-reading of that link's target webpage is skipped.
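The save-and-skip logic might look like the sketch below. It rests on two assumptions:

- the target URL, with its `#fragment` removed, serves as the unique identification information;
- skipped targets are kept in memory per source page.

`SkipStore` is hypothetical; a real browser would persist this data. `urldefrag` comes from Python's standard `urllib.parse`.

```python
from typing import Dict, Iterable, List, Set
from urllib.parse import urldefrag


def unique_id(url: str) -> str:
    """Use the URL without its #fragment as the unique identifier (assumption)."""
    return urldefrag(url)[0]


class SkipStore:
    """Hypothetical in-memory store of targets not to pre-read, per source page."""

    def __init__(self) -> None:
        self._skip: Dict[str, Set[str]] = {}

    def save(self, source_url: str, target_urls: Iterable[str]) -> None:
        """Remember targets of browsed-but-unclicked links on this source page."""
        ids = self._skip.setdefault(unique_id(source_url), set())
        ids.update(unique_id(u) for u in target_urls)

    def plan_preread(self, source_url: str, link_urls: Iterable[str]) -> List[str]:
        """Return, in page order, the link targets that should still be pre-read."""
        skip = self._skip.get(unique_id(source_url), set())
        plan: List[str] = []
        seen: Set[str] = set()
        for url in link_urls:
            uid = unique_id(url)
            if uid in skip or uid in seen:
                continue  # excluded from the pre-reading range, or a duplicate link
            seen.add(uid)
            plan.append(url)
        return plan


store = SkipStore()
store.save("http://bbs.example.com/list",
           ["http://bbs.example.com/t/4", "http://bbs.example.com/t/5#top"])
print(store.plan_preread("http://bbs.example.com/list",
                         ["http://bbs.example.com/t/4#c1",
                          "http://bbs.example.com/t/5",
                          "http://bbs.example.com/t/7"]))
# ['http://bbs.example.com/t/7']
```

Skipping repeated links to the same target within one page is an addition beyond the text. It simply avoids fetching one page twice.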
A pre-reading system, comprising:
A browsing behavior information recording unit, configured to record a user's browsing behavior information on a webpage;
A link determining unit, configured to determine, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user did not click;
A pre-reading unit, configured to exclude the target webpages corresponding to those links from the pre-reading range when the user next visits the webpage.
Optionally, the browsing behavior information recording unit is specifically configured to:
Record the links the user clicks on the webpage;
The link determining unit is specifically configured to:
Determine, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user did not click.
Optionally, the link determining unit comprises:
A position information obtaining subunit, configured to obtain the position of each link in the webpage;
A first determining subunit, configured to treat the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user did not click, where N1 and N2 are preset fixed values.
Optionally, the browsing behavior information recording unit is further configured to:
Record the dwell time on the webpage and/or page scrolling information;
The link determining unit comprises:
A position information obtaining subunit, configured to obtain the position of each link in the webpage;
A second determining subunit, configured to treat the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user did not click, where the values of N1 and N2 are determined from the dwell time on the webpage and/or the page scrolling information.
Optionally, the system further comprises:
A judging unit, which acts before the links whose anchor information the user has browsed but which the user did not click are determined from the clicked links. It judges whether the link clicked on the webpage belongs to the main content list of the webpage. If so, it triggers the step of determining those links from the links the user clicked.
Optionally, the judging unit comprises:
A parent node lookup subunit, configured to find the parent node of the clicked link from the link's position in the webpage's Document Object Model;
A child node structural similarity judging subunit, configured to judge whether the parent node has child nodes structurally similar to the clicked link;
An anchor text length statistics subunit, configured to calculate the average anchor-text length of the links corresponding to the child nodes under the parent node, if the parent node has child nodes structurally similar to the clicked link;
An anchor text average length comparison subunit, configured to determine that the clicked link belongs to the main content list of the webpage if the average length is greater than a preset threshold.
Optionally, the system further comprises:
A webpage category obtaining unit, configured to obtain the category information of the webpage;
A clicked link exclusion unit, configured to exclude the target webpages of the links the user clicked on the webpage from the pre-reading range when the user next visits the webpage, if the webpage belongs to a preset category.
Optionally, if the user does not click anything on the webpage, the browsing behavior information recording unit is specifically configured to:
Record the dwell time on the webpage and/or page scrolling information;
The link determining unit is specifically configured to:
If the dwell time on the webpage and/or the page scrolling information satisfies a preset condition, determine, based on all links contained in the webpage, the links whose anchor information the user has browsed but which the user did not click.
Optionally, the link determining unit comprises:
A main content list lookup subunit, configured to find the main content list of the webpage;
A link determining subunit, configured to treat all links in the main content list of the webpage as links whose anchor information the user has browsed but which the user did not click.
Optionally, the main content list lookup subunit is specifically configured to:
Traverse all nodes in the webpage's Document Object Model. For each node, judge whether it has multiple child nodes and whether those child nodes are structurally similar; if so, the node is a main content list node.
Optionally, the system further comprises:
A storage unit, which acts after the links whose anchor information the user has browsed but which the user did not click have been determined from the browsing behavior information. It saves the unique identification information of the target webpages corresponding to those links;
The pre-reading unit is specifically configured to:
Extract the unique identification information of the target webpage of each link contained in the webpage and judge whether it appears in the saved unique identification information. If so, skip the pre-reading of that link's target webpage.
According to the specific embodiments provided by the invention, the invention achieves the following technical effects:
The invention uses the user's browsing behavior information on a webpage to determine the links whose anchor information the user has browsed but which the user did not click. The target webpages of these links do not need to be pre-read the next time the user visits the webpage, so they can be excluded from the pre-reading range. This narrows the pre-reading range, makes pre-read operations more effective, and reduces the system resources they occupy.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the embodiments are briefly described below. Evidently, the drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method provided by an embodiment of the present invention;
Fig. 2 is an interface schematic diagram for the method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention fall within the protection scope of the present invention.
Referring to Fig. 1, the webpage pre-reading method provided by the embodiment of the present invention comprises the following steps:
S101: Record the user's browsing behavior information on the webpage;
For ease of understanding, some basic concepts are introduced first. After a browser opens a webpage, it displays the elements that make up the page, such as text, images, audio, and video. Another common page element is the link. A link, also called a hyperlink, is one of the key features that distinguish Web pages from other media. A visitor only needs to click a link to jump automatically to the link's target, which is usually another webpage; in other words, a link usually corresponds to a webpage.

For ease of distinction, this webpage is called the target webpage of the link. The webpage whose content contains the link is called the source webpage. For example, if webpage A contains a link pointing to webpage B, then webpage A is the source webpage of the link and webpage B is its target webpage. Clicking a link is, for the browser, equivalent to receiving an instruction to open the corresponding target webpage.

In a web browser, linked objects are usually set apart from non-linked content by underlining and a specific color. When the mouse points at a linked object, the pointer changes from an "arrow" to a "hand" (under the default mouse scheme), and clicking opens the link's target webpage.

A linked object can be a text link or an image link; that is, the object used for the hyperlink can be a piece of text or a picture. Either way, the text or thumbnail indicates the main content of the target webpage, so the user can roughly decide from it whether to click the link and view the detailed content of the corresponding webpage.

Links have changed the traditional habit of reading a webpage in order. Whenever a user sees a link, it means that more detailed information can be opened and browsed. For example, the news column of a portal website usually contains many links, typically with news headlines as link text. Clicking each link may open a webpage describing the news story in detail.
In existing pre-read schemes, after the user opens a webpage, the target webpages of all links in the page content are generally pre-read. When the user then clicks one of the links, the content can be taken directly from the cache and the target webpage displayed as quickly as possible. However, portal websites and similar sites contain a large number of links. Pre-reading the target webpages of all of them obviously occupies a great deal of system resources, while the user generally does not need to visit every one. A large amount of resources is therefore wasted.
The embodiment of the present invention predicts, from the user's browsing behavior on a webpage, which links the user is unlikely to click next time. Those links are then excluded when pre-reading is performed for that webpage, which avoids wasting system resources on blind pre-reading.

To this end, the user's browsing behavior information on the webpage is recorded first. It may include, for example:
- the links the user clicks on the webpage;
- the dwell time on the webpage;
- page scrolling events on the webpage.

The dwell time on the webpage is the time from when the user opens the webpage until the user closes it. The user can generally see the full page content only after it has finished loading, so the dwell time may also be counted from the moment the webpage finishes loading.

A page scrolling event is the user viewing page content by dragging the scroll bar in the page window, rolling the mouse wheel, dragging with the mouse, and so on. Page scrolling event information may include the scrolling direction, the number of scrolls, and so on. It can be obtained from the operating system, for example by hooking certain key system functions.
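As an illustration of step S101, here is a minimal Python sketch of a per-visit record. It assumes the browser, or a hook, delivers load-complete, click, scroll, and close events with timestamps in seconds. `BrowseRecord` and its method names are hypothetical. Dwell time is counted from load completion, which is one of the two options described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BrowseRecord:
    """Hypothetical per-visit record of browsing behavior (step S101)."""
    url: str
    load_done_at: Optional[float] = None   # when the page finished loading (s)
    closed_at: Optional[float] = None      # when the page was closed (s)
    clicked_links: List[int] = field(default_factory=list)  # 1-based link indices
    scroll_down: int = 0                   # number of downward scroll events
    scroll_up: int = 0                     # number of upward scroll events

    def on_load_complete(self, t: float) -> None:
        self.load_done_at = t

    def on_click(self, link_index: int) -> None:
        self.clicked_links.append(link_index)

    def on_scroll(self, direction: str) -> None:
        # direction is assumed to be "down" or "up"; anything else is ignored
        if direction == "down":
            self.scroll_down += 1
        elif direction == "up":
            self.scroll_up += 1

    def on_close(self, t: float) -> None:
        self.closed_at = t

    def dwell_time(self) -> float:
        """Seconds from load completion to close; 0.0 if either is unknown."""
        if self.load_done_at is None or self.closed_at is None:
            return 0.0
        return max(0.0, self.closed_at - self.load_done_at)


rec = BrowseRecord("http://bbs.example.com/list")
rec.on_load_complete(3.0)
rec.on_click(7)
rec.on_scroll("down")
rec.on_scroll("down")
rec.on_click(15)
rec.on_close(123.0)
print(rec.dwell_time(), rec.clicked_links, rec.scroll_down)  # 120.0 [7, 15] 2
```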
S102: Determine, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user did not click;
Here, anchor information is the text used for a link (usually called anchor text) or the picture used for a link. As mentioned above, anchor information generally indicates the main content of the target webpage in words or pictures. The user can therefore roughly decide from it whether to click the link and view the detailed content of the target webpage.

If the user has browsed the anchor information of a link but did not click it, the user is probably not interested in the corresponding target webpage. When the user opens the webpage again, the user will probably not click the link either, even if it is still present. The target webpage of that link therefore need not be pre-read when the user revisits the webpage, which avoids waste.
There are several ways to determine the links whose anchor information the user has browsed. In one approach, the browsing behavior information recorded in step S101 includes the links the user clicked. The links whose anchor information the user has browsed but which the user did not click can then be determined from those clicks.
When browsing a page, a user generally picks the interesting links from many and clicks them. If the user clicked a link, the user probably also browsed the anchor information of nearby links but, not being interested, did not click them. The links whose anchor information the user has browsed can therefore be determined from the links the user clicked on the webpage.

In a specific implementation, the position of each link in the webpage is obtained first. Then the N1 links immediately before and the N2 links immediately after a clicked link are treated as links whose anchor information the user has browsed but which the user did not click. N1 and N2 may be preset fixed values that never change. For example, preset N1 = 4 and N2 = 3, and suppose a webpage contains 100 links of which the user clicked the 10th. Then the 6th to 9th and the 11th to 13th links are taken as links whose anchor information the user has browsed but which the user did not click.

A link's position in the webpage can be represented by its path in the DOM (Document Object Model) tree. For example, (0)div(1)a denotes the 1st link under the 0th div, counting from the root node. The DOM tree is introduced later.
In this implementation, only the links the user clicked need to be recorded as browsing behavior information. From them alone, the links whose anchor information the user may have browsed but which the user did not click can be derived.
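The fixed-window rule can be sketched as follows. The sketch assumes that each link's position has already been reduced to a 1-based index in document order; the text itself uses DOM paths. The function name is hypothetical. Dropping other clicked links from the window is an added assumption: a link the user did click should not be marked as unclicked.

```python
from typing import Iterable, Set


def browsed_not_clicked(clicked: Iterable[int], total_links: int,
                        n1: int, n2: int) -> Set[int]:
    """1-based indices of links whose anchors were probably seen but not clicked.

    For each clicked link at position i, take the n1 links before it and the
    n2 links after it, clamped to 1..total_links, then remove clicked links.
    """
    clicked_set = set(clicked)
    result: Set[int] = set()
    for i in clicked_set:
        lo = max(1, i - n1)
        hi = min(total_links, i + n2)
        result.update(range(lo, hi + 1))
    return result - clicked_set


# Example from the text: 100 links, link 10 clicked, N1 = 4, N2 = 3.
print(sorted(browsed_not_clicked([10], total_links=100, n1=4, n2=3)))
# [6, 7, 8, 9, 11, 12, 13]
```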
In another implementation, N1 and N2 are not fixed but change with the actual browsing situation. In addition to the clicked links, the user's dwell time and/or page scrolling event information on the webpage can be obtained, and N1 and N2 are set dynamically from either or both.

For example, let the page dwell time be t. When t is less than a preset threshold t1, N1 = 0 and N2 = 0. A very short dwell time means the user probably had no time to browse, or did not see clearly, the anchor information of other links. The webpage is then considered to contain no links whose anchor information the user has browsed. When t is greater than t1, N1 and N2 increase as t increases, by the same or different step sizes.

Scrolling is handled the same way. If the number of downward scroll events is r, N2 increases as r increases. If the number of upward scroll events is s, N1 increases as s increases.
Suppose that on a certain day user A's clicks on a forum page are as shown in Fig. 2: the user clicked the 7th and the 15th links of the forum. Suppose further that:
- the recorded page dwell time is 120 seconds and there were 2 downward scroll events;
- N1 starts at 2 and N2 starts at 1;
- N1 and N2 each increase by 1 when the dwell time is 100 seconds or more;
- N2 increases by 1 for each downward scroll.

Then N1 = 2 + 1 = 3 and N2 = 1 + 1 + 1 × 2 = 4. Around the 7th link, the links whose anchor information the user may have browsed but which the user did not click are the 4th to 6th and the 8th to 11th. Around the 15th link, they are the 12th to 14th and the 16th to 19th. Information about these links can then be stored in a file for use during pre-reading.
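Here is a minimal sketch of the dynamic rule that reproduces the worked example above. The parameter names are hypothetical. So are three choices the text leaves open:

- the short-dwell threshold t1, set here to 10 s;
- adding 1 for every full 100 s of dwell time, which generalizes the example's single +1 at 100 s or more;
- adding 1 to N1 per upward scroll.

The forum page is assumed to hold 30 links.

```python
from typing import Iterable, Set, Tuple


def dynamic_window(dwell_s: float, scroll_down: int, scroll_up: int,
                   min_dwell_s: float = 10.0,   # hypothetical t1
                   base_n1: int = 2, base_n2: int = 1,
                   dwell_step_s: float = 100.0) -> Tuple[int, int]:
    """Compute (N1, N2) from dwell time and scroll counts (illustrative rule)."""
    if dwell_s < min_dwell_s:
        # Too short to have read neighbouring anchors: no window at all.
        return 0, 0
    dwell_bonus = int(dwell_s // dwell_step_s)  # +1 per full 100 s (assumed)
    n1 = base_n1 + dwell_bonus + scroll_up      # upward scrolls widen N1
    n2 = base_n2 + dwell_bonus + scroll_down    # downward scrolls widen N2
    return n1, n2


def browsed_not_clicked(clicked: Iterable[int], total_links: int,
                        n1: int, n2: int) -> Set[int]:
    """Union of [i - n1, i + n2] windows around clicked links, minus the clicks."""
    clicked_set = set(clicked)
    result: Set[int] = set()
    for i in clicked_set:
        result.update(range(max(1, i - n1), min(total_links, i + n2) + 1))
    return result - clicked_set


# Worked example: 120 s dwell, 2 downward scrolls, links 7 and 15 clicked.
n1, n2 = dynamic_window(120, scroll_down=2, scroll_up=0)
print(n1, n2)  # 3 4
print(sorted(browsed_not_clicked([7, 15], total_links=30, n1=n1, n2=n2)))
# [4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19]
```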
It should be noted, however, that the right treatment of links the user has already clicked depends on the kind of webpage. On some webpages, the target of a link is a finished article or news report. Once the user has clicked such a link and read the target page, the user generally will not click it again on the next visit, even if the link is still present, because the content has already been read. On other webpages, the target content keeps being updated. For example, on a post-bar (Tieba-style) forum page each link may correspond to a topic; the topic's author may keep adding new content to the target page, and other users may keep commenting on it. Similarly, a link may point to a serialized article to which the author keeps adding new chapters. For links of this kind, the user may well click the link again on the next visit, so the target pages of links the user has already clicked should still be pre-read. In other words, whether the target page of an already-clicked link should be pre-read depends on the category of the webpage, and the embodiment of the present invention may therefore treat webpages differently by category. In a specific implementation, the category of the webpage is obtained first. If the webpage belongs to a preset category (for example, news), the target pages of the links the user has clicked are also excluded from the pre-reading scope. If the webpage belongs to a category such as post-bar forums or serialized articles, the target pages of the links the user has clicked are pre-read. The category of a webpage may be obtained empirically: for example, categories may be set up in advance and unique identifiers of the webpages in each category, such as their URLs (Uniform/Universal Resource Locators), collected, so that the category of a webpage can be looked up from its URL. Alternatively, the category may be determined from information such as keywords in the webpage; this is not described in detail here.
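The category-dependent decision can be sketched as below. The sketch assumes a lookup table keyed by host name, a simplification of the URL-based lookup the text describes; the host names, the category labels, and the choice of "news" as the only preset excluding category are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical category table keyed by host name; a real deployment would
# collect identifying URLs for each category in advance.
CATEGORY_BY_HOST = {
    "news.example.com": "news",
    "bbs.example.com": "forum",
    "novel.example.com": "serial",
}

# Preset categories whose already-clicked links are NOT pre-read (assumption).
EXCLUDE_CLICKED = {"news"}


def page_category(page_url: str) -> str:
    host = urlparse(page_url).hostname or ""
    return CATEGORY_BY_HOST.get(host, "other")


def should_preread_clicked_links(page_url: str) -> bool:
    # News-like pages: the clicked article has been read, so skip it next time.
    # Forums, serials and anything else: content may have changed, so pre-read.
    return page_category(page_url) not in EXCLUDE_CLICKED


print(should_preread_clicked_links("http://news.example.com/2012/07/a.html"))  # False
print(should_preread_clicked_links("http://bbs.example.com/forum/12"))         # True
```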
It should further be noted that, in practice, a webpage may also contain a navigation area, a pagination area, and so on. A navigation area generally contains links that, when clicked, take the user from the current page to another column or channel. For example, the leftmost column in Fig. 2 can be regarded as the navigation area of that webpage; it shows links such as "Stock Community", "Financing Community", and "Reading Community", and when the "Stock Community" link is clicked, the article list on the right shows only articles posted in the stock community. Navigation links tend to be used again on later visits, so it would clearly be inappropriate to treat them as links the user will not click when next visiting the page. Likewise, a pagination area generally contains links whose link text is a number; when the user clicks a number, the page jumps to the corresponding page. Clicking such page-number links is something the user may do on every visit, so it would also clearly be inappropriate to treat them as links the user will not click next time.
Therefore, in an optional embodiment of the present invention, when it is detected that the user has clicked a link while visiting a webpage, it is first determined whether the clicked link belongs to the main content list of the webpage. If it does, the clicked link is used to determine the links whose anchor information the user has browsed but which the user has not clicked. If the clicked link is not in the main content list, it is not used for that determination, and no links are recorded as not needing pre-reading on the basis of it.
In a specific implementation, the distinction relies on the following observations. Links in a navigation area are usually text links with short anchor text, and the anchor texts of different navigation links usually contain the same number of characters. Links in a pagination area are hyperlinks on the corresponding page numbers, so they are also text links, and because their anchor text is a number it is also short. A main content list likewise contains multiple links, but each link typically uses a title, such as a news headline, as its anchor text, so the anchor texts vary in length but are generally long. Based on these features, the main content list of the webpage can be distinguished from the navigation area and the pagination area. Specifically, whether the clicked link belongs to the main content list can be determined as follows. First, the parent node of the clicked link is located according to the clicked link's position in the DOM tree of the webpage. Next, it is determined whether the parent node has child nodes structurally similar to the clicked link. If so, the average length of the anchor texts of the links corresponding to the child nodes under the parent node is calculated; if this average length is greater than a preset threshold, the clicked link is determined to be a link in the main content list of the webpage.
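The check above might look like the following Python sketch. It rests on several assumptions not stated in the source: the page is parsed as well-formed XHTML with the standard `xml.etree.ElementTree` module (a browser would use its own DOM), "structurally similar" is taken to mean "same sequence of tag names in the subtree", the 8-character threshold is arbitrary, and the walk may climb one extra level so that an `<a>` wrapped in an `<li>` still resolves to its enclosing `<ul>`. The sample page is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def signature(node: ET.Element) -> tuple:
    # Structural fingerprint (assumption): tag names of the subtree in document order.
    return tuple(el.tag for el in node.iter())


def anchor_text(node: ET.Element) -> str:
    link = node if node.tag == "a" else node.find(".//a")
    return "".join(link.itertext()).strip() if link is not None else ""


def is_in_main_list(root: ET.Element, clicked: ET.Element,
                    min_avg_len: float = 8.0, max_levels: int = 2) -> bool:
    parent_of: Dict[ET.Element, ET.Element] = {c: p for p in root.iter() for c in p}
    node = clicked
    for _ in range(max_levels):
        parent = parent_of.get(node)
        if parent is None:
            return False
        # Children of the parent (including node itself) sharing node's structure.
        similar = [c for c in parent if signature(c) == signature(node)]
        if len(similar) >= 2:
            lengths = [len(anchor_text(c)) for c in similar]
            return sum(lengths) / len(lengths) > min_avg_len
        node = parent  # e.g. <a> inside <li>: retry one level up
    return False


# Hypothetical page: a navigation bar, a main list, and a pager.
page = ET.fromstring(
    "<html><body>"
    "<div id='nav'><a href='/stock'>Stock</a><a href='/fund'>Funds</a>"
    "<a href='/read'>Books</a></div>"
    "<ul id='main'>"
    "<li><a href='/t/1'>How to read a quarterly earnings report</a></li>"
    "<li><a href='/t/2'>Long-term index investing for beginners</a></li>"
    "</ul>"
    "<div id='pager'><a href='?p=1'>1</a><a href='?p=2'>2</a></div>"
    "</body></html>"
)

print(is_in_main_list(page, page.find(".//a[@href='/t/1']")))   # True
print(is_in_main_list(page, page.find(".//a[@href='/stock']")))  # False
```

Short, uniform navigation anchors and one-character page numbers fall below the threshold, while headline-length anchors in the main list pass it.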
Here, the DOM (Document Object Model) is an interface for accessing the information in an XML (Extensible Markup Language) document. It is a hierarchical object model whose hierarchy is a node tree generated from the XML document. After an XML parser has parsed a document, however simple or complex it is, all of its information is converted into a tree of object nodes. The tree has a single root node, the Document node, and every other node is a descendant of the root. Once the tree has been generated, its nodes and content can be accessed, modified, added, deleted, and created through the DOM interface. In a DOM tree, all content of the document is represented by nodes; a node can contain other nodes and can itself carry information such as its name, value, and type. If a node is the main content list node of a webpage, it generally has multiple structurally similar child nodes, each corresponding to one link, and the structural similarity between child nodes can be judged from attributes such as each child node's name, value, and type.
The above describes how the links whose anchor information the user has browsed but which the user has not clicked are determined when the user has clicked links in a webpage. In practice, a special case can also arise: the user opens a webpage but does not click any link in it. For this case, the embodiment of the present invention likewise provides a method of determining such links, introduced below.
If, after opening a webpage, the user stays on it for a sufficiently long time and scrolls down a sufficient number of times but never clicks any link in it, the user is probably not interested in any of its links, so all of them can be treated as not needing pre-reading on the next visit. In a specific implementation, when the user's browsing behavior information on the webpage is recorded, the page dwell time and/or page scroll event information can be recorded. If the page dwell time t is greater than a threshold t-max, and/or the number r of downward page-scroll events is greater than a threshold r-max, all links contained in the webpage can be treated as links whose anchor information the user has browsed but which the user has not clicked. However, if the webpage also contains a navigation area and a pagination area, treating their links this way is likewise inappropriate. Therefore, once the page dwell time and/or page scroll event information meets the above condition, the main content list of the page can be located and all links in it treated as links whose anchor information the user has browsed but which the user has not clicked. The main content list of a webpage can be located as follows: traverse all nodes of the webpage's DOM tree and, for each node, determine whether it has multiple child nodes and whether those child nodes are structurally similar; if so, the node is a main content list node.
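A sketch of the no-click case follows, again parsing a hypothetical XHTML page with `xml.etree.ElementTree`. The thresholds `T_MAX` and `R_MAX` are made-up values, and "and/or" is read as "or". Going beyond the traversal described above, the sketch also applies the average-anchor-length filter from the clicked-link case, because the similar-children test alone would also match the navigation bar and the pager; treat that filter as an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

T_MAX = 60.0  # hypothetical dwell-time threshold, seconds
R_MAX = 3     # hypothetical downward-scroll-count threshold


def signature(node: ET.Element) -> tuple:
    # Structural fingerprint (assumption): subtree tag names in document order.
    return tuple(el.tag for el in node.iter())


def main_list_links(root: ET.Element, min_avg_len: float = 8.0) -> List[str]:
    hrefs: List[str] = []
    for node in root.iter():  # visit every node in the tree
        children = list(node)
        if len(children) < 2 or len({signature(c) for c in children}) != 1:
            continue  # need several structurally identical children
        links = [a for c in children for a in c.iter("a")]
        if not links:
            continue
        # Extra filter (assumption): long anchors separate the main list from nav/pager.
        avg = sum(len("".join(a.itertext()).strip()) for a in links) / len(links)
        if avg > min_avg_len:
            hrefs.extend(a.get("href", "") for a in links)
    return hrefs


def browsed_without_click(root: ET.Element, dwell_seconds: float,
                          scroll_down_count: int) -> List[str]:
    if dwell_seconds > T_MAX or scroll_down_count > R_MAX:
        return main_list_links(root)
    return []


page = ET.fromstring(
    "<html><body>"
    "<div id='nav'><a href='/stock'>Stock</a><a href='/fund'>Funds</a></div>"
    "<ul id='main'>"
    "<li><a href='/t/1'>How to read a quarterly earnings report</a></li>"
    "<li><a href='/t/2'>Long-term index investing for beginners</a></li>"
    "</ul>"
    "<div id='pager'><a href='?p=1'>1</a><a href='?p=2'>2</a></div>"
    "</body></html>"
)

print(browsed_without_click(page, dwell_seconds=150, scroll_down_count=0))
# ['/t/1', '/t/2']
print(browsed_without_click(page, dwell_seconds=10, scroll_down_count=1))
# []
```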
It should be noted that, in the embodiment of the present invention, once the links whose anchor information the user has browsed but which the user has not clicked are obtained, what is recorded is not the positions of these links in the webpage but the links themselves. For example, in the example of Fig. 2, if these links are finally determined to include the 4th to 6th links, what is recorded is not the numbers "4th to 6th" but the links that currently occupy the 4th to 6th positions. A link can be represented by a unique network identifier of its target webpage, such as the target webpage's URL. That is, the URLs of the target webpages of the current 4th to 6th links are obtained and recorded in a file. The URL of a link's target webpage can be obtained from the HTML (Hypertext Markup Language) code of the source webpage containing the link.
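The following Python sketch shows one way to turn link positions into target URLs, using the standard-library `html.parser.HTMLParser` to read `href` attributes and `urllib.parse.urljoin` to resolve relative ones against the source page URL. It assumes, for illustration only, that link positions are counted in document order over all `<a href>` elements; the forum URL and markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute target URLs of <a href> elements in document order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(urljoin(self.base_url, href))


def target_urls(html: str, base_url: str) -> List[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical forum page with 20 topic links.
html = "<ul>" + "".join(
    f'<li><a href="/thread/{i}">Topic {i}</a></li>' for i in range(1, 21)) + "</ul>"
urls = target_urls(html, "http://bbs.example.com/forum/")
# Record positions 4-6 as target URLs, not as numbers.
to_record = [urls[p - 1] for p in (4, 5, 6)]
print(to_record)
# ['http://bbs.example.com/thread/4', 'http://bbs.example.com/thread/5', 'http://bbs.example.com/thread/6']
```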
S103: When the user next visits the webpage, exclude the target webpages of the links whose anchor information the user has browsed but which the user has not clicked from the pre-reading scope.
When the user visits the webpage again, the target webpages of the links it contains would normally be pre-read. In the embodiment of the present invention, however, the target webpages of the links determined in step S102, whose anchor information the user has browsed but which the user has not clicked, are excluded from the pre-reading scope. This narrows the pre-reading scope and reduces the system resources consumed by pre-reading.
In a specific implementation, after the links whose anchor information the user has browsed but which the user has not clicked are determined in step S102, unique identifiers of their target files, such as URLs, can be obtained (for ease of description, URLs are used as the example below). These URLs are saved to a local file to form a list of URLs that do not need to be pre-read, and the URL of the source webpage containing the links can be recorded at the same time. The next time the user opens a webpage, it is first checked whether the local file holds pre-reading information for that webpage. If it does, the list of URLs that do not need to be pre-read is retrieved; the target URL of each link in the webpage is then taken in turn and checked against that list. If a URL appears in the list, its target is not pre-read; otherwise, the pre-read operation is performed.
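A minimal sketch of this local-file bookkeeping, assuming a JSON file keyed by source page URL; the file format, the class name `SkipListStore`, and the example URLs are hypothetical. Because matching is by target URL rather than by position, a recorded link that has moved or disappeared by the next visit is still handled correctly.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


class SkipListStore:
    """Local file mapping a source page URL to target URLs that need no pre-reading."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data: Dict[str, List[str]] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {})

    def record(self, source_url: str, target_urls: Iterable[str]) -> None:
        merged = set(self.data.get(source_url, [])) | set(target_urls)
        self.data[source_url] = sorted(merged)
        self.path.write_text(json.dumps(self.data, ensure_ascii=False), encoding="utf-8")

    def urls_to_preread(self, source_url: str, link_targets: Iterable[str]) -> List[str]:
        # No entry for this page: pre-read everything as usual.
        skip = set(self.data.get(source_url, []))
        # Match by target URL, so a link that moved position is still skipped.
        return [u for u in link_targets if u not in skip]


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "preread_skip.json"
    SkipListStore(store_path).record(
        "http://bbs.example.com/forum/",
        ["http://bbs.example.com/thread/4", "http://bbs.example.com/thread/5"])
    # Next visit: a fresh instance reloads the file; thread/5 is gone, thread/4 moved.
    result = SkipListStore(store_path).urls_to_preread(
        "http://bbs.example.com/forum/",
        ["http://bbs.example.com/thread/9", "http://bbs.example.com/thread/4"])
print(result)  # ['http://bbs.example.com/thread/9']
```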
It should be noted that, as mentioned above, what is recorded for links that do not need to be pre-read is not their positions in the webpage but the URLs of their target webpages. Likewise, when the user next opens the webpage and it must be decided whether to pre-read the target of a given link, the decision is based on the URL of the link's target webpage rather than on the link's position. When the user next opens the webpage, a link determined not to need pre-reading may no longer be present, in which case its target is naturally not pre-read. Or the link may still be present, in which case its target URL appears in the list of URLs that do not need to be pre-read, so its target is again not pre-read.
In summary, the webpage pre-reading method provided by the embodiment of the present invention determines, from the user's browsing behavior information on a webpage, the links whose anchor information the user has browsed but which the user has not clicked. Such links can be regarded as links the user will probably not click on the next visit, and their target webpages as pages that need not be pre-read at that time. When the user next visits the webpage, the target webpages of these links are excluded from the pre-reading scope, which narrows the scope, improves the effectiveness of pre-reading, and reduces the system resources it consumes.
Corresponding to the webpage pre-reading method provided by the embodiment of the present invention, the embodiment also provides a webpage pre-reading system. Referring to Fig. 3, the system may comprise:
A browsing behavior information recording unit 301, configured to record the user's browsing behavior information on a webpage;
A link determining unit 302, configured to determine, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user has not clicked;
A pre-reading unit 303, configured to exclude, when the user next visits the webpage, the target webpages of the links whose anchor information the user has browsed but which the user has not clicked from the pre-reading scope.
In a specific implementation, in one mode, the browsing behavior information recording unit 301 may be specifically configured to:
Record the links the user clicks on the webpage;
Correspondingly, the link determining unit 302 is specifically configured to:
Determine, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
The link determining unit 302 may comprise:
A position information obtaining subunit, configured to obtain the position of each link in the webpage;
A first determining subunit, configured to determine the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user has not clicked, where N1 and N2 are preset fixed values.
Alternatively, the browsing behavior information recording unit 301 may further be configured to:
Record the dwell time and/or page scroll information on the webpage;
In this case, the link determining unit 302 may specifically comprise:
A position information obtaining subunit, configured to obtain the position of each link in the webpage;
A second determining subunit, configured to determine the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user has not clicked, where the values of N1 and N2 are determined according to the dwell time and/or page scroll information on the webpage.
Because the above implementation is not suitable for links in a navigation area or pagination area of the webpage, the system may further comprise:
A judging unit, configured to determine, before the links whose anchor information the user has browsed but which the user has not clicked are determined according to the links the user clicked on the webpage, whether a link clicked on the webpage is a link in the main content list of the webpage, and if so, to trigger execution of that determining step.
In a specific implementation, the judging unit may comprise:
A parent node searching subunit, configured to locate the parent node of the clicked link according to the clicked link's position in the DOM (Document Object Model) tree of the webpage;
A child node structural similarity judging subunit, configured to determine whether the parent node has child nodes structurally similar to the clicked link;
An anchor text length statistics subunit, configured to calculate, if the parent node has child nodes structurally similar to the clicked link, the average length of the anchor texts of the links corresponding to the child nodes under the parent node;
An anchor text average length comparing subunit, configured to determine, if the average length is greater than a preset threshold, that the link clicked on the webpage is a link in the main content list of the webpage.
In a specific implementation, the system may further comprise:
A webpage category obtaining unit, configured to obtain the category information of the webpage;
A clicked link excluding unit, configured to exclude, if the webpage belongs to a preset category, the target webpages of the links the user clicked on the webpage from the pre-reading scope when the user next visits the webpage.
In practice, if the user does not click anything in the webpage, the browsing behavior information recording unit 301 may be specifically configured to:
Record the dwell time and/or page scroll information on the webpage;
And the link determining unit 302 may be specifically configured to:
If the dwell time and/or page scroll information on the webpage meets a preset condition, determine, based on all links contained in the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
In this case, the link determining unit 302 may specifically comprise:
A main content list searching subunit, configured to locate the main content list of the webpage;
A link determining subunit, configured to determine all links in the main content list of the webpage as links whose anchor information the user has browsed but which the user has not clicked.
The main content list searching subunit is specifically configured to:
Traverse all nodes of the webpage's DOM (Document Object Model) tree and, for each node, determine whether it has multiple child nodes and whether those child nodes are structurally similar; if so, the node is a main content list node.
In addition, the system may further comprise:
A storage unit, configured to save, after the links whose anchor information the user has browsed but which the user has not clicked are determined according to the browsing behavior information, unique identifiers of the target webpages of those links;
Correspondingly, the pre-reading unit 303 may be specifically configured to:
Extract the unique identifiers of the target webpages of the links contained in the webpage, determine whether each extracted identifier appears among the saved identifiers, and if so, skip pre-reading the target webpage of that link.
The webpage pre-reading system provided by the embodiment of the present invention determines, from the user's browsing behavior information on a webpage, the links whose anchor information the user has browsed but which the user has not clicked, so the target webpages of these links can be regarded as pages that need not be pre-read on the user's next visit. When the user next visits the webpage, the target webpages of these links are excluded from the pre-reading scope, which narrows the scope, improves the effectiveness of pre-reading, and reduces the system resources it consumes.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention or in parts of them.
The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the device or system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The webpage pre-reading method and system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (22)

1. A webpage pre-reading method, characterized by comprising:
Recording a user's browsing behavior information on a webpage;
Determining, according to the browsing behavior information, links whose anchor information the user has browsed but which the user has not clicked;
When the user next visits the webpage, excluding the target webpages of the links whose anchor information the user has browsed but which the user has not clicked from the pre-reading scope.
2. The method according to claim 1, characterized in that recording the user's browsing behavior information on the webpage comprises:
Recording the links the user clicks on the webpage;
And determining, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user has not clicked comprises:
Determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
3. The method according to claim 2, characterized in that determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked comprises:
Obtaining the position of each link in the webpage;
Determining the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user has not clicked, wherein N1 and N2 are preset fixed values.
4. The method according to claim 2, characterized in that recording the user's browsing behavior information on the webpage further comprises:
Recording the dwell time and/or page scroll information on the webpage;
And determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked comprises:
Obtaining the position of each link in the webpage;
Determining the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user has not clicked, wherein the values of N1 and N2 are determined according to the dwell time and/or page scroll information on the webpage.
5. The method according to any one of claims 2 to 4, characterized by further comprising, before determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked:
Determining whether a link clicked on the webpage is a link in the main content list of the webpage, and if so, triggering execution of the step of determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
6. The method according to claim 5, characterized in that determining whether a link clicked on the webpage is a link in the main content list of the webpage comprises:
Locating the parent node of the clicked link according to the clicked link's position in the Document Object Model of the webpage;
Determining whether the parent node has child nodes structurally similar to the clicked link;
If so, calculating the average length of the anchor texts of the links corresponding to the child nodes under the parent node;
If the average length is greater than a preset threshold, determining that the link clicked on the webpage is a link in the main content list of the webpage.
7. The method according to any one of claims 2 to 4, characterized by further comprising:
Obtaining category information of the webpage;
If the webpage belongs to a preset category, excluding, when the user next visits the webpage, the target webpages of the links the user clicked on the webpage from the pre-reading scope.
8. The method according to claim 1, characterized in that, if the user does not click in the webpage, recording the user's browsing behavior information on the webpage comprises:
Recording the dwell time and/or page scroll information on the webpage;
And determining, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user has not clicked comprises:
If the dwell time and/or page scroll information on the webpage meets a preset condition, determining, based on all links contained in the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
9. The method according to claim 8, characterized in that determining, based on all links contained in the webpage, the links whose anchor information the user has browsed but which the user has not clicked comprises:
Locating the main content list of the webpage;
Determining all links in the main content list of the webpage as links whose anchor information the user has browsed but which the user has not clicked.
10. The method according to claim 9, characterized in that locating the main content list of the webpage comprises:
Traversing all nodes of the webpage's Document Object Model and, for each node, determining whether it has multiple child nodes and whether those child nodes are structurally similar; if so, the node is a main content list node.
11. The method according to any one of claims 1 to 4, 6, and 8 to 10, characterized by further comprising, after determining, according to the browsing behavior information, the links whose anchor information the user has browsed but which the user has not clicked:
Saving unique identifiers of the target webpages of the links whose anchor information the user has browsed but which the user has not clicked;
And excluding the target webpages of the links whose anchor information the user has browsed but which the user has not clicked from the pre-reading scope comprises:
Extracting the unique identifiers of the target webpages of the links contained in the webpage, determining whether each extracted identifier appears among the saved identifiers, and if so, skipping the operation of pre-reading the target webpage of that link.
12. A webpage pre-reading system, characterized by comprising:
A browsing behavior information recording unit, configured to record a user's browsing behavior information on a webpage;
A link determining unit, configured to determine, according to the browsing behavior information, links whose anchor information the user has browsed but which the user has not clicked;
A pre-reading unit, configured to exclude, when the user next visits the webpage, the target webpages of the links whose anchor information the user has browsed but which the user has not clicked from the pre-reading scope.
13. The system according to claim 12, characterized in that the browsing behavior information recording unit is specifically configured to:
Record the links the user clicks on the webpage;
And the link determining unit is specifically configured to:
Determine, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
14. The system according to claim 13, characterized in that the link determining unit comprises:
A position information obtaining subunit, configured to obtain the position of each link in the webpage;
A first determining subunit, configured to determine the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user has not clicked, wherein N1 and N2 are preset fixed values.
15. The system according to claim 13, characterized in that the browsing behavior information recording unit is further configured to:
Record the dwell time and/or page scroll information on the webpage;
And the link determining unit comprises:
A position information obtaining subunit, configured to obtain the position of each link in the webpage;
A second determining subunit, configured to determine the N1 links immediately before and the N2 links immediately after a clicked link as links whose anchor information the user has browsed but which the user has not clicked, wherein the values of N1 and N2 are determined according to the dwell time and/or page scroll information on the webpage.
16. The system according to any one of claims 13 to 15, characterized by further comprising:
A judging unit, configured to determine, before the links whose anchor information the user has browsed but which the user has not clicked are determined according to the links the user clicked on the webpage, whether a link clicked on the webpage is a link in the main content list of the webpage, and if so, to trigger execution of the step of determining, according to the links the user clicked on the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
17. The system according to claim 16, characterized in that the judging unit comprises:
A parent node searching subunit, configured to locate the parent node of the clicked link according to the clicked link's position in the Document Object Model of the webpage;
A child node structural similarity judging subunit, configured to determine whether the parent node has child nodes structurally similar to the clicked link;
An anchor text length statistics subunit, configured to calculate, if the parent node has child nodes structurally similar to the clicked link, the average length of the anchor texts of the links corresponding to the child nodes under the parent node;
An anchor text average length comparing subunit, configured to determine, if the average length is greater than a preset threshold, that the link clicked on the webpage is a link in the main content list of the webpage.
18. The system according to any one of claims 13 to 15, characterized by further comprising:
A webpage category acquisition unit, configured to acquire the category to which the webpage belongs;
A clicked link exclusion unit, configured to act if the webpage belongs to a preset category. In that case, the next time the user accesses the webpage, it excludes from the pre-reading scope the target webpages corresponding to the links the user clicked on that webpage.
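The exclusion rule of claim 18 can be sketched as follows. The claim says only that the categories are preset. The category names, the `EXCLUDED_CATEGORIES` set and `prefetch_candidates` are therefore hypothetical.

```python
from typing import Iterable, List, Set

# Hypothetical preset categories; the claim does not say which categories qualify.
EXCLUDED_CATEGORIES = {"news", "forum"}

def prefetch_candidates(page_category: str, links: Iterable[str],
                        previously_clicked: Set[str]) -> List[str]:
    """Links on a revisited page that remain eligible for pre-reading."""
    if page_category in EXCLUDED_CATEGORIES:
        # Drop targets the user already clicked on an earlier visit to this page.
        return [url for url in links if url not in previously_clicked]
    return list(links)

print(prefetch_candidates("news", ["/a", "/b", "/c"], {"/b"}))  # ['/a', '/c']
```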
19. The system according to claim 12, characterized in that, if the user does not click anything on the webpage, the browsing behavior information recording unit is specifically configured to:
Record the dwell time on the webpage and/or the webpage scrolling information;
The link determining unit is specifically configured to:
If the dwell time on the webpage and/or the webpage scrolling information meet a preset condition, determine, based on all links contained in the webpage, the links whose anchor information the user has browsed but which the user has not clicked.
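For the no-click case of claim 19, the sketch below reads "meets a preset condition" as "dwell time or scroll depth reaches a threshold". Both threshold values, the `Visit` record, `links_from_idle_visit` and the OR combination are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds standing in for the claim's unspecified "preset condition".
MIN_DWELL_SECONDS = 5.0
MIN_SCROLLED_FRACTION = 0.5

@dataclass
class Visit:
    """A page visit in which the user clicked no link (hypothetical record)."""
    dwell_seconds: float
    scrolled_fraction: float  # 0.0 = never scrolled, 1.0 = scrolled to the bottom

def links_from_idle_visit(visit: Visit, all_links: List[str]) -> List[str]:
    # "and/or" in the claim is read here as a simple OR of the two signals.
    if (visit.dwell_seconds >= MIN_DWELL_SECONDS
            or visit.scrolled_fraction >= MIN_SCROLLED_FRACTION):
        return list(all_links)  # every link on the page counts as browsed
    return []  # too brief a visit to assume the anchors were read

links = ["/a", "/b"]
print(links_from_idle_visit(Visit(12.0, 0.1), links))  # ['/a', '/b']
print(links_from_idle_visit(Visit(1.5, 0.2), links))   # []
```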
20. The system according to claim 19, characterized in that the link determining unit comprises:
A main content list search subunit, configured to search for the main content list of the webpage;
A link determining subunit, configured to determine all links in the main content list of the webpage as links whose anchor information the user has browsed but which the user has not clicked.
21. The system according to claim 19, characterized in that the main content list search subunit is specifically configured to:
Traverse all nodes in the Document Object Model of the webpage and, for each node, judge whether the node has a plurality of child nodes and whether those child nodes are structurally similar. If so, the node is a main content list node.
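The node traversal of claim 21 can be sketched with `xml.etree.ElementTree`. The sketch also applies the rule of claim 20 that every link in the main content list counts as browsed but not clicked.

The following are assumptions:

* treating "structurally similar" as "identical tag and child structure";
* requiring at least two children;
* the function names.

```python
import xml.etree.ElementTree as ET
from typing import List

def shape(node: ET.Element) -> tuple:
    # Assumed similarity test: identical tag and identical child structure, recursively.
    return (node.tag, tuple(shape(child) for child in node))

def main_list_nodes(root: ET.Element, min_children: int = 2) -> List[ET.Element]:
    found = []
    for node in root.iter():  # visits every node, root included, in document order
        children = list(node)
        if len(children) >= min_children and all(
                shape(c) == shape(children[0]) for c in children):
            found.append(node)
    return found

def links_in_main_lists(root: ET.Element) -> List[str]:
    # Claim 20: every link inside a main content list is treated as browsed but not clicked.
    return [a.get("href") for node in main_list_nodes(root) for a in node.iter("a")]

page = ET.fromstring(
    "<body>"
    "<div id='nav'><a href='/'>Home</a><span>|</span><a href='/about'>About</a></div>"
    "<ul id='list'><li><a href='/s1'>Story one</a></li>"
    "<li><a href='/s2'>Story two</a></li></ul>"
    "</body>")
print([n.get("id") for n in main_list_nodes(page)])  # ['list']
print(links_in_main_lists(page))                     # ['/s1', '/s2']
```

With nested lists, a link could be reported once per enclosing list node. A fuller version would deduplicate.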
22. The system according to any one of claims 12 to 15, 17, and 19 to 21, characterized by further comprising:
A storage unit, configured to act after the browsing behavior information has been used to determine the links whose anchor information the user has browsed but which the user has not clicked. It saves the unique identification information of the target webpages corresponding to those links.
The pre-reading unit is specifically configured to:
Extract the unique identification information of the target webpage corresponding to each link contained in the webpage, and judge whether the extracted information appears in the saved unique identification information. If so, skip pre-reading the target webpage corresponding to that link.
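The store-then-skip logic of claim 22 can be sketched as follows. The claim does not define the unique identifier, so the MD5 digest of the URL used here is an assumption. The in-memory `PrereadFilter` class is likewise hypothetical; a real system would presumably persist the identifiers.

```python
import hashlib
from typing import Iterable, List, Set

def page_id(url: str) -> str:
    # Assumed unique identifier: MD5 hex digest of the URL (the claim does not specify one).
    return hashlib.md5(url.encode("utf-8")).hexdigest()

class PrereadFilter:
    """Remembers browsed-but-unclicked targets and skips them when pre-reading."""

    def __init__(self) -> None:
        self._seen_not_clicked: Set[str] = set()

    def remember(self, browsed_unclicked_urls: Iterable[str]) -> None:
        # Storage unit: save identifiers of targets the user saw but did not open.
        for url in browsed_unclicked_urls:
            self._seen_not_clicked.add(page_id(url))

    def to_preread(self, links_on_page: Iterable[str]) -> List[str]:
        # Pre-reading unit: skip any target whose identifier was saved earlier.
        return [u for u in links_on_page if page_id(u) not in self._seen_not_clicked]

f = PrereadFilter()
f.remember(["http://example.com/b"])
print(f.to_preread(["http://example.com/a", "http://example.com/b"]))
# ['http://example.com/a']
```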
CN201210265609.2A 2012-07-27 2012-07-27 Webpage pre-reading method and webpage pre-reading system Active CN103577439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210265609.2A CN103577439B (en) 2012-07-27 2012-07-27 Webpage pre-reading method and webpage pre-reading system

Publications (2)

Publication Number Publication Date
CN103577439A true CN103577439A (en) 2014-02-12
CN103577439B CN103577439B (en) 2017-02-08

Family

ID=50049244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210265609.2A Active CN103577439B (en) 2012-07-27 2012-07-27 Webpage pre-reading method and webpage pre-reading system

Country Status (1)

Country Link
CN (1) CN103577439B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075236A (en) * 2006-06-12 2007-11-21 腾讯科技(深圳)有限公司 Apparatus and method for accelerating browser webpage display
CN101369280A (en) * 2008-10-10 2009-02-18 深圳市茁壮网络技术有限公司 Method and device for web page browsing in digital television terminal
CN102222098A (en) * 2011-06-20 2011-10-19 北京邮电大学 Method and system for pre-fetching webpage
WO2012097701A1 (en) * 2011-01-18 2012-07-26 腾讯科技(深圳)有限公司 Method, system and computer storage medium for pre-reading network data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260424A (en) * 2015-09-28 2016-01-20 北京奇虎科技有限公司 Processing method and apparatus for webpage browsing historical records and most common accesses of user
CN105260424B (en) * 2015-09-28 2019-02-26 北京奇虎科技有限公司 Method and device for processing a user's webpage browsing history and most frequently visited pages

Similar Documents

Publication Publication Date Title
CN103605688B (en) Intercept method and intercept device for homepage advertisements and browser
CN102693271B (en) Network information recommendation method and system
CN101454781B (en) Expanded snippets
CN102609474B (en) Access information providing method and system
US20080282186A1 (en) Keyword generation system and method for online activity
CN107590169B (en) Operator gateway data preprocessing method and system
CN104035753A (en) Double-WebView customized page display method and system
CN104239298A (en) Text message recommendation method, server, browser and system
US20220114269A1 (en) Page processing method, electronic apparatus and non-transitory computer-readable storage medium
CN103838862B (en) Video searching method, device and terminal
US9280522B2 (en) Highlighting of document elements
CN112699295A (en) Webpage content recommendation method and device and computer readable storage medium
Insa Cabrera et al. Using the words/leafs ratio in the DOM tree for content extraction
Ghasemisharif et al. Speedreader: Reader mode made fast and private
RU2399090C2 (en) System and method for real time internet search of multimedia content
US11960834B2 (en) Reader mode-optimized attention application
Yatskov et al. Extraction of data from mass media web sites
US9384283B2 (en) System and method for deterring traversal of domains containing network resources
US20090313558A1 (en) Semantic Image Collection Visualization
Gali et al. Extracting representative image from web page
CN103729354A (en) Webpage information processing method and device
CN103577439A (en) Webpage pre-reading method and webpage pre-reading system
CN108319622B (en) Media content recommendation method and device
JP2004341965A (en) Object additional display method, and program, script, plug-in, tag, image, data, object, content, advertisement, and document for object additive display
CN105224552A (en) Network information processing method, device and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant