CN102902791B - Web page classification storage system and method - Google Patents

Info

Publication number: CN102902791B
Authority: CN (China)
Legal status: Active (as listed; the status is an assumption and not a legal conclusion)
Application number: CN201210376351.3A
Other languages: Chinese (zh)
Other versions: CN102902791A (en)
Inventor: 卢宏林
Current Assignee: Beijing Qihoo Technology Co Ltd (as listed; may be inaccurate)
Original Assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Priority date: 2012-09-29 (as listed; an assumption and not a legal conclusion)
Filing date: 2012-09-29
Publication date: 2016-08-03
Prior art keywords: page; page framework; framework; catalogue; title

Abstract

The invention discloses a web page classification storage system in the field of Internet technology. The system includes: a page framework ID computing module, adapted to extract the page framework of a previously obtained web page and to calculate a page framework ID; and a page framework storage module, adapted to store page frameworks that have the same page framework ID under a directory named after that ID. The invention also discloses a web page classification storage method. The system and method store web pages of the same category under the same directory, which solves the problem that whole-web search results are not stored by page category. Because the search results are stored by page category, fewer interfering factors affect page framework pattern recognition in vertical search.

Description

Web page classification storage system and method
Technical Field
The present invention relates to the field of Internet technology, and specifically to a web page classification storage system and method.
Background
Search technology falls broadly into two classes. The first takes the whole Internet as its object: it crawls web pages across the whole web (at present the crawl depth within a site may be limited, JavaScript (js) is generally not processed, and only some dynamic pages are handled), then processes and analyzes the pages. This is whole-web search. The second is vertical search, which crawls, analyzes, and processes only pages of a particular category, for example image search, video search, blog search, forum search, and news search. Most vertical search is currently seed-based (a seed is also called a list page). Vertical search can be divided into two parts: first, finding seeds; second, discovering specific product pages from the seed pages, that is, pages of the different categories (images, videos, news, and so on), and then processing those product pages.
Existing whole-web search largely ignores the needs of vertical search: it cannot distinguish page categories, and it handles every page in essentially the same way. Pages crawled during whole-web search are therefore all stored together rather than by category. If vertical search is to use whole-web search results, those results must be classified and stored by page category so that page framework pattern recognition can be performed during classification. If pages of different categories, or pages from unrelated sites, are pooled for pattern recognition, there are too many interfering factors and the result is hard to predict. Storing whole-web search results by page category is therefore a problem in urgent need of a solution.
Summary of the Invention
In view of the above problems, the present invention is proposed to provide a web page classification storage system and method that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a web page classification storage system is provided, including:
a page framework ID computing module, adapted to extract the page framework of a previously obtained web page and to calculate a page framework ID; and
a page framework storage module, adapted to store page frameworks having the same page framework ID under a directory named after that page framework ID.
Optionally, the page framework storage module is specifically adapted to check, under the current subdirectory, whether a directory named after the page framework ID exists; if it exists, the page framework is stored under that directory; if it does not exist, a directory named after the page framework ID is created and the page framework is then stored under it.
Optionally, the system further includes:
a framework counting module, adapted to count the page frameworks under the directory named after the page framework ID; and
a web page content storage module, adapted, if the count reaches a threshold, to calculate a page framework pattern, to download the data content of the web pages corresponding to the page frameworks under that directory according to the calculated pattern, and to store the downloaded data content under a designated directory.
Optionally, the web page content storage module further includes a fast page storage module, adapted to store downloaded pages that need fast processing under a designated fast-processing directory, where the downloaded pages that need fast processing are new pages appearing on the site homepage and on the pages directly below it.
Optionally, the web page content storage module further includes a threshold adjustment module, adapted to determine whether the number of page frameworks for the current ID has reached the threshold and, if not, to adjust the threshold for that ID by a certain increment.
Optionally, the page framework ID computing module further includes a hash calculation module, adapted to apply a hash function to the page framework and to use the last n bits of the hash value as the page framework ID.
Optionally, the system further includes a domain name directory creation module, adapted to create priority directories for the different domain names of the same site according to their priorities, the directory named after the page framework ID being located under the corresponding priority directory of each domain name.
Optionally, the system further includes a web page acquisition module, adapted to obtain web pages through whole-web search on a per-site basis, the web pages of the different domain names of the same site being stored under the same root directory.
According to another aspect of the present invention, a web page classification storage method is provided, including the following steps:
extracting the page framework of a previously obtained web page and calculating a page framework ID; and
storing page frameworks having the same page framework ID under a directory named after that page framework ID.
Optionally, storing page frameworks having the same page framework ID under a directory named after that page framework ID specifically includes:
checking, under the current subdirectory, whether a directory named after the page framework ID exists; if it exists, storing the page framework under that directory; if it does not exist, creating a directory named after the page framework ID and then storing the page framework under it.
Optionally, after the page framework is stored under the directory named after the page framework ID, the method further includes:
counting the page frameworks under the directory named after the page framework ID; if the count reaches a threshold, calculating a page framework pattern, downloading the data content of the web pages corresponding to the page frameworks under that directory according to the calculated pattern, and storing the downloaded data content under a designated directory; if the threshold is not reached, continuing to count the page frameworks under that directory.
Optionally, downloaded pages that need fast processing are stored under a designated fast-processing directory, where the downloaded pages that need fast processing are new pages appearing on the site homepage and on the pages directly below it.
Optionally, it is determined whether the number of page frameworks for the current ID has reached the threshold and, if not, the threshold for that ID is adjusted by a certain increment.
Optionally, a hash function is applied to the page framework and the last n bits of the hash value are used as the page framework ID.
Optionally, before page frameworks having the same page framework ID are stored under the directory named after that ID, the method further includes: creating priority directories for the different domain names of the same site according to their priorities, the directory named after the page framework ID being located under the corresponding priority directory of each domain name.
Optionally, web pages are crawled through whole-web search on a per-site basis, and the web pages of the different domain names of the same site are stored under the same root directory.
The web page classification storage system and method according to the present invention store web pages of the same category under the same directory, which solves the problem that whole-web search results are not stored by page category. Because the search results are stored by page category, fewer interfering factors affect page framework pattern recognition in vertical search.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a web page classification storage method according to an embodiment of the invention;
Fig. 2 shows a schematic structural diagram of a web page classification storage system according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
The flow of the web page classification storage method of this embodiment is shown in Fig. 1 and includes:
Step S110: extract the page framework of a previously obtained web page and calculate a page framework ID. The previously obtained web page may be one crawled by whole-web search. The page framework is extracted according to the HTML tags in the page source: during extraction only framework-class tags are kept (for example frame and table), together with their id, name, and class attributes; all other attributes are removed. The body text of the page can also be identified by its punctuation and removed to obtain the page framework. After extraction, a hash algorithm such as MD5 or FNV is applied to the page framework, that is, to the retained framework tags (frame, table, and so on) and their id, name, and class attributes, and the resulting hash value is the page framework ID. Because the same hash function is used throughout, identical page frameworks yield identical page framework IDs.
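As a rough illustration of step S110, the following Python sketch keeps only framework tags with their id, name, and class attributes, drops all text, and hashes the result with MD5. The tag set `FRAME_TAGS` and the names `FrameworkExtractor`, `extract_framework`, and `framework_id` are assumptions made for this example; the description names only frame and table as framework tags and does not specify how the framework is serialized before hashing.

```python
import hashlib
from html.parser import HTMLParser

# Assumed set of framework-class tags; the description gives only
# "frame, table, etc." as examples.
FRAME_TAGS = {"frame", "frameset", "iframe", "table", "tr", "td", "div"}
KEPT_ATTRS = ("id", "name", "class")


class FrameworkExtractor(HTMLParser):
    """Collects framework tags with id/name/class only; text is ignored."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in FRAME_TAGS:
            # Keep id, name, and class; drop every other attribute.
            kept = [(k, v or "") for k, v in attrs if k in KEPT_ATTRS]
            attr_text = "".join(f' {k}="{v}"' for k, v in kept)
            self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in FRAME_TAGS:
            self.parts.append(f"</{tag}>")


def extract_framework(html: str) -> str:
    """Return the page framework as a string of retained tags."""
    parser = FrameworkExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def framework_id(html: str) -> str:
    """Hash the extracted framework with MD5 to obtain the framework ID."""
    return hashlib.md5(extract_framework(html).encode("utf-8")).hexdigest()
```

Under these assumptions, two pages that differ only in text, links, or style attributes produce the same ID, while a page with a different tag structure produces a different one.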
In this embodiment, a hash function is preferably applied to the page framework and the last n bits of the hash value are used as the page framework ID. The value of n is chosen so that the last n bits of the hash values calculated for different page frameworks do not collide, for example the last 8 bits. The shorter value is also convenient to use as the name of the storage directory.
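Taking the last n bits of a hash value can be sketched as follows. The function name `framework_id_bits`, the choice of MD5, and the hexadecimal rendering of the result are assumptions for illustration; the description fixes none of them.

```python
import hashlib


def framework_id_bits(framework: str, n_bits: int = 8) -> str:
    """Return the low n_bits of the MD5 digest as a fixed-width hex string."""
    digest = int.from_bytes(hashlib.md5(framework.encode("utf-8")).digest(), "big")
    low = digest & ((1 << n_bits) - 1)   # mask off everything above n_bits
    width = (n_bits + 3) // 4            # hex digits needed for n_bits
    return f"{low:0{width}x}"
```

With n = 8 there are only 256 possible IDs, so n has to grow with the number of distinct page frameworks expected if the no-collision condition stated above is to hold.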
Step S120: store page frameworks having the same page framework ID under a directory named after that ID. After the page framework ID of a web page is calculated, the current subdirectory is searched for a directory named after that ID. If it exists, the page framework is stored under it; if not, a directory named after the ID is created and the page framework is then stored under it.
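Step S120 maps naturally onto ordinary file-system calls. In the sketch below, `store_framework`, the `.frame` file extension, and naming each file after a hash of the page URL are hypothetical choices; the description says only that the framework is stored under a directory named after the ID.

```python
import hashlib
import os


def store_framework(base_dir: str, frame_id: str, page_url: str, framework: str) -> str:
    """Store one page framework under <base_dir>/<frame_id>/ and return its path."""
    target_dir = os.path.join(base_dir, frame_id)
    # Covers both branches of the step: reuse the directory if it exists,
    # create it if it does not.
    os.makedirs(target_dir, exist_ok=True)
    # Hypothetical file name: one file per page, keyed by a hash of its URL.
    file_name = hashlib.md5(page_url.encode("utf-8")).hexdigest() + ".frame"
    path = os.path.join(target_dir, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(page_url + "\n" + framework)
    return path
```

Keeping the page URL alongside the framework is one way to let a later step find the pages that belong to each stored framework.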
By storing page frameworks by page framework ID, the method of this embodiment reduces the interfering factors in page framework pattern recognition for vertical search and lets vertical search use the results of whole-web search. This improves resource utilization, exploits the broad coverage of whole-web search, and markedly improves the coverage of vertical search.
Because page framework pattern recognition needs a certain number of page frameworks with the same ID to accumulate, the method further includes the following steps after the page framework is stored under the directory named after the page framework ID:
Count the page frameworks under the directory named after the page framework ID. If the count reaches a threshold, calculate the page framework pattern, download the data content of the web pages corresponding to the page frameworks under that directory according to the calculated pattern, and store the downloaded data content under a designated directory.
If the threshold is not reached, continue counting the page frameworks under that directory.
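The counting step can be sketched as a check on the number of files in an ID directory. The names `count_frameworks` and `ready_for_pattern` are hypothetical, and the sketch assumes one file per stored page framework; the pattern calculation and download that follow a successful check are not shown because the description does not detail them.

```python
import os

DEFAULT_THRESHOLD = 23  # preferred threshold value given in the description


def count_frameworks(id_dir: str) -> int:
    """Count the page framework files directly under one ID directory."""
    with os.scandir(id_dir) as entries:
        return sum(1 for entry in entries if entry.is_file())


def ready_for_pattern(id_dir: str, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """True once enough frameworks have accumulated for pattern recognition."""
    return count_frameworks(id_dir) >= threshold
```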
To prevent some web pages from going unprocessed for a long time, it is determined whether the number of page frameworks for the same ID has reached the threshold; if not, the threshold for that ID is adjusted by a certain increment. The threshold is preferably 23.
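A per-ID threshold table is one way to realize this adjustment. The translated text does not say in which direction the threshold moves or how large the increment is; the sketch below assumes the threshold is lowered by 1 on each failed check, since that is the direction that lets a sparsely populated ID eventually get processed. `ThresholdTable`, `step`, and `minimum` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdTable:
    default: int = 23    # preferred starting threshold from the description
    step: int = -1       # assumed sign and size of the "certain increment"
    minimum: int = 1     # assumed floor so the threshold stays positive
    per_id: dict = field(default_factory=dict)

    def threshold(self, frame_id: str) -> int:
        """Current threshold for one page framework ID."""
        return self.per_id.get(frame_id, self.default)

    def check(self, frame_id: str, count: int) -> bool:
        """Return True if count reaches the threshold; otherwise adjust it."""
        current = self.threshold(frame_id)
        if count >= current:
            return True
        self.per_id[frame_id] = max(self.minimum, current + self.step)
        return False
```

If the intended behavior is the opposite direction, only the sign of `step` changes.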
Updated pages usually appear on a site's homepage and on the pages directly below it, so data on those pages should be processed first. New pages appearing on the site homepage and on the pages directly below it are therefore stored under a designated fast-processing directory. Deeper pages tend to hold historical data and can be processed more slowly.
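The routing rule above can be sketched as a small function. The directory names `fast` and `normal`, the function name `storage_dir`, and the reading of "directly below" as link depth 1 from the homepage are assumptions for illustration.

```python
import os

FAST_DIR = "fast"      # hypothetical name of the fast-processing directory
NORMAL_DIR = "normal"  # hypothetical name of the ordinary directory


def storage_dir(base_dir: str, found_on_depth: int, is_new: bool) -> str:
    """Choose where a downloaded page goes.

    found_on_depth is the link depth of the page on which the new page
    appeared: 0 for the homepage, 1 for a page linked from the homepage.
    """
    if is_new and found_on_depth <= 1:
        return os.path.join(base_dir, FAST_DIR)
    return os.path.join(base_dir, NORMAL_DIR)
```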
Further, to accommodate the differing priorities of the different domain names of the same site, before page frameworks having the same page framework ID are stored under the directory named after that ID, the method also includes: creating priority directories for the different domain names of the same site according to their priorities, the directory named after the page framework ID being located under the corresponding priority directory of each domain name.
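One possible reading of this layout is `<site root>/<domain>/<priority directory>/<page framework ID>`. The sketch below builds such a path; the priority table, the `priority_N` naming, the default priority, and the function name `framework_dir` are all hypothetical, and the description may intend a different nesting order.

```python
import os

# Hypothetical priority table for the domain names of one site.
DOMAIN_PRIORITY = {
    "news.example.com": 0,
    "www.example.com": 1,
    "bbs.example.com": 2,
}


def framework_dir(site_root: str, domain: str, frame_id: str,
                  priorities: dict = DOMAIN_PRIORITY,
                  default_priority: int = 9) -> str:
    """Build the directory path for one page framework ID under a domain."""
    priority = priorities.get(domain, default_priority)
    return os.path.join(site_root, domain, f"priority_{priority}", frame_id)
```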
If pages from unrelated sites are pooled for pattern recognition, there are too many interfering factors and the result is hard to predict. In this embodiment, therefore, web pages obtained through whole-web search are obtained on a per-site basis, and the web pages of the different domain names of the same site are stored under the same root directory.
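Grouping the domain names of one site under a shared root requires some rule for deciding which domain names belong together. The description does not give one; the sketch below assumes, naively, that the last two labels of the host name identify the site. `site_key` and `site_root` are hypothetical names, and a real system would need a public-suffix list or an explicit site table (the two-label rule is wrong for hosts such as `example.com.cn` and for IP addresses).

```python
import os
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Naive site identifier: the last two labels of the host name."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def site_root(base_dir: str, url: str) -> str:
    """Root directory shared by all domain names of the same site."""
    return os.path.join(base_dir, site_key(url))
```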
The present invention also provides a web page classification storage system, whose structure is shown schematically in Fig. 2 and which includes a page framework ID computing module 210 and a page framework storage module 220.
The page framework ID computing module 210 is adapted to extract the page framework of a previously obtained web page and to calculate a page framework ID. The page framework ID computing module further includes a hash calculation module, adapted to apply a hash function to the page framework and to use the last n bits of the hash value as the page framework ID, for example the last 8 bits.
The page framework storage module 220 is adapted to store page frameworks having the same page framework ID under a directory named after that ID. The page framework storage module 220 is specifically adapted to check, under the current subdirectory, whether a directory named after the page framework ID exists; if it exists, the page framework is stored under that directory; if it does not exist, a directory named after the page framework ID is created and the page framework is then stored under it.
Because page framework pattern recognition needs a certain number of page frameworks with the same ID to accumulate, the web page classification storage system of this embodiment further includes:
a framework counting module, adapted to count the page frameworks under the directory named after the page framework ID; and
a web page content storage module, adapted, if the count reaches a threshold, to calculate a page framework pattern, to download the data content of the web pages corresponding to the page frameworks under that directory according to the calculated pattern, and to store the downloaded data content under a designated directory.
The web page content storage module further includes a fast page storage module, adapted to store downloaded pages that need fast processing under a designated fast-processing directory, where the downloaded pages that need fast processing are new pages appearing on the site homepage and on the pages directly below it.
The web page content storage module further includes a threshold adjustment module, adapted to determine whether the number of page frameworks for the current ID has reached the threshold and, if not, to adjust the threshold for that ID by a certain increment.
The web page classification storage system of this embodiment further includes a domain name directory creation module, adapted to create priority directories for the different domain names of the same site according to their priorities, the directory named after the page framework ID being located under the corresponding priority directory of each domain name.
The web page classification storage system of this embodiment further includes a web page acquisition module, adapted to obtain web pages through whole-web search on a per-site basis, the web pages of the different domain names of the same site being stored under the same root directory.
The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to build such a system is apparent. Moreover, the present invention is not directed at any particular programming language. It should be understood that the content of the invention described here can be implemented in various programming languages, and that the description above of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set out in the description provided here. It will be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of a device in an embodiment can be changed adaptively and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment can be combined into one module, unit, or component, and can also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the web page classification storage system according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described here. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.

Claims (16)

1. A web page classification storage system based on page frameworks, including:
a page framework ID computing module, adapted to extract the page framework of a previously obtained web page and to calculate a page framework ID; and
a page framework storage module, adapted to store page frameworks having the same page framework ID under a directory named after that page framework ID.
2. The web page classification storage system of claim 1, characterized in that the page framework storage module is specifically adapted to check, under the current subdirectory, whether a directory named after the page framework ID exists; if it exists, the page framework is stored under that directory; if it does not exist, a directory named after the page framework ID is created and the page framework is then stored under it.
3. The web page classification storage system of claim 2, characterized in that the system further includes:
a framework counting module, adapted to count the page frameworks under the directory named after the page framework ID; and
a web page content storage module, adapted, if the count reaches a threshold, to calculate a page framework pattern, to download the data content of the web pages corresponding to the page frameworks under that directory according to the calculated pattern, and to store the downloaded data content under a designated directory.
4. The web page classification storage system of claim 3, characterized in that the web page content storage module further includes a fast page storage module, adapted to store downloaded pages that need fast processing under a designated fast-processing directory, where the downloaded pages that need fast processing are new pages appearing on the site homepage and on the pages directly below it.
5. The web page classification storage system of any one of claims 3 to 4, characterized in that the web page content storage module further includes a threshold adjustment module, adapted to determine whether the number of page frameworks for the current ID has reached the threshold and, if not, to adjust the threshold for that ID by a certain increment.
6. The web page classification storage system of any one of claims 1 to 4, characterized in that the page framework ID computing module further includes a hash calculation module, adapted to apply a hash function to the page framework and to use the last n bits of the hash value as the page framework ID.
7. The web page classification storage system of any one of claims 1 to 4, characterized in that the system further includes a domain name directory creation module, adapted to create priority directories for the different domain names of the same site according to their priorities, the directory named after the page framework ID being located under the corresponding priority directory of each domain name.
8. The web page classification storage system of any one of claims 1 to 4, characterized in that the system further includes a web page acquisition module, adapted to obtain web pages through whole-web search on a per-site basis, the web pages of the different domain names of the same site being stored under the same root directory.
9. Web page classifying based on a page framework storage method, comprises the following steps:
The page framework of the webpage that extraction obtains in advance, calculates page framework ID;
The page framework of same page framework ID is stored under the catalogue with described page framework ID as title.
10. Web page classifying storage method as claimed in claim 9, it is characterised in that the described page framework by same page framework ID is stored under the catalogue with described page framework ID as title and specifically includes:
Under current subdirectory, search whether the catalogue with described page framework ID as title exists, if existing, then page framework is stored under the catalogue of corresponding ID, if not existing, then create the catalogue with described page framework ID as title, then page framework is stored under the catalogue of corresponding ID.
11. Web page classifying as claimed in claim 10 storage methods, it is characterised in that page framework further comprises the steps of: after being stored under the catalogue that described page framework ID is title
Add up the catalogue lower page framework quantity that described page framework ID is title, if reaching threshold value, calculate page framework pattern, and by the page framework pattern calculated, the webpage that the page framework under this catalogue is corresponding is carried out data content download, and the data content of download is stored under the catalogue specified;If the most described threshold value, then continue to add up this catalogue lower page framework quantity.
12. Web page classifying as described in any of claims 11 storage methods, it is characterized in that, being stored under the quickly process catalogue specified by the page of downloading of needs quickly process, what described needs quickly processed downloads the page is the new page occurred in website homepage and direct lower page thereof.
13. The web page classification storage method according to any one of claims 11 to 12, characterized in that it is judged whether the number of page frameworks for the corresponding ID has reached said threshold, and if not, the threshold corresponding to that ID is adjusted by a certain increment.
14. The web page classification storage method according to any one of claims 9 to 12, characterized in that a hash function is applied to said page framework, and the last n bits of the resulting hash value are used as the page framework ID.
15. The web page classification storage method according to any one of claims 9 to 12, characterized in that, before page frameworks having the same page framework ID are stored under the directory named after said page framework ID, the method further comprises: establishing priority directories, by priority level, for the different domain names of the same website, the directory named after said page framework ID being located under the corresponding priority directory under each domain name.
16. The web page classification storage method according to any one of claims 9 to 12, characterized in that webpages are crawled through whole-network search and crawled in units of websites, and the webpages corresponding to different domain names of the same website are stored under the same root directory.
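
The storage steps recited in the method claims above (computing a page framework ID from the last n bits of a hash, looking up or creating the directory named after that ID, and counting the frameworks under it against a threshold) can be illustrated with a minimal, non-limiting Python sketch. The claims do not name a particular hash function, a value of n, or a file layout; the use of MD5, n = 32, the hexadecimal directory name, and the function names `page_framework_id`, `store_framework` and `reached_threshold` are all assumptions made for illustration only. The page framework is taken as an already-extracted string; how it is extracted is outside this sketch.

```python
import hashlib
from pathlib import Path

N_BITS = 32  # assumed value of n; the claims leave n unspecified


def page_framework_id(framework: str, n_bits: int = N_BITS) -> str:
    """Hash the framework and keep the last n bits as the ID (hex string)."""
    # MD5 is an assumption; the claims only say "a hash function".
    digest = hashlib.md5(framework.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    low_bits = value & ((1 << n_bits) - 1)   # keep the lowest n bits
    width = (n_bits + 3) // 4                # hex digits needed for n bits
    return f"{low_bits:0{width}x}"


def store_framework(subdir: Path, framework: str, page_name: str) -> Path:
    """Store a framework under the directory named after its ID.

    If the directory already exists it is reused; otherwise it is created.
    """
    target = subdir / page_framework_id(framework)
    target.mkdir(parents=True, exist_ok=True)
    path = target / page_name
    path.write_text(framework, encoding="utf-8")
    return path


def reached_threshold(subdir: Path, framework_id: str, threshold: int) -> bool:
    """Count the frameworks stored under one ID directory against a threshold."""
    target = subdir / framework_id
    if not target.is_dir():
        return False
    count = sum(1 for p in target.iterdir() if p.is_file())
    return count >= threshold
```

In this sketch, two pages whose extracted frameworks are identical land in the same directory, so a later step (pattern calculation and content download, not shown) could be triggered once `reached_threshold` returns true for that directory.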

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210376351.3A CN102902791B (en) 2012-09-29 2012-09-29 Web page classification storage system and method

Publications (2)

Publication Number Publication Date
CN102902791A (en) 2013-01-30
CN102902791B (en) 2016-08-03

Family

ID=47575023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210376351.3A Active CN102902791B (en) 2012-09-29 2012-09-29 Web page classification storage system and method

Country Status (1)

Country Link
CN (1) CN102902791B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902784B (en) * 2012-09-29 2016-03-02 北京奇虎科技有限公司 Web page classification storage system and method
CN104809121B (en) * 2014-01-24 2019-12-27 腾讯科技(深圳)有限公司 Method and device for controlling display of browser webpage window
CN107544994B (en) * 2016-06-27 2021-01-22 北京国双科技有限公司 Associated data processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079063A (en) * 2007-06-25 2007-11-28 腾讯科技(深圳)有限公司 Method, system and apparatus for transmitting advertisement based on scene information
CN101251855A (en) * 2008-03-27 2008-08-27 腾讯科技(深圳)有限公司 Equipment, system and method for cleaning internet web page
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
CN102298614A (en) * 2011-07-29 2011-12-28 百度在线网络技术(北京)有限公司 Method for determining collection category of page collection information and device and equipment
CN102902784A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 Web page classification storage system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7870474B2 (en) * 2007-05-04 2011-01-11 Yahoo! Inc. System and method for smoothing hierarchical data using isotonic regression
CN102411587B (en) * 2010-09-21 2013-08-21 腾讯科技(深圳)有限公司 Webpage classification method and device

Also Published As

Publication number Publication date
CN102902791A (en) 2013-01-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220708

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: Room 112, Block D, 28 Xinjiekouwai Street, Xicheng District, Beijing 100088 (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi Software (Beijing) Co., Ltd.