CN102890717B - Webpage category knowledge base set up system and method - Google Patents


Info

Publication number
CN102890717B
Authority
CN
China
Prior art keywords
page
web page
framework
knowledge base
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210376381.4A
Other languages
Chinese (zh)
Other versions
CN102890717A (en)
Inventor
卢宏林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201210376381.4A
Publication of CN102890717A
Application granted
Publication of CN102890717B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a system for building a webpage category knowledge base and relates to the field of Internet technology. The system includes: a sample page framework ID computing module, adapted to extract the page framework of a sample webpage and compute the page framework ID of the sample webpage; a pattern accumulation module, adapted to compute the page framework pattern of the sample webpage when the accumulated number of page frameworks with the same ID reaches a threshold; and a knowledge base building module, adapted to establish the mapping between the category of the sample webpage and the page framework pattern so as to generate the webpage category knowledge base. The invention also discloses a method for building a webpage category knowledge base. The system and method according to the invention can build a knowledge base that identifies webpage categories quickly, thereby solving the problem that whole-web search cannot distinguish webpage categories and achieving the beneficial effect of fast webpage category identification.

Description

System and method for building a webpage category knowledge base
Technical field
The present invention relates to the field of Internet technology, and in particular to a system and method for building a webpage category knowledge base.
Background
Search technology falls broadly into two classes. The first takes the whole Internet as its object: it crawls all webpages (at present the crawl depth within a website may be limited, JavaScript (js) is generally not processed, and only some dynamic pages are handled) and then processes and analyzes them. This is webpage search, also called whole-web search. The second is vertical search, which crawls, analyzes, and processes only a certain class of pages, for example image search, video search, blog search, forum search, and news search. Most vertical search is currently based on seeds (also called list pages). The vertical search process can be divided into two parts: the first is finding seeds; the second is discovering specific product pages from the seed pages, that is, pages of different categories (images, videos, news, and so on), and then processing those product pages.
Existing whole-web search generally does not consider the needs of vertical search: it cannot classify different products, that is, it cannot distinguish webpage categories, and at most helps vertical search mine some useful information. Existing vertical search, for its part, analyzes and processes pages differently from webpage search. The two systems are independent of each other: pages that whole-web search has already downloaded, analyzed, and processed are downloaded, analyzed, and processed again by vertical search on its own. Resources cannot be shared, and the two cannot be organically integrated so that vertical search shares the resources of whole-web search. Building a knowledge base that can identify webpage categories automatically is therefore a problem in urgent need of a solution.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a system and method for building a webpage category knowledge base that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a system for building a webpage category knowledge base is provided, including:
a sample page framework ID computing module, adapted to extract the page framework of a sample webpage and compute the page framework ID of the sample webpage;
a pattern accumulation module, adapted to compute the page framework pattern of the sample webpage when the accumulated number of page frameworks with the same ID reaches a threshold;
a knowledge base building module, adapted to establish the mapping between the category of the sample webpage and the page framework pattern so as to generate the webpage category knowledge base.
Optionally, the knowledge base building module further includes:
a weight setting module, adapted to assign, according to the category of each sample webpage, a preset weight to each webpage feature in the page framework pattern of that category;
a mapping table building module, adapted to build a mapping table relating the category of the sample webpage to each webpage feature of that category and its weight, so as to generate the webpage category knowledge base.
Optionally, the page framework ID computing module further includes a page framework extraction module, adapted to extract the page framework of the sample webpage according to the HTML tags in the source code of the sample webpage.
Optionally, the page framework ID computing module further includes a page framework extraction module, adapted to identify the body text of the sample webpage by its punctuation and remove the body text to obtain the page framework of the sample webpage.
Optionally, the pattern accumulation module further includes:
a pending list page identification module, adapted to determine whether the page contains links that are located in a fixed-position block of the page and have stably existed for a certain time, and if so, to mark the sample webpage as a pending list page;
a list page framework pattern determination module, adapted to schedule the pending list page at set intervals and, if the links are continually updated to new links, to set the page framework pattern of the sample webpage as a list page framework pattern.
According to another aspect of the present invention, a method for building a webpage category knowledge base is provided, including the following steps:
extracting the page framework of a sample webpage and computing the page framework ID of the sample webpage;
computing the page framework pattern of the sample webpage when the accumulated number of page frameworks with the same ID reaches a threshold;
establishing the mapping between the category of the sample webpage and the page framework pattern, so as to generate the webpage category knowledge base.
Optionally, establishing the mapping between the category of the sample webpage and the page framework pattern to generate the webpage category knowledge base specifically includes:
assigning, according to the category of each sample webpage, a preset weight to each webpage feature in the page framework pattern of that category;
building a mapping table relating the category of the sample webpage to each webpage feature of that category and its weight, so as to generate the webpage category knowledge base.
Optionally, the page framework of the sample webpage is extracted according to the HTML tags in the source code of the sample webpage.
Optionally, the page framework of the sample webpage is extracted by identifying the body text of the sample webpage by its punctuation and removing the body text.
Optionally, the list page framework pattern is computed as follows:
determining whether the page contains links that are located in a fixed-position block of the page and have stably existed for a certain time, and if so, marking the sample webpage as a pending list page;
scheduling the pending list page at set intervals, and if the links are continually updated to new links, setting the page framework pattern of the sample webpage as a list page framework pattern.
The system and method for building a webpage category knowledge base according to the present invention can build a knowledge base that identifies webpage categories quickly, thereby solving the problem that whole-web search cannot distinguish webpage categories and achieving the beneficial effect of fast webpage category identification.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a flowchart of the method for building a webpage category knowledge base according to an embodiment of the invention;
Fig. 2 shows a detailed flowchart of step S130 in Fig. 1;
Fig. 3 shows a schematic structural diagram of the system for building a webpage category knowledge base according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of the detailed structure of the knowledge base building module in Fig. 3.
Detailed description of the invention
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The flow of the method for building a webpage category knowledge base in this embodiment is shown in Fig. 1 and includes:
Step S110: extract the page framework of the sample webpage and compute the page framework ID of the sample webpage. A sample webpage is a webpage chosen in advance whose category is known. The page framework is extracted according to the HTML tags in the webpage source code: only framework-type tags, such as frame and table, are retained, together with their id, name, and class attributes, and all other attributes are removed. Alternatively, the body text of the webpage can be identified by its punctuation and removed to obtain the page framework. After the page framework is extracted, a hash of it is computed with a hash algorithm such as MD5 or FNV; that is, the framework-type tags (such as frame and table) and their id, name, and class attributes are fed to the hash algorithm, and the resulting value is the page framework ID of the sample webpage. Because the same hash function is used, identical page frameworks yield identical page framework IDs.
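Step S110 can be illustrated with a minimal Python sketch. It is only one possible reading of the description, not the patented implementation: the set of framework-type tags is an assumption (the text names only frame and table), and the names FRAME_TAGS, FrameExtractor, page_framework, and page_framework_id are hypothetical. MD5 is used because the description names it as one option.

```python
import hashlib
from html.parser import HTMLParser

# Assumption: the description names only "frame" and "table" as framework-type
# tags; the other entries are added here purely for illustration.
FRAME_TAGS = {"frame", "frameset", "iframe", "table", "div"}
KEPT_ATTRS = ("id", "name", "class")  # attributes the description says to retain


class FrameExtractor(HTMLParser):
    """Collects framework-type tags, keeping only id/name/class attributes."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag not in FRAME_TAGS:
            return
        attr_map = dict(attrs)
        kept = [f'{k}="{attr_map[k]}"' for k in KEPT_ATTRS if attr_map.get(k)]
        self.tokens.append("<" + " ".join([tag] + kept) + ">")

    def handle_endtag(self, tag):
        if tag in FRAME_TAGS:
            self.tokens.append(f"</{tag}>")


def page_framework(html: str) -> str:
    """Return the page framework: framework tags only, text and other tags dropped."""
    parser = FrameExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.tokens)


def page_framework_id(html: str) -> str:
    """Hash the extracted framework; identical frameworks give identical IDs."""
    return hashlib.md5(page_framework(html).encode("utf-8")).hexdigest()
```

Two pages that differ only in their text and in attributes other than id, name, and class produce the same framework string and therefore the same ID, which is the property step S120 relies on when it counts frameworks per ID.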
Step S120: when the accumulated number of page frameworks with the same ID reaches the threshold, compute the page framework pattern of the sample webpage. During computation, the title, time, body text, and other parts are computed separately. An automated machine learning method can be used, for example a support vector machine (SVM), to compute the page framework pattern. For learning, the sample webpage is converted into HTML source code and the key HTML tags are extracted to obtain the page framework; this has already been done in step S110. The page framework is fed to the SVM for learning, which amounts to matching the key HTML tags of the page frameworks. The key HTML tags of page frameworks with the same ID match completely, so once as many page frameworks with the same ID as the threshold have been learned, the SVM outputs the page framework pattern for that page framework. Before learning, the page framework also needs the following handling: the title is matched against the variable content inside title or anchor tags; the time is computed according to its time format; and the body text must meet certain variable-ratio and length requirements, which filters out junk content such as advertisements.
To prevent some sample webpages from going unprocessed for a long time, it is determined whether the number of page frameworks of sample webpages corresponding to the same ID has reached the threshold within a set time; if not, the threshold corresponding to that ID is adjusted by a certain increment. The threshold is preferably 23.
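The per-ID accumulation and threshold adjustment can be sketched as follows. This is a hedged illustration only. The translated text does not say in which direction the threshold is adjusted; the sketch assumes it is lowered, since the stated goal is to keep rarely seen frameworks from waiting indefinitely. The class name PatternAccumulator and the step and timeout parameters are hypothetical; only the default threshold of 23 comes from the description.

```python
from collections import defaultdict

DEFAULT_THRESHOLD = 23  # preferred threshold given in the description


class PatternAccumulator:
    """Groups page frameworks by ID and releases a batch once a threshold is met."""

    def __init__(self, threshold=DEFAULT_THRESHOLD, step=5, timeout=3600.0):
        self.threshold = threshold
        self.step = step        # hypothetical: amount by which a threshold is lowered
        self.timeout = timeout  # hypothetical: the "set time" in seconds
        self.frames = defaultdict(list)
        self.thresholds = {}
        self.first_seen = {}

    def add(self, framework_id, framework, now):
        """Record one framework; return the batch if the ID's threshold is reached."""
        self.first_seen.setdefault(framework_id, now)
        self.thresholds.setdefault(framework_id, self.threshold)
        self.frames[framework_id].append(framework)
        return self._maybe_release(framework_id)

    def tick(self, now):
        """Lower the threshold of IDs that have waited too long; return released batches."""
        released = {}
        for fid in list(self.frames):
            if now - self.first_seen[fid] >= self.timeout:
                # Assumption: the adjustment is a decrease, never below 1.
                self.thresholds[fid] = max(1, self.thresholds[fid] - self.step)
                self.first_seen[fid] = now
                batch = self._maybe_release(fid)
                if batch is not None:
                    released[fid] = batch
        return released

    def _maybe_release(self, fid):
        if len(self.frames[fid]) >= self.thresholds[fid]:
            self.first_seen.pop(fid)
            self.thresholds.pop(fid)
            return self.frames.pop(fid)
        return None
```

A released batch is what would be handed to the pattern computation of step S120; the learning step itself is not modeled here.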
Step S130: establish the mapping between the category of the sample webpage and its page framework pattern to generate the webpage category knowledge base. The detailed steps are shown in Fig. 2 and include:
Step S210: according to the category of each sample webpage, assign a preset weight to each webpage feature in the page framework pattern of that category.
Step S220: build a mapping table relating the category of the sample webpage to each webpage feature of that category and its weight, to generate the webpage category knowledge base.
The sample categories include webpage categories such as image, video, blog, forum (bbs), and news. The page framework pattern of the sample webpages of each category has several distinct webpage features, and a set of distinct webpage features characterizes one page framework pattern, that is, one category of webpage. Webpages of two different categories may of course share one or more (but not all) webpage features, although the weights may differ; for example, forum (bbs) and news pages both include the webpage feature "title, time, body text". Concretely, the webpage category knowledge base generated by the above steps takes the form of a mapping table from each webpage category to the webpage features and weights under its page framework pattern, as shown in Table 1 below:
Table 1: Mapping of webpage features and weights under the page framework pattern corresponding to each webpage category
The table above lists only part of the information and is intended to illustrate the mapping between webpage features and weights under the page framework pattern corresponding to each webpage category. As the table shows, the page framework pattern of a news webpage has two webpage features: (1) the URL contains the keyword news, and (2) the page pattern has a title, time, and body text. Their weights are 50 and 30 respectively. A page pattern with title, time, and body text can also be a webpage feature of the page framework pattern of a bbs (forum) webpage, where its weight is 20. A bbs page also has the feature that the URL contains bbs or forum, with a weight of 50. The webpage features of a list page include: contains the keyword "more", has a navigation bar pattern, and the webpage's URL is a top-level domain; the weights set for these are 30, 50, and 60 respectively.
When the webpage category knowledge base is used to identify the category of a target page framework pattern, the target page framework pattern is scored according to the weights of the different categories in the table. For example, if the URL contains bbs or forum, 50 points are added for bbs; if the URL contains news, 50 points are added for news. If the page pattern has a title, time, and body text, 30 points are added for news and 20 points for bbs. If there is information such as floor numbers or reply counts, some points are added for bbs, and so on. If, after all features of the target page framework pattern have been matched, the score under the news category weights is the highest, the page framework pattern is classified as news.
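The weighted scoring described above can be sketched in a few lines of Python. The weights are the ones quoted in the description; the feature names (such as url_has_news) and the functions score and classify are hypothetical labels chosen for this illustration, and detecting the features on a real page is outside the sketch.

```python
# Weights as quoted in the description; feature names are hypothetical labels.
KNOWLEDGE_BASE = {
    "news": {"url_has_news": 50, "has_title_time_body": 30},
    "bbs": {"url_has_bbs_or_forum": 50, "has_title_time_body": 20},
    "list": {"has_more_keyword": 30, "has_nav_bar": 50, "is_top_level_domain": 60},
}


def score(features):
    """Sum, per category, the weights of the features found on the target page."""
    return {
        category: sum(w for name, w in weights.items() if name in features)
        for category, weights in KNOWLEDGE_BASE.items()
    }


def classify(features):
    """Return the highest-scoring category, or None if no feature matched."""
    scores = score(features)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For a page whose URL contains news and whose pattern has a title, time, and body text, score gives 80 for news and 20 for bbs, so classify returns "news", matching the worked example in the text.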
For list pages, the page framework pattern can be computed with the SVM learning method of step S120. List pages, however, have distinctive webpage features, including: the domain name corresponding to the webpage is a top-level domain; a navigation bar pattern; and the keyword "more". List pages can therefore also be identified directly in step S120 as follows:
Determine whether the domain name corresponding to the webpage is a top-level domain; if so, mark the webpage as a list page. If not, identify list pages as follows: determine whether the page contains links that are located in a fixed-position block of the page and have stably existed for a certain time, and if so, mark the webpage as a pending list page; schedule the pending list page at set intervals, and if the links are continually updated to new links, set the page framework pattern of the webpage as a list page framework pattern, that is, the webpage is a list page. For example, the navigation bar at the top of a webpage and the part of the page framework containing the word "more" are generally links located in fixed blocks of the page, so a webpage containing a navigation bar and the word "more" is a list page.
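The link-observation part of this procedure can be sketched as below. This is an interpretation under stated assumptions, not the patented logic: each snapshot is taken to be the set of link URLs found in one fixed-position block on one scheduled visit, and the function names and the min_stable and min_updates parameters are hypothetical. The top-level domain check is left out because the description does not say how it is performed.

```python
def is_pending_list_page(snapshots, min_stable=2):
    """True if the fixed block held links on each of the first `min_stable` visits."""
    if len(snapshots) < min_stable:
        return False
    return all(len(links) > 0 for links in snapshots[:min_stable])


def is_list_page(snapshots, min_stable=2, min_updates=2):
    """True if a pending list page later keeps showing links not seen before."""
    if not is_pending_list_page(snapshots, min_stable):
        return False
    seen = set().union(*snapshots[:min_stable])
    updates = 0
    for links in snapshots[min_stable:]:
        if links - seen:  # at least one link that never appeared earlier
            updates += 1
        seen |= links
    return updates >= min_updates
```

A block whose links never change stays a pending list page, while a block that keeps gaining new links on later visits is promoted to a list page.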
The method for building a webpage category knowledge base of this embodiment builds a knowledge base that can quickly identify webpage categories, solves the problem that whole-web search cannot distinguish webpage categories, and lays a foundation for integrating vertical search with whole-web search.
The present invention also provides a system for building a webpage category knowledge base. Its structure is shown in Fig. 3 and includes: a sample page framework ID computing module 310, a pattern accumulation module 320, and a knowledge base building module 330.
The sample page framework ID computing module 310 is adapted to extract the page framework of a sample webpage and compute the page framework ID of the sample webpage. It further includes a page framework extraction module, adapted to extract the page framework of the sample webpage according to the HTML tags in the sample webpage source code, and also adapted to identify the body text of the sample webpage by its punctuation and remove the body text to obtain the page framework of the sample webpage.
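The punctuation-based extraction mode is named in the description but not specified. One possible reading, sketched below, treats any text node containing at least a few sentence punctuation marks as body text and drops it, keeping the markup and short labels. The punctuation set, the min_punct cutoff, and the names TextStripper and strip_body_text are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: ASCII and Chinese sentence punctuation both signal body text.
PUNCT = re.compile(r"[,.;:!?，。；：！？、]")


class TextStripper(HTMLParser):
    """Re-emits the markup but drops text nodes that look like body text."""

    def __init__(self, min_punct=2):
        super().__init__(convert_charrefs=True)
        self.min_punct = min_punct  # hypothetical cutoff for "looks like body text"
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Short labels with little punctuation are kept as part of the framework.
        if len(PUNCT.findall(data)) < self.min_punct:
            self.out.append(data)


def strip_body_text(html, min_punct=2):
    """Return the page with punctuation-rich text nodes removed."""
    parser = TextStripper(min_punct)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The result could then be hashed in the same way as a tag-based framework; whether the two extraction modes are meant to be combined is not stated in the source.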
The pattern accumulation module 320 is adapted to compute the page framework pattern of the sample webpage when the accumulated number of page frameworks with the same ID reaches the threshold. The pattern accumulation module further includes a threshold adjustment module, adapted to determine whether the number of page frameworks of sample webpages corresponding to the same ID has reached the threshold within a set time, and if not, to adjust the threshold corresponding to that ID by a certain increment.
The pattern accumulation module 320 further includes a domain name identification module, adapted to determine whether the domain name corresponding to the webpage is a top-level domain and, if so, to mark the webpage as a list page. The pattern accumulation module 320 further includes a pending list page identification module, adapted to determine whether the page contains links that are located in a fixed-position block of the page and have stably existed for a certain time, and if so, to mark the webpage as a pending list page; and a list page framework pattern determination module, adapted to schedule the pending list page at set intervals and, if the links are continually updated to new links, to set the page framework pattern of the webpage as a list page framework pattern.
The knowledge base building module 330 is adapted to establish the mapping between the category of the sample webpage and the page framework pattern to generate the webpage category knowledge base. Its detailed structure is shown in Fig. 4 and further includes:
a weight setting module 410, adapted to assign, according to the category of each sample webpage, a preset weight to each webpage feature in the page framework pattern of that category;
a mapping table building module 420, adapted to build a mapping table relating the category of the sample webpage to each webpage feature of that category and its weight, to generate the webpage category knowledge base.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the system for building a webpage category knowledge base according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A system for building a webpage category knowledge base based on page frameworks, including:
a sample page framework ID computing module, adapted to extract the page framework with the body text of a sample webpage removed, and to compute the page framework ID of the sample webpage;
a pattern accumulation module, adapted to compute the page framework pattern of the sample webpage when the accumulated number of page frameworks with the same ID reaches a threshold;
a knowledge base building module, adapted to establish the mapping between the category of the sample webpage and the page framework pattern so as to generate the webpage category knowledge base.
2. The system for building a webpage category knowledge base as claimed in claim 1, characterized in that the knowledge base building module further includes:
a weight setting module, adapted to assign, according to the category of each sample webpage, a preset weight to each webpage feature in the page framework pattern of that category;
a mapping table building module, adapted to build a mapping table relating the category of the sample webpage to each webpage feature of that category and its weight, so as to generate the webpage category knowledge base.
3. The system for building a webpage category knowledge base as claimed in claim 1 or 2, characterized in that the page framework ID computing module further includes a page framework extraction module, adapted to extract the page framework of the sample webpage according to the HTML tags in the sample webpage source code.
4. The system for building a webpage category knowledge base as claimed in any one of claims 1 to 2, characterized in that the page framework ID computing module further includes a page framework extraction module, adapted to identify the body text of the sample webpage by its punctuation and remove the body text to obtain the page framework of the sample webpage.
5. The system for building a webpage category knowledge base as claimed in any one of claims 1 to 2, characterized in that the pattern accumulation module further includes:
a pending list page identification module, adapted to determine whether the page contains links that are located in a fixed-position block of the page and have stably existed for a certain time, and if so, to mark the sample webpage as a pending list page;
a list page framework pattern determination module, adapted to schedule the pending list page at set intervals and, if the links are continually updated to new links, to set the page framework pattern of the sample webpage as a list page framework pattern.
6. a webpage category knowledge base method for building up based on page framework, comprises the following steps:
The page framework of sample drawn webpage, calculates page framework ID having removed sample web page text;
When the page framework quantity of accumulative identical ID reaches threshold value, calculate the page framework pattern of sample web page;
Set up classification and the mapping relations of described page framework pattern of sample web page, to generate webpage classification Knowledge base.
7. The method for establishing a webpage category knowledge base as claimed in claim 6, characterized in that establishing the mapping between the category of the sample web page and the page framework pattern to generate the webpage category knowledge base specifically comprises:
According to the categories of the different sample web pages, assigning a preset weight to each web page feature in the page framework pattern of each category;
Establishing a relation mapping table of the category of the sample web page, each web page feature of that category, and its weight, to generate the webpage category knowledge base.
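A relation mapping table of category, web page feature, and weight, as described in claim 7, might be represented as a nested dictionary. The sketch below is a guess at one possible layout; the feature names, the weights, and the `score` helper (which goes beyond what the claim states) are hypothetical and serve only to show how such a table could be read.

```python
from typing import Dict, Iterable

# category -> {web page feature -> preset weight}
KnowledgeBase = Dict[str, Dict[str, float]]


def build_knowledge_base(preset_weights: KnowledgeBase) -> KnowledgeBase:
    """Copy the preset per-category feature weights into a mapping table."""
    return {category: dict(features) for category, features in preset_weights.items()}


def score(kb: KnowledgeBase, category: str, page_features: Iterable[str]) -> float:
    """Hypothetical use of the table: sum the weights of the features a page has."""
    present = set(page_features)
    return sum(w for feature, w in kb.get(category, {}).items() if feature in present)


# Hypothetical feature names and weights, for illustration only.
kb = build_knowledge_base({
    "list": {"many_links_in_fixed_block": 0.5, "pagination": 0.25},
    "article": {"long_body_text": 0.75},
})
```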
8. The method for establishing a webpage category knowledge base as claimed in claim 6 or 7, characterized in that the page framework of the sample web page is extracted as follows: the page framework is extracted according to the HTML tags in the source code of the sample web page.
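One way to read the tag-based extraction of claim 8 is to keep only the sequence of HTML tags and discard text and attributes. The sketch below uses Python's standard `html.parser`; dropping attributes and the names `FrameworkExtractor` and `extract_framework` are assumptions, not details taken from the patent.

```python
from html.parser import HTMLParser


class FrameworkExtractor(HTMLParser):
    """Collects the tag skeleton of a page, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: attributes are not part of the framework.
        self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")


def extract_framework(html: str) -> str:
    """Return the page framework as a string of bare tags."""
    parser = FrameworkExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Under this reading, two pages generated from the same template but carrying different text yield the same framework string.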
9. The method for establishing a webpage category knowledge base according to any one of claims 6 to 7, characterized in that the page framework of the sample web page is extracted as follows: the body text of the sample web page is identified by its punctuation, and the body text is removed to obtain the page framework of the sample web page.
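The punctuation-based identification in claim 9 could be approximated by counting punctuation marks in each text block: running prose tends to contain several, while navigation labels contain few or none. The punctuation set and the `min_punct` threshold below are assumptions chosen for illustration.

```python
import re
from typing import List

# Assumed punctuation set: common Western and Chinese sentence punctuation.
PUNCT = re.compile(r"[,.;:!?，。；：！？、]")


def is_body_text(text: str, min_punct: int = 2) -> bool:
    """Treat a block as body text if it holds at least min_punct punctuation marks."""
    return len(PUNCT.findall(text)) >= min_punct


def remove_body_text(blocks: List[str], min_punct: int = 2) -> List[str]:
    """Drop blocks judged to be body text; what remains belongs to the framework."""
    return [b for b in blocks if not is_body_text(b, min_punct)]
```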
10. The method for establishing a webpage category knowledge base according to any one of claims 6 to 7, characterized in that the list page framework pattern is calculated as follows:
Determining whether there are links that are located in a block at a fixed position on the page and have stably existed for a certain time, and, if so, setting the sample web page as a candidate (not yet determined) list page;
Scheduling the candidate list page for revisiting at set intervals, and, if the links are continually updated to new links, setting the page framework pattern of the sample web page as a list page framework pattern.
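The two-stage check of claim 10 can be sketched over a series of snapshots, each holding the set of link URLs found in the fixed-position block at one visit. This is one possible interpretation only: reading "stably existed" as "the block contained links in several consecutive visits", as well as the parameters `min_stable` and `min_updates`, are assumptions.

```python
from typing import List, Set


def is_candidate_list_page(snapshots: List[Set[str]], min_stable: int) -> bool:
    """Stage 1: the block held links in at least min_stable consecutive visits."""
    run = 0
    for links in snapshots:
        run = run + 1 if links else 0
        if run >= min_stable:
            return True
    return False


def is_list_page(snapshots: List[Set[str]], min_updates: int) -> bool:
    """Stage 2: new links appeared in at least min_updates successive revisits."""
    updates = sum(1 for prev, cur in zip(snapshots, snapshots[1:]) if cur - prev)
    return updates >= min_updates
```

A page whose block keeps showing the same links passes the first stage but not the second, so under this sketch it would stay undetermined rather than be marked as a list page.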
CN201210376381.4A 2012-09-29 2012-09-29 Webpage category knowledge base set up system and method Active CN102890717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210376381.4A CN102890717B (en) 2012-09-29 2012-09-29 Webpage category knowledge base set up system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210376381.4A CN102890717B (en) 2012-09-29 2012-09-29 Webpage category knowledge base set up system and method

Publications (2)

Publication Number Publication Date
CN102890717A CN102890717A (en) 2013-01-23
CN102890717B true CN102890717B (en) 2016-09-28

Family

ID=47534219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210376381.4A Active CN102890717B (en) 2012-09-29 2012-09-29 Webpage category knowledge base set up system and method

Country Status (1)

Country Link
CN (1) CN102890717B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902793B (en) * 2012-09-29 2016-12-21 北京奇虎科技有限公司 Webpage category knowledge base set up system and method
CN103336786B (en) * 2013-06-05 2017-05-24 腾讯科技(深圳)有限公司 Data processing method and device
CN111914201B (en) * 2020-08-07 2023-11-07 腾讯科技(深圳)有限公司 Processing method and device of network page
CN114706793A (en) * 2022-05-16 2022-07-05 北京百度网讯科技有限公司 Webpage testing method and device, electronic equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251855A (en) * 2008-03-27 2008-08-27 腾讯科技(深圳)有限公司 Equipment, system and method for cleaning internet web page
CN102411587A (en) * 2010-09-21 2012-04-11 腾讯科技(深圳)有限公司 Webpage classification method and device
CN102902793A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 Creation system and method of webpage classification knowledge base

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298614B (en) * 2011-07-29 2015-04-22 百度在线网络技术(北京)有限公司 Method for determining collection category of page collection information and device and equipment


Also Published As

Publication number Publication date
CN102890717A (en) 2013-01-23

Similar Documents

Publication Publication Date Title
CN102902794B (en) Web page classification system and method
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
CN103902889A (en) Malicious message cloud detection method and server
US10872270B2 (en) Exploit kit detection system based on the neural network using image
CN110991171B (en) Sensitive word detection method and device
CN102902790B (en) Web page classification system and method
US20190179886A1 (en) Detecting compatible layouts for content-based native ads
RU2014146751A (en) METHOD AND DEVICE FOR PAGE DISPLAY
CN102298614A (en) Method for determining collection category of page collection information and device and equipment
CN102890717B (en) Webpage category knowledge base set up system and method
CN110457579B (en) Webpage denoising method and system based on cooperative work of template and classifier
CN103309862A (en) Webpage type recognition method and system
CN108475275A (en) Identify video page
CN105183843B (en) list page identification system and method
CN111461767B (en) Deep learning-based Android deceptive advertisement detection method, device and equipment
CN106992967A (en) Malicious websites recognition methods and system
CN106095674B (en) A kind of website automation test method and device
CN102929948B (en) list page identification system and method
CN113918794B (en) Enterprise network public opinion benefit analysis method, system, electronic equipment and storage medium
CN102902793B (en) Webpage category knowledge base set up system and method
CN112612990A (en) Webpage analysis method, system and computer readable storage medium
CN112650423A (en) Webpage display method, system and medium
CN102902791B (en) Web page classification storage system and method
CN103870275B (en) Information processing method and device
CN113806667B (en) Method and system for supporting webpage classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220711

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co., Ltd
