CN102890717B - Webpage category knowledge base set up system and method - Google Patents
- Publication number: CN102890717B (application number CN201210376381.4A)
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; not a legal conclusion)
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a system for establishing a webpage category knowledge base, in the field of Internet technology. The system includes: a sample page framework ID computing module, adapted to extract the page framework of a sample web page and compute the page framework ID of the sample web page; a pattern accumulation module, adapted to compute the page framework pattern of the sample web pages when the accumulated number of page frameworks with the same ID reaches a threshold; and a knowledge base establishing module, adapted to establish the mapping between the category of the sample web pages and the page framework pattern, so as to generate the webpage category knowledge base. The invention also discloses a method for establishing a webpage category knowledge base. The system and method according to the present invention can build a knowledge base for identifying webpage categories, so that the category of a webpage can be identified quickly. This solves the problem that whole-web search cannot distinguish webpage categories, and achieves the beneficial effect of identifying webpage categories quickly.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a system and method for establishing a webpage category knowledge base.
Background art
Search technology falls into two broad classes. The first takes the entire Internet as its object. It crawls all web pages (the crawl depth within a single website can currently be limited; JavaScript (js) is generally not processed, and only some dynamic pages are handled), then processes and analyzes the pages for web page search. This is whole-web search. The second is vertical search, which crawls, analyzes and processes only a certain class of pages, for example image search, video search, blog search, forum search and news search. Most vertical search is currently based on seeds (also called list pages). The vertical search process can be divided into two parts: first, finding seeds; second, discovering specific product pages from the seed pages, i.e. pages of different categories (images, videos, news, etc.), and then processing those product pages.
Existing whole-web search does not take the needs of vertical search into account and cannot classify different products, i.e. it cannot distinguish webpage categories. At most it mines some useful auxiliary information for vertical search. Because existing vertical search and web page search analyze and process pages in different ways, the two systems are relatively independent. Whole-web search downloads, analyzes and processes pages, and vertical search separately downloads, analyzes and processes pages as well. They cannot share resources, and they cannot be organically integrated so that vertical search shares the resources of whole-web search.
Therefore, establishing a knowledge base that can automatically identify webpage categories is an urgent problem to be solved.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a system and method for establishing a webpage category knowledge base that overcome the above problems, or at least partially solve them.
According to one aspect of the present invention, a system for establishing a webpage category knowledge base is provided, including:
A sample page framework ID computing module, adapted to extract the page framework of a sample web page and compute the page framework ID of the sample web page;
A pattern accumulation module, adapted to compute the page framework pattern of the sample web pages when the accumulated number of page frameworks with the same ID reaches a threshold;
A knowledge base establishing module, adapted to establish the mapping between the category of the sample web pages and the page framework pattern, so as to generate the webpage category knowledge base.
Optionally, the knowledge base establishing module further includes:
A weight setting module, adapted to assign, according to the category of the different sample web pages, a preset weight to each web page feature in the page framework pattern of that category;
A mapping table establishing module, adapted to establish a mapping table relating the category of the sample web pages to each web page feature of that category and its weight, so as to generate the webpage category knowledge base.
Optionally, the page framework ID computing module further includes a page framework extraction module, adapted to extract the page framework of the sample web page according to the HTML tags in the source code of the sample web page.
Optionally, the page framework ID computing module further includes a page framework extraction module, adapted to identify the body text of the sample web page by punctuation and remove the body text to obtain the page framework of the sample web page.
Optionally, the pattern accumulation module further includes:
A pending list page identification module, adapted to determine whether the page has links located in a block at a fixed position on the page that have existed stably for a certain period of time, and if so, to set the sample web page as a pending list page;
A list page framework pattern determination module, adapted to schedule (re-fetch) the pending list page at fixed intervals, and if the links are continually updated to new URLs, to set the page framework pattern of the sample web page as a list page framework pattern.
According to another aspect of the present invention, a method for establishing a webpage category knowledge base is provided, including the following steps:
Extracting the page framework of a sample web page and computing the page framework ID of the sample web page;
When the accumulated number of page frameworks with the same ID reaches a threshold, computing the page framework pattern of the sample web pages;
Establishing the mapping between the category of the sample web pages and the page framework pattern, so as to generate the webpage category knowledge base.
Optionally, establishing the mapping between the category of the sample web pages and the page framework pattern to generate the webpage category knowledge base specifically includes:
Assigning, according to the category of the different sample web pages, a preset weight to each web page feature in the page framework pattern of that category;
Establishing a mapping table relating the category of the sample web pages to each web page feature of that category and its weight, so as to generate the webpage category knowledge base.
Optionally, the page framework of the sample web page is extracted according to the HTML tags in the source code of the sample web page.
Optionally, the page framework of the sample web page is extracted by identifying the body text of the sample web page by punctuation and removing that body text.
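As an illustration of punctuation-based extraction, the following minimal Python sketch treats any text node containing a sentence punctuation mark as body text and drops it, keeping the tag skeleton and short unpunctuated labels (such as menu items) as the page framework. The punctuation set, the one-mark rule, the choice to keep tag names only, and the names BodyTextStripper and strip_body_text are all assumptions for illustration; the description does not specify them.

```python
from html.parser import HTMLParser

# Assumptions: which marks count as sentence punctuation (full-width Chinese
# and ASCII), and that a single mark is enough to call a text node body text.
PUNCTUATION = set("，。！？；：、,.!?;:")
MIN_MARKS = 1


class BodyTextStripper(HTMLParser):
    """Rebuilds the tag skeleton (tag names only) and drops punctuated text."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        marks = sum(ch in PUNCTUATION for ch in text)
        if text and marks < MIN_MARKS:
            # Short unpunctuated labels such as menu items stay in the framework.
            self.out.append(text)


def strip_body_text(html):
    parser = BodyTextStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = "<div><a>Home</a><p>Markets rose today. Analysts were surprised.</p></div>"
    print(strip_body_text(page))  # <div><a>Home</a><p></p></div>
```

A real implementation would likely weigh text length as well as punctuation, since a single period in a short label (for example a domain name) is enough to drop it under this rule.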
Optionally, the list page framework pattern is computed as follows:
Determine whether the page has links located in a block at a fixed position on the page that have existed stably for a certain period of time; if so, set the sample web page as a pending list page;
Schedule (re-fetch) the pending list page at fixed intervals; if the links are continually updated to new links, set the page framework pattern of the sample web page as a list page framework pattern.
The system and method for establishing a webpage category knowledge base according to the present invention can build a knowledge base for identifying webpage categories, so that the category of a webpage can be identified quickly. This solves the problem that whole-web search cannot distinguish webpage categories, and achieves the beneficial effect of identifying webpage categories quickly.
The above is only an overview of the technical solution of the present invention. Specific embodiments of the present invention are given below. They are intended to make the technical means of the invention clearer, so that it can be implemented according to the description, and to make the above and other objects, features and advantages of the invention more apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a flowchart of the method for establishing a webpage category knowledge base according to an embodiment of the present invention;
Fig. 2 shows a detailed flowchart of step S130 in Fig. 1;
Fig. 3 shows a schematic structural diagram of the system for establishing a webpage category knowledge base according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the detailed structure of the knowledge base establishing module in Fig. 3.
Detailed description of the invention
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
As shown in Fig. 1, the method for establishing a webpage category knowledge base in this embodiment includes:
Step S110: Extract the page framework of a sample web page and compute the page framework ID of the sample web page.
A sample web page is a web page whose category is known and which has been selected in advance. The page framework of a sample web page is extracted according to the HTML tags in the web page source code. During extraction, only the frame-type HTML tags, such as frame and table, are retained, together with their id, name and class attributes; all other attributes are removed. Alternatively, the body text of the web page can be identified by punctuation and removed to obtain the page framework of the sample web page. After the page framework has been extracted, a hash value of the page framework is computed from its retained tags and attributes with a hash algorithm; this hash value is the page framework ID. For example, after extracting the page framework, a hashing method such as MD5 or FNV is applied to the frame-type tags (such as frame and table) and their id, name and class attributes, and the result is the page framework ID of the sample web page. Because the same hash function is always used, identical page frameworks yield identical page framework IDs.
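The extraction and hashing in step S110 can be sketched in Python with the standard-library html.parser and hashlib modules. This is a minimal sketch under stated assumptions. The description names only frame and table as examples of frame-type tags, so the FRAME_TAGS set below is hypothetical, as are the helper names. The id, name and class attributes are kept as described, and both an MD5 variant and a 64-bit FNV-1a variant of the page framework ID are shown.

```python
import hashlib
from html.parser import HTMLParser

# Hypothetical set of "frame-type" tags: the description names only frame and
# table as examples, so the remaining entries are assumptions.
FRAME_TAGS = {"frameset", "frame", "iframe", "table", "tr", "td", "div"}
KEPT_ATTRS = ("id", "name", "class")  # the attributes the description retains


class FrameworkExtractor(HTMLParser):
    """Keeps frame-type tags with their id/name/class attributes; text and all
    other tags and attributes are discarded (handle_data is not overridden)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in FRAME_TAGS:
            kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v is not None}
            attr_text = "".join(f' {k}="{kept[k]}"' for k in KEPT_ATTRS if k in kept)
            self.tokens.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in FRAME_TAGS:
            self.tokens.append(f"</{tag}>")


def extract_framework(html):
    parser = FrameworkExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.tokens)


def framework_id_md5(framework):
    return hashlib.md5(framework.encode("utf-8")).hexdigest()


def framework_id_fnv1a64(framework):
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for byte in framework.encode("utf-8"):
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV 64-bit prime, mod 2**64
    return h


if __name__ == "__main__":
    a = '<table id="main" style="width:100%"><tr><td class="title">Hello</td></tr></table>'
    b = '<table id="main"><tr><td class="title">Another headline</td></tr></table>'
    print(extract_framework(a))  # <table id="main"><tr><td class="title"></td></tr></table>
    print(framework_id_md5(extract_framework(a)) == framework_id_md5(extract_framework(b)))  # True
```

Because text and non-retained attributes never reach the hashed string, two pages that share a layout but differ in content produce the same ID. That is the property the accumulation in step S120 relies on.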
Step S120: When the accumulated number of page frameworks with the same ID reaches a threshold, compute the page framework pattern of the sample web pages. The title, time, body text and other parts are computed separately. The computation can use a machine learning system, for example a support vector machine (SVM). For learning, each sample web page is converted into HTML source code and its key HTML tag signatures are extracted to obtain the page framework; this was already done in step S110. The page frameworks are then fed into the SVM for learning, which in effect matches their key HTML tag signatures, and the key HTML tag signatures of page frameworks with the same ID match completely. Therefore, once the SVM has learned from the threshold number of page frameworks with the same ID, it outputs the page framework pattern corresponding to those page frameworks. Before learning, the page frameworks also undergo the following processing. The title is matched against the variable content inside the title tag or anchors. The time is computed according to its format. The body text must meet certain requirements on its variable ratio and length, so that junk content such as advertisements can be rejected.
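The body-text requirement above gives no formula. One plausible reading, offered only as a hypothetical sketch, is as follows. For one region of a page framework, look at the text it holds on every sample page sharing that framework ID. Genuine body text differs from page to page (a high variable ratio) and is reasonably long, whereas an advertisement tends to repeat or to be short. The ratio definition, both threshold values and the function name are assumptions, not the patent's method.

```python
# Hypothetical thresholds: the description requires a "certain variable ratio
# and length" for body text but gives no numbers.
MIN_VARIABLE_RATIO = 0.8  # share of distinct texts among the sample pages
MIN_AVG_LENGTH = 100      # average text length in characters


def is_body_region(texts, min_ratio=MIN_VARIABLE_RATIO, min_len=MIN_AVG_LENGTH):
    """texts holds the text found in one framework region on each sample page
    that shares the same framework ID. Boilerplate such as an advert repeats
    across pages (low variable ratio) or is short, so it is rejected."""
    if not texts:
        return False
    variable_ratio = len(set(texts)) / len(texts)
    avg_length = sum(len(t) for t in texts) / len(texts)
    return variable_ratio >= min_ratio and avg_length >= min_len


if __name__ == "__main__":
    adverts = ["Buy now and save 50%!"] * 5
    articles = [f"Article {i}: " + "lorem ipsum " * 20 for i in range(5)]
    print(is_body_region(adverts), is_body_region(articles))  # False True
```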
To prevent some sample web pages from remaining unprocessed for a long time, the system checks whether the number of page frameworks of the sample web pages corresponding to the same ID has reached the threshold. If it has not, the threshold corresponding to that ID is lowered by a certain step. The threshold is preferably 23.
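The accumulation and threshold adjustment described above can be sketched as a small Python class. The default of 23 comes from the description. The step size, the floor of one page and the idea of calling the adjustment from a periodic job are assumptions, and PatternAccumulator is a hypothetical name. The sketch only tracks which IDs are ready; the pattern computation itself (for example the SVM step) is left out.

```python
from collections import defaultdict

DEFAULT_THRESHOLD = 23  # preferred threshold value stated in the description
THRESHOLD_STEP = 1      # assumption: the step size is not given in the source
MIN_THRESHOLD = 1       # assumption: never lower a threshold below one page


class PatternAccumulator:
    """Counts sample page frameworks per framework ID and reports which IDs
    have accumulated enough samples for their pattern to be computed."""

    def __init__(self, default_threshold=DEFAULT_THRESHOLD):
        self.default_threshold = default_threshold
        self.samples = defaultdict(list)  # framework ID -> collected frameworks
        self.thresholds = {}              # framework ID -> current threshold
        self.ready = set()                # IDs ready for pattern computation

    def add(self, framework_id, framework):
        self.samples[framework_id].append(framework)
        self.thresholds.setdefault(framework_id, self.default_threshold)
        self._check(framework_id)

    def _check(self, framework_id):
        if len(self.samples[framework_id]) >= self.thresholds[framework_id]:
            self.ready.add(framework_id)

    def periodic_adjust(self):
        """Meant to be called at intervals: lower the threshold of every ID that
        has not reached it yet, so rare layouts are not left unprocessed forever."""
        for framework_id, threshold in list(self.thresholds.items()):
            if framework_id not in self.ready:
                self.thresholds[framework_id] = max(MIN_THRESHOLD, threshold - THRESHOLD_STEP)
                self._check(framework_id)


if __name__ == "__main__":
    acc = PatternAccumulator(default_threshold=3)
    acc.add("id-a", "<table></table>")
    acc.add("id-a", "<table></table>")
    acc.periodic_adjust()         # threshold for "id-a" drops from 3 to 2
    print("id-a" in acc.ready)    # True
```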
Step S130: Establish the mapping between the category of the sample web pages and their page framework pattern, so as to generate the webpage category knowledge base. As shown in Fig. 2, the specific generation steps include:
Step S210: According to the category of the different sample web pages, assign a preset weight to each web page feature in the page framework pattern of that category.
Step S220: Establish a mapping table relating the category of the sample web pages to each web page feature of that category and its weight, so as to generate the webpage category knowledge base.
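Steps S210 and S220 can be sketched as building rows of (category, feature, weight). The weight values below are the partial example values the description gives for news, bbs and list pages. The feature labels, the function names and the choice to skip features without a preset weight are hypothetical. The second helper indexes the table by feature, because one feature (title, time and body text) appears under several categories with different weights.

```python
from collections import defaultdict

# Preset weights per category (step S210). The numbers are the partial example
# values from the description; the feature names are hypothetical labels.
PRESET_WEIGHTS = {
    "news": {"url_contains_news": 50, "title_time_body": 30},
    "bbs": {"url_contains_bbs_or_forum": 50, "title_time_body": 20},
    "list": {"contains_more_keyword": 30, "navigation_bar": 50,
             "first_level_domain": 60},
}


def build_mapping_table(patterns, preset_weights=PRESET_WEIGHTS):
    """patterns maps each category to the web page features found in its page
    framework pattern. Returns (category, feature, weight) rows (step S220).
    Features without a preset weight are skipped in this sketch."""
    rows = []
    for category, features in patterns.items():
        weights = preset_weights.get(category, {})
        for feature in features:
            if feature in weights:
                rows.append((category, feature, weights[feature]))
    return rows


def index_by_feature(rows):
    """Feature -> {category: weight}; one feature may map to several categories
    with different weights, e.g. title/time/body for news and bbs."""
    index = defaultdict(dict)
    for category, feature, weight in rows:
        index[feature][category] = weight
    return dict(index)


if __name__ == "__main__":
    patterns = {"news": ["url_contains_news", "title_time_body"],
                "bbs": ["url_contains_bbs_or_forum", "title_time_body"]}
    rows = build_mapping_table(patterns)
    print(index_by_feature(rows)["title_time_body"])  # {'news': 30, 'bbs': 20}
```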
The sample categories include webpage categories such as image, video, blog, forum (bbs) and news. The page framework pattern of the sample web pages of each category has several distinct web page features, and a set of distinct web page features characterizes one page framework pattern, i.e. the web pages of one category. Web pages of two different categories may share one or more (but not all) web page features, possibly with different weights. For example, both forum (bbs) pages and news pages have the web page feature "title, time, body text". The webpage category knowledge base generated by the above steps takes the concrete form of a mapping table of the web page features and weights under the page framework pattern corresponding to each webpage category, as shown in Table 1 below:
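Table 1: Web page features and weights under the page framework pattern corresponding to each webpage category (partial)
Webpage category | Web page feature | Weight |
---|---|---|
News | URL contains a news keyword | 50 |
News | Page pattern has title, time and body text | 30 |
Forum (bbs) | URL contains bbs or forum | 50 |
Forum (bbs) | Page pattern has title, time and body text | 20 |
List page | Contains the keyword "more" | 30 |
List page | Navigation bar pattern | 50 |
List page | Web page URL is a first-level domain | 60 |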
The table above lists only part of the information. It is intended to illustrate the mapping between each webpage category and the web page features and weights under its page framework pattern. As the table shows, the page framework pattern of news web pages has two web page features: (1) the URL contains a news keyword; (2) the page pattern has a title, time and body text. Their weights are 50 and 30 respectively. Having a title, time and body text in the page pattern is also a web page feature of the page framework pattern of bbs (forum) web pages, with a weight of 20. bbs pages also have the feature that the URL contains bbs or forum, with a weight of 50. The web page features of list pages include containing the keyword "more", having a navigation bar pattern, and the web page URL being a first-level domain, with weights of 30, 50 and 60 respectively.
When the webpage category knowledge base is used to identify the category of a target page framework pattern, the target page framework pattern is scored according to the weights of the different categories in the table. For example, if the URL contains bbs or forum, 50 points are added for bbs. If the URL contains news, 50 points are added for news. If the page pattern has a title, time and body text, 30 points are added for news and 20 points for bbs. If there is information such as floor numbers or reply counts, some further points are added for bbs, and so on. If, after all features of the target page framework pattern have been matched, the score obtained with the news category weights is the highest, the page framework pattern is classified as the news category.
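The scoring just described can be sketched in Python as weighted feature matching followed by an arg-max. The weights are the news and bbs examples from Table 1. The feature tests, the page representation (a URL plus a set of detected fields) and the tie-breaking rule are hypothetical. The floor-number and reply-count features are left out because the description gives them no weight.

```python
# Hypothetical feature detectors: the description states the features in prose,
# not how they are tested, so these checks are illustrative assumptions.
def url_contains(*words):
    return lambda page: any(w in page["url"].lower() for w in words)


def has_fields(*fields):
    return lambda page: all(f in page["fields"] for f in fields)


# Partial knowledge base: category -> list of (feature test, weight),
# using the example weights from Table 1.
KNOWLEDGE_BASE = {
    "news": [
        (url_contains("news"), 50),
        (has_fields("title", "time", "body"), 30),
    ],
    "bbs": [
        (url_contains("bbs", "forum"), 50),
        (has_fields("title", "time", "body"), 20),
    ],
}


def score(page, knowledge_base=KNOWLEDGE_BASE):
    """Add each matching feature's weight to its category's score."""
    return {
        category: sum(weight for test, weight in features if test(page))
        for category, features in knowledge_base.items()
    }


def classify(page, knowledge_base=KNOWLEDGE_BASE):
    """Return the category with the highest score (first one wins on a tie)."""
    scores = score(page, knowledge_base)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    page = {"url": "http://example.com/news/2012/1.html",
            "fields": {"title", "time", "body"}}
    print(score(page))     # {'news': 80, 'bbs': 20}
    print(classify(page))  # news
```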
For list pages, the page framework pattern can be computed with the SVM learning method of step S120 above. List pages have distinctive web page features: the domain name corresponding to the web page is a first-level domain; the page has a navigation bar pattern; the page contains the keyword "more"; and so on. List pages can therefore also be identified directly in step S120 as follows:
Determine whether the domain name corresponding to the web page is a first-level domain; if so, set the web page as a list page.
If the domain name corresponding to the web page is not a first-level domain, identify list pages in the following way. Determine whether the page has links located in a block at a fixed position on the page that have existed stably for a certain period of time; if so, set the web page as a pending list page. Schedule (re-fetch) the pending list page at fixed intervals. If the links are continually updated to new URLs, set the page framework pattern of the web page as a list page framework pattern, i.e. the web page is a list page. For example, the navigation bar at the top of a web page and the part of the page frame containing the word "more" generally consist of links in fixed blocks of the page. A web page containing a navigation bar and the word "more" is therefore a list page.
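The list page rules above can be sketched in Python under several assumptions. First, "the domain name is a first-level domain" is read as "the URL is a bare site root", a hypothetical interpretation. Second, each re-fetch of the page is summarized as a snapshot mapping a fixed-position block key (for example a top navigation bar) to the link URLs found in it. How blocks are located on the page is not specified in the description and is not shown here. A block that holds links in every snapshot makes the page a pending list page. If every later fetch brings new URLs into that block, the page is treated as a list page. All function names are hypothetical.

```python
from urllib.parse import urlparse


def is_domain_root(url):
    """Assumption: a page 'whose domain is a first-level domain' is read here as
    a URL with an empty or '/' path and no query string (a site home page)."""
    parts = urlparse(url)
    return parts.path in ("", "/") and not parts.query


def stable_blocks(snapshots):
    """Fixed-position blocks that contain links in every snapshot.
    Each snapshot maps a block key (e.g. 'top-nav') to the link URLs in it."""
    if not snapshots:
        return set()
    keys = set(snapshots[0])
    for snap in snapshots[1:]:
        keys &= set(snap)
    return {k for k in keys if all(snap[k] for snap in snapshots)}


def links_keep_updating(snapshots, block):
    """True if every later fetch adds at least one URL not seen before in block."""
    if len(snapshots) < 2:
        return False
    seen = set(snapshots[0][block])
    for snap in snapshots[1:]:
        new_links = set(snap[block]) - seen
        if not new_links:
            return False
        seen |= new_links
    return True


def is_list_page(url, snapshots):
    if is_domain_root(url):
        return True
    pending = stable_blocks(snapshots)  # non-empty: a pending list page
    return any(links_keep_updating(snapshots, block) for block in pending)


if __name__ == "__main__":
    snaps = [{"latest": ["/a/1", "/a/2"]},
             {"latest": ["/a/2", "/a/3"]},
             {"latest": ["/a/3", "/a/4"]}]
    print(is_list_page("http://www.example.com/news/index.html", snaps))  # True
```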
The method for establishing a webpage category knowledge base of this embodiment builds a knowledge base that can quickly identify webpage categories. It solves the problem that whole-web search cannot distinguish webpage categories, and lays a foundation for integrating vertical search with whole-web search.
The present invention also provides a system for establishing a webpage category knowledge base. As shown in Fig. 3, the system includes a sample page framework ID computing module 310, a pattern accumulation module 320 and a knowledge base establishing module 330.
The sample page framework ID computing module 310 is adapted to extract the page framework of a sample web page and compute the page framework ID of the sample web page. The sample page framework ID computing module 310 further includes a page framework extraction module. That module is adapted to extract the page framework of the sample web page according to the HTML tags in the source code of the sample web page. It is also adapted to identify the body text of the sample web page by punctuation and remove it to obtain the page framework of the sample web page.
The pattern accumulation module 320 is adapted to compute the page framework pattern of the sample web pages when the accumulated number of page frameworks with the same ID reaches a threshold. The pattern accumulation module further includes a threshold adjustment module. It is adapted to determine whether the number of page frameworks of the sample web pages corresponding to the same ID has reached the threshold, and if not, to lower the threshold corresponding to that ID by a certain step.
The pattern accumulation module 320 further includes a domain name identification module. It is adapted to determine whether the domain name corresponding to the web page is a first-level domain, and if so, to set the web page as a list page. The pattern accumulation module 320 also includes a pending list page identification module and a list page framework pattern determination module. The pending list page identification module is adapted to determine whether the page has links located in a block at a fixed position on the page that have existed stably for a certain period of time; if so, it sets the web page as a pending list page. The list page framework pattern determination module is adapted to schedule (re-fetch) the pending list page at fixed intervals; if the links are continually updated to new URLs, it sets the page framework pattern of the web page as a list page framework pattern.
The knowledge base establishing module 330 is adapted to establish the mapping between the category of the sample web pages and the page framework pattern, so as to generate the webpage category knowledge base. As shown in Fig. 4, the knowledge base establishing module 330 further includes:
A weight setting module 410, adapted to assign, according to the category of the different sample web pages, a preset weight to each web page feature in the page framework pattern of that category;
A mapping table establishing module 420, adapted to establish a mapping table relating the category of the sample web pages to each web page feature of that category and its weight, so as to generate the webpage category knowledge base.
The algorithms and displays provided here are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings here. The structure required to construct such systems is apparent from the above description. Moreover, the present invention is not directed to any particular programming language. It should be understood that the invention described here can be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that various features of the invention are sometimes grouped together in a single embodiment, figure or description thereof in the above description of exemplary embodiments. This is done to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component. They may also be divided into a plurality of sub-modules, sub-units or sub-components. All features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that some embodiments described herein include some features included in other embodiments but not others. Nevertheless, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components of the system for establishing a webpage category knowledge base according to embodiments of the present invention. The present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the invention. Those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
Claims (10)
1. A system for establishing a webpage category knowledge base based on page frameworks, including:
A sample page framework ID computing module, adapted to extract the page framework of a sample web page from which the body text has been removed, and to compute the page framework ID of the sample web page;
A pattern accumulation module, adapted to compute the page framework pattern of the sample web pages when the accumulated number of page frameworks with the same ID reaches a threshold;
A knowledge base establishing module, adapted to establish the mapping between the category of the sample web pages and the page framework pattern, so as to generate the webpage category knowledge base.
2. The system for establishing a webpage category knowledge base as claimed in claim 1, characterized in that the knowledge base establishing module further includes:
A weight setting module, adapted to assign, according to the category of the different sample web pages, a preset weight to each web page feature in the page framework pattern of that category;
A mapping table establishing module, adapted to establish a mapping table relating the category of the sample web pages to each web page feature of that category and its weight, so as to generate the webpage category knowledge base.
3. The system for establishing a webpage category knowledge base as claimed in claim 1 or 2, characterized in that the page framework ID computing module further includes a page framework extraction module, adapted to extract the page framework of the sample web page according to the HTML tags in the source code of the sample web page.
4. The system for establishing a webpage category knowledge base as claimed in any one of claims 1 to 2, characterized in that the page framework ID computing module further includes a page framework extraction module, adapted to identify the body text of the sample web page by punctuation and to remove the body text to obtain the page framework of the sample web page.
5. The system for establishing a webpage category knowledge base as claimed in any one of claims 1 to 2, characterized in that the pattern accumulation module further includes:
A pending list page identification module, adapted to determine whether the page has links located in a block at a fixed position on the page that have existed stably for a certain period of time, and if so, to set the sample web page as a pending list page;
A list page framework pattern determination module, adapted to schedule the pending list page at fixed intervals, and if the links are continually updated to new URLs, to set the page framework pattern of the sample web page as a list page framework pattern.
6. A method for establishing a webpage category knowledge base based on page frameworks, comprising the following steps:
Extracting the page framework of a sample web page, and computing the page framework ID of the page framework from which the body text of the sample web page has been removed;
When the accumulated number of page frameworks with the same ID reaches a threshold, computing the page framework pattern of the sample web pages;
Establishing the mapping between the category of the sample web pages and the page framework pattern, so as to generate the webpage category knowledge base.
7. The method for establishing a webpage category knowledge base as claimed in claim 6, characterized in that establishing the mapping between the category of the sample web page and the page framework pattern to generate the webpage category knowledge base specifically comprises:
according to the categories of different sample web pages, assigning a preset weight to each web page feature in the page framework pattern of that category;
establishing a mapping table relating the category of the sample web page, each web page feature of that category, and its weight, to generate the webpage category knowledge base.
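The category/feature/weight mapping table of claim 7 can be sketched as a nested dictionary. This assumes that "web page features" are framework tokens. The preset weights below are hypothetical, because the claim says only that the weights are preset, not how they are chosen:

```python
from __future__ import annotations

# Hypothetical preset weights per category; unspecified features fall back
# to DEFAULT_WEIGHT.
PRESET_WEIGHTS: dict[str, dict[str, float]] = {
    "news": {"<ul>": 2.0},
    "forum": {"<table>": 3.0},
}
DEFAULT_WEIGHT = 1.0


def build_weight_table(
    patterns_by_category: dict[str, list[str]],
    preset: dict[str, dict[str, float]] | None = None,
    default: float = DEFAULT_WEIGHT,
) -> dict[str, dict[str, float]]:
    """Map category -> {web page feature -> weight} for each category's pattern."""
    if preset is None:
        preset = PRESET_WEIGHTS
    table: dict[str, dict[str, float]] = {}
    for category, features in patterns_by_category.items():
        category_weights = preset.get(category, {})
        table[category] = {f: category_weights.get(f, default) for f in features}
    return table


if __name__ == "__main__":
    table = build_weight_table({"news": ["<ul>", "<p>"], "forum": ["<table>", "<p>"]})
    print(table)
    # {'news': {'<ul>': 2.0, '<p>': 1.0}, 'forum': {'<table>': 3.0, '<p>': 1.0}}
```

Because weights are keyed per category, the same feature (here `<p>`) can carry a different weight in different categories, as the claim's "according to the classification of different sample web pages" implies.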
8. The method for establishing a webpage category knowledge base as claimed in claim 6 or 7, characterized in that the page framework of the sample web page is extracted according to the HTML tags in the sample web page's source code.
9. The method for establishing a webpage category knowledge base as claimed in any one of claims 6 to 7, characterized in that the page framework of the sample web page is extracted by identifying the body text of the sample web page by punctuation and removing the body text to obtain the page framework of the sample web page.
10. The method for establishing a webpage category knowledge base as claimed in any one of claims 6 to 7, characterized in that the list page framework pattern is calculated as follows:
determining whether the page contains a link in a block at a fixed position that has existed stably for a certain period of time and, if so, setting the sample web page as a pending list page;
scheduling the pending list page again at set intervals and, if the link is continually updated with new links, setting the page framework pattern of the sample web page as a list page framework pattern.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210376381.4A CN102890717B (en) | 2012-09-29 | 2012-09-29 | Webpage category knowledge base set up system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210376381.4A CN102890717B (en) | 2012-09-29 | 2012-09-29 | Webpage category knowledge base set up system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102890717A CN102890717A (en) | 2013-01-23 |
CN102890717B true CN102890717B (en) | 2016-09-28 |
Family ID: 47534219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210376381.4A Active CN102890717B (en) | 2012-09-29 | 2012-09-29 | Webpage category knowledge base set up system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102890717B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102902793B (en) * | 2012-09-29 | 2016-12-21 | Beijing Qihoo Technology Co., Ltd. | Webpage category knowledge base set up system and method |
CN103336786B (en) * | 2013-06-05 | 2017-05-24 | Tencent Technology (Shenzhen) Co., Ltd. | Data processing method and device |
CN111914201B (en) * | 2020-08-07 | 2023-11-07 | Tencent Technology (Shenzhen) Co., Ltd. | Processing method and device of network page |
CN114706793A (en) * | 2022-05-16 | 2022-07-05 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Webpage testing method and device, electronic equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101251855A (en) * | 2008-03-27 | 2008-08-27 | Tencent Technology (Shenzhen) Co., Ltd. | Equipment, system and method for cleaning internet web page |
CN102411587A (en) * | 2010-09-21 | 2012-04-11 | Tencent Technology (Shenzhen) Co., Ltd. | Webpage classification method and device |
CN102902793A (en) * | 2012-09-29 | 2013-01-30 | Beijing Qihoo Technology Co., Ltd. | Creation system and method of webpage classification knowledge base |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102298614B (en) * | 2011-07-29 | 2015-04-22 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method for determining collection category of page collection information and device and equipment |
- 2012-09-29: CN application CN201210376381.4A (CN102890717B), status Active
Also Published As
Publication number | Publication date |
---|---|
CN102890717A (en) | 2013-01-23 |
Similar Documents
Publication | Title |
---|---|
CN102902794B (en) | Web page classification system and method |
CN109145216A (en) | Network public-opinion monitoring method, device and storage medium |
CN103902889A (en) | Malicious message cloud detection method and server |
US10872270B2 (en) | Exploit kit detection system based on the neural network using image |
CN110991171B (en) | Sensitive word detection method and device |
CN102902790B (en) | Web page classification system and method |
US20190179886A1 (en) | Detecting compatible layouts for content-based native ads |
RU2014146751A (en) | METHOD AND DEVICE FOR PAGE DISPLAY |
CN102298614A (en) | Method for determining collection category of page collection information and device and equipment |
CN102890717B (en) | Webpage category knowledge base set up system and method |
CN110457579B (en) | Webpage denoising method and system based on cooperative work of template and classifier |
CN103309862A (en) | Webpage type recognition method and system |
CN108475275A (en) | Identify video page |
CN105183843B (en) | list page identification system and method |
CN111461767B (en) | Deep learning-based Android deceptive advertisement detection method, device and equipment |
CN106992967A (en) | Malicious websites recognition methods and system |
CN106095674B (en) | A kind of website automation test method and device |
CN102929948B (en) | list page identification system and method |
CN113918794B (en) | Enterprise network public opinion benefit analysis method, system, electronic equipment and storage medium |
CN102902793B (en) | Webpage category knowledge base set up system and method |
CN112612990A (en) | Webpage analysis method, system and computer readable storage medium |
CN112650423A (en) | Webpage display method, system and medium |
CN102902791B (en) | Web page classification storage system and method |
CN103870275B (en) | Information processing method and device |
CN113806667B (en) | Method and system for supporting webpage classification |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220711. Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015; Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park); Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Patentee before: Qizhi software (Beijing) Co., Ltd |