CN103714093A - Method and device for mining key pages of website - Google Patents


Info

Publication number
CN103714093A
Authority
CN
China
Prior art keywords
link
emphasis
page
website
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210380363.3A
Other languages
Chinese (zh)
Other versions
CN103714093B (en)
Inventor
张冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210380363.3A priority Critical patent/CN103714093B/en
Publication of CN103714093A publication Critical patent/CN103714093A/en
Application granted granted Critical
Publication of CN103714093B publication Critical patent/CN103714093B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for mining key pages of a website. The method includes: extracting a navigation link list from each page of the website, splitting each extracted navigation link list into link pairs, wherein each link pair is composed of two links at adjacent positions in the navigation link list; determining key link pairs from all the link pairs, and taking the pages corresponding to the key link pairs as the key pages in the website. By the means, recall rate and accuracy rate when the key pages of the website are mined can be increased.

Description

Method and device for mining key pages of a website
[Technical Field]
The present invention relates to data mining technology, and in particular to a method and a device for mining key pages of a website.
[Background]
Page authority is an important factor that search engines consider when ranking results. To compute page authority, all participating pages are treated as one set, and the authority of each page is computed iteratively from the link relationships among the pages in the set. As the Internet grows, however, the number of pages keeps increasing, and treating every page on the Internet as a participant in the authority computation places very high demands on the architecture of the computing system. The usual practice is therefore to select, for each website, only the pages that have link relationships with external websites. This prior-art approach leaves some high-quality pages inside each website without an authority value, and it also reduces the accuracy of the authority values obtained for the pages that do participate.
To mitigate these problems, one prior-art approach extracts some important pages within a website together with the pages that have link relationships with external websites, and uses both as the pages participating in the authority computation. In that approach, the importance of a page is determined by its number of in-site backlinks: for example, pages whose number of in-site backlinks exceeds a set threshold are extracted, and if the pages they point to also have more in-site backlinks than the threshold, both the extracted pages and the pages they point to are taken as key pages. This prior-art method, however, has a low recall rate and poor accuracy.
[Summary of the Invention]
The technical problem to be solved by the present invention is to provide a method and a device for mining key pages of a website, so as to improve the recall rate and accuracy rate of key-page mining.
To solve this technical problem, the present invention provides a method for mining key pages of a website, comprising: extracting a navigation link string from each page of the website; splitting each extracted navigation link string into link pairs, wherein each link pair consists of two links at adjacent positions in the navigation link string; and determining key link pairs from among the link pairs, and taking the pages corresponding to the key link pairs as the key pages of the website.
According to a preferred embodiment of the present invention, the step of determining key link pairs from among the link pairs comprises: counting the number of occurrences of each link pair, and taking the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
According to a preferred embodiment of the present invention, the preset condition comprises: the number of occurrences is greater than a set value; or the link pair's rank by number of occurrences exceeds a preset proportion of all the link pairs.
According to a preferred embodiment of the present invention, the step of determining key link pairs from among the link pairs comprises: classifying each link pair with a pre-trained classification model, and taking the link pairs classified as important as key link pairs, wherein the classification feature parameters of the classification model include the number of occurrences of the link pair.
According to a preferred embodiment of the present invention, the classification feature parameters of the classification model further include at least one of the following: the out-degree of the page corresponding to the pointing link of the link pair, the depth of the pointing link of the link pair, the depth of the pointed-to link of the link pair, the difference between the depth of the pointing link and the depth of the pointed-to link of the link pair, and the number of anchor text words corresponding to the link pair.
According to a preferred embodiment of the present invention, the method further comprises: computing the page authority of the key pages, wherein the page authority is the basis on which a search engine ranks the key pages when returning them as search results.
The present invention also provides a device for mining key pages of a website, comprising: a mining unit, configured to extract a navigation link string from each page of the website; a splitting unit, configured to split each extracted navigation link string into link pairs, wherein each link pair consists of two links at adjacent positions in the navigation link string; and a determining unit, configured to determine key link pairs from among the link pairs and to take the pages corresponding to the key link pairs as the key pages of the website.
According to a preferred embodiment of the present invention, the determining unit comprises: a statistics unit, configured to count the number of occurrences of each link pair and to take the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
According to a preferred embodiment of the present invention, the preset condition comprises: the number of occurrences is greater than a set value; or the link pair's rank by number of occurrences exceeds a preset proportion of all the link pairs.
According to a preferred embodiment of the present invention, the determining unit comprises: a classification unit, configured to classify each link pair with a pre-trained classification model and to take the link pairs classified as important as key link pairs, wherein the classification feature parameters of the classification model include the number of occurrences of the link pair.
According to a preferred embodiment of the present invention, the classification feature parameters of the classification model further include at least one of the following: the out-degree of the page corresponding to the pointing link of the link pair, the depth of the pointing link of the link pair, the depth of the pointed-to link of the link pair, the difference between the depth of the pointing link and the depth of the pointed-to link of the link pair, and the number of anchor text words corresponding to the link pair.
According to a preferred embodiment of the present invention, the device further comprises: a computing unit, configured to compute the page authority of the key pages, wherein the page authority is the basis on which a search engine ranks the key pages when returning them as search results.
As can be seen from the above technical solutions, when determining the key pages of a website, the present invention does not rely on the number of in-site backlinks of a page but instead analyzes the navigation link strings of the pages in the website. Experimental data show that, after the method of the present invention was applied to the major websites on the Internet, the number of key pages recalled increased by 20 million compared with the prior art, and most of the recalled key pages are directory pages of their websites. In other words, the pages recalled by the method reflect page importance well, and the method achieves a higher accuracy rate.
[Brief Description of the Drawings]
Fig. 1 is a schematic flowchart of the method for mining key pages of a website according to the present invention;
Fig. 2 is a schematic diagram of a navigation link string according to the present invention;
Fig. 3 is a schematic diagram of a page source file according to the present invention;
Fig. 4 is a structural block diagram of Embodiment 1 of the device for mining key pages of a website according to the present invention;
Fig. 5 is a structural block diagram of Embodiment 2 of the device for mining key pages of a website according to the present invention;
Fig. 6 is a structural block diagram of an embodiment of the model training device according to the present invention;
Fig. 7 is a structural block diagram of Embodiment 3 of the device for mining key pages of a website according to the present invention.
[Detailed Description]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.
Referring to Fig. 1, which is a schematic flowchart of the method for mining key pages of a website according to the present invention, the method comprises:
Step S101: extract a navigation link string from each page of the website.
Step S102: split each extracted navigation link string into link pairs.
Step S103: determine key link pairs from among the link pairs, and take the pages corresponding to the key link pairs as the key pages of the website.
These steps are described in detail below.
Referring to Fig. 2, which is a schematic diagram of a navigation link string according to the present invention, a navigation link string is the string of links at the top of a page that are joined by ">" symbols.
Referring to Fig. 3, which is a schematic diagram of a page source file according to the present invention, the ">" symbol makes it possible to locate several adjacent hyperlink tags in the page source file. In step S101, the link addresses in these hyperlink tags are extracted to obtain the navigation link string of a page.
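The extraction in step S101 can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `extract_navigation_links` is hypothetical, and the regular expressions assume that each hyperlink tag uses a double-quoted `href`, that the anchor text contains no nested tags, and that the separator appears in the source either as a literal `>` or as the entity `&gt;`.

```python
import re

# One hyperlink tag; group 1 captures the link address.
# Assumption: double-quoted href and plain-text anchor content.
ANCHOR = r'<a\b[^>]*\bhref="([^"]+)"[^>]*>[^<]*</a>'
# Separator between adjacent hyperlink tags: ">" or its entity "&gt;".
SEP = r'\s*(?:&gt;|>)\s*'
# A run of two or more hyperlink tags joined by the separator.
NAV_RUN = re.compile(rf'{ANCHOR}(?:{SEP}{ANCHOR})+', re.IGNORECASE)
HREF = re.compile(ANCHOR, re.IGNORECASE)


def extract_navigation_links(html):
    """Return the link addresses of the first ">"-joined run of hyperlinks."""
    match = NAV_RUN.search(html)
    if match is None:
        return []  # the page has no recognizable navigation link string
    return [m.group(1) for m in HREF.finditer(match.group(0))]


page = ('<div><a href="/">Home</a> &gt; <a href="/news">News</a> &gt; '
        '<a href="/news/tech">Tech</a> &gt; Article</div>')
nav = extract_navigation_links(page)
```

A production crawler would more likely walk a parsed DOM than rely on regular expressions; the regex form is used here only to keep the sketch self-contained.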
A link pair in step S102 consists of two links at adjacent positions in a navigation link string. For example, from a navigation link string of the form "A->B->C->D", three link pairs can be extracted: "A->B", "B->C", and "C->D".
After the navigation link strings of all pages in the website have been split, a set of link pairs is obtained. This set may contain repeated elements: for example, if the link pair "A->B" appears in the navigation link strings of multiple pages, it becomes a repeated element of the set.
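The splitting in step S102 can be sketched as follows, assuming each navigation link string is already available as an ordered list of link addresses. The function names are hypothetical, and a plain list is used for the collected pairs so that repeated elements are preserved.

```python
def split_into_link_pairs(nav_links):
    """Pair each link with the link immediately after it."""
    return list(zip(nav_links, nav_links[1:]))


def collect_link_pairs(nav_strings):
    """Split every navigation link string and keep duplicates across pages."""
    pairs = []
    for nav_links in nav_strings:
        pairs.extend(split_into_link_pairs(nav_links))
    return pairs


single = split_into_link_pairs(["A", "B", "C", "D"])
site_pairs = collect_link_pairs([["A", "B", "C"], ["A", "B", "D"]])
```

Here `("A", "B")` appears twice in `site_pairs` because it occurs in the navigation link strings of two pages.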
In one embodiment, determining key link pairs from among the link pairs in step S103 comprises:
counting the number of occurrences of each link pair, and taking the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
The number of occurrences of a link pair is the number of occurrences of that element in the above set, that is, the number of times the link pair appears across all navigation link strings. Once the occurrences of each element in the set have been counted, key link pairs can be determined from these counts, for example by taking the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
The preset condition comprises: the number of occurrences is greater than a set value; or the link pair's rank by number of occurrences exceeds a preset proportion of all the link pairs.
For example, link pairs that occur more than 100 times are taken as key link pairs. Alternatively, when there are 600 link pairs in total and the preset proportion is 70%, then since 600*70%=420, the link pairs ranked in the top 180 by number of occurrences (that is, those ranked above 70% of all the link pairs) are taken as key link pairs.
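Both forms of the condition can be sketched with a counter over the link pairs. The function names are hypothetical, the proportion is passed as an integer percentage to avoid floating-point rounding, and the sketch assumes that the total of 600 in the example refers to distinct link pairs, which the text does not state explicitly.

```python
from collections import Counter


def count_link_pairs(pairs):
    """Count how many times each link pair occurs across all pages."""
    return Counter(pairs)


def select_by_threshold(counts, set_value):
    """Keep the link pairs that occur more than set_value times."""
    return {pair for pair, n in counts.items() if n > set_value}


def select_by_rank(counts, preset_percent):
    """Keep the link pairs ranked above preset_percent of all distinct pairs."""
    ranked = [pair for pair, _ in counts.most_common()]  # most frequent first
    outranked = len(ranked) * preset_percent // 100      # e.g. 600 * 70 // 100 = 420
    return set(ranked[:len(ranked) - outranked])         # e.g. the top 180


counts = count_link_pairs([("A", "B")] * 3 + [("B", "C")] * 2 + [("C", "D")])
frequent = select_by_threshold(counts, 2)
top = select_by_rank(counts, 70)
```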
In a preferred embodiment, determining key link pairs from among the link pairs in step S103 comprises:
classifying each link pair with a pre-trained classification model, and taking the link pairs classified as important as key link pairs.
The classification feature parameters of the trained classification model include the number of occurrences of the link pair. In addition, the classification feature parameters of the trained classification model may further include at least one of the following: the out-degree of the page corresponding to the pointing link of the link pair, the depth of the pointing link of the link pair, the depth of the pointed-to link of the link pair, the difference between the depth of the pointing link and the depth of the pointed-to link of the link pair, and the number of anchor text words corresponding to the link pair.
An embodiment of pre-training the classification model is introduced first. In the present invention, the trained classification model may be obtained in this way, or a model trained by a third party may be used as the classification model, provided that the classification feature parameters of that model satisfy the above definition.
The method of training the classification model comprises:
S1: Obtain labeled link pair samples, which include positive samples and negative samples. A positive sample is a link pair labeled as important, and a negative sample is a link pair labeled as not important.
S2: Extract the classification features of each sample, and use the samples with their classification features to train the corresponding classification feature parameters of the classification model, so as to determine the classification feature parameter range of important link pairs and that of non-important link pairs.
After training, the classification feature parameters of the classification model are able to characterize important link pairs.
When the trained classification model is used to classify each link pair in step S103, the classification features of the link pair to be classified are first extracted and then compared with the classification feature parameters of the trained model. If the extracted features fall within the classification feature parameter range of important link pairs, the link pair is classified as important; otherwise, it is classified as not important.
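The train-then-compare flow above can be illustrated with a deliberately simple stand-in model. This is an assumption-laden toy, not the model used in the patent: it learns one closed interval per feature from the positive samples only and ignores the negative samples, whereas a real classifier would learn from both classes. The function names and the two-feature layout (number of occurrences, out-degree) are hypothetical.

```python
def train_ranges(samples):
    """Learn a (min, max) interval per feature from the positive samples.

    samples: list of (features, is_important), where features is a tuple
    of numbers in a fixed order. Negative samples are ignored in this toy.
    """
    positives = [features for features, is_important in samples if is_important]
    if not positives:
        raise ValueError("at least one positive sample is required")
    return [(min(column), max(column)) for column in zip(*positives)]


def classify(features, ranges):
    """Return True when every feature falls inside the learned interval."""
    return all(low <= value <= high for value, (low, high) in zip(features, ranges))


# Hypothetical features: (number of occurrences, out-degree of the pointing page).
labeled = [((120, 3), True), ((200, 5), True), ((2, 40), False)]
ranges = train_ranges(labeled)
important = classify((150, 4), ranges)
not_important = classify((5, 4), ranges)
```

The interval test mirrors the "falls within the parameter range" wording of the text; it is not a substitute for a trained classifier on real data.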
Each of the above classification feature parameters is explained in detail below.
The number of occurrences of a link pair has the same meaning as in the previous embodiment of step S103: it is the number of times the link pair appears across the navigation link strings obtained in step S101.
For a link pair of the form "A->B", link A is the pointing link of the pair and link B is the pointed-to link of the pair. In the present invention, the out-degree of the page corresponding to the pointing link of a link pair is the total number of links contained in that page that point to other pages. For the link pair "A->B" above, if the page corresponding to link A contains three links that point to other pages, the out-degree of the page corresponding to pointing link A is 3.
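The out-degree can be sketched as a count over the links found on the pointing page. The function name is hypothetical, and the sketch assumes that a link back to the page itself does not count as pointing to another page and that repeated links are each counted.

```python
def out_degree(page_url, links_on_page):
    """Count the links on a page that point to other pages."""
    return sum(1 for target in links_on_page if target != page_url)


degree_of_a = out_degree("/a", ["/b", "/c", "/d", "/a"])
```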
In the present invention, the depth of the pointing link of a link pair is the minimum number of hops from the homepage of the website to the page corresponding to the pointing link. For example, let the homepage of the website be F and the page corresponding to the pointing link be X. The link relationship "F->T1->T2->X" indicates that homepage F has a link to page T1, page T1 has a link to page T2, and page T2 has a link to page X, so the number of hops from homepage F to page X is 3. If this is the minimum number of hops from homepage F to page X, the depth of the pointing link corresponding to page X is 3.
Similarly, in the present invention, the depth of the pointed-to link of a link pair is the minimum number of hops from the homepage of the website to the page corresponding to the pointed-to link.
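Because depth is defined as a minimum number of hops, it can be computed with a breadth-first search from the homepage. The sketch below assumes the site's links are available as a dictionary mapping each page to the pages it links to; the function name and that data layout are hypothetical.

```python
from collections import deque


def page_depths(site_links, homepage):
    """Return the minimum number of hops from the homepage to each reachable page."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in site_links.get(page, ()):
            if target not in depths:  # first visit in BFS order is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


chain = {"F": ["T1"], "T1": ["T2"], "T2": ["X"]}
depth_via_chain = page_depths(chain, "F")["X"]

shortcut = {"F": ["T1", "T2"], "T1": ["T2"], "T2": ["X"]}
depth_via_shortcut = page_depths(shortcut, "F")["X"]
```

With the extra link from F to T2 in `shortcut`, the same page X becomes reachable in fewer hops, which is why the definition insists on the minimum.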
Suppose that in the link pair "A->B" the depth of pointing link A is 3 and the depth of pointed-to link B is 1; the difference between the depth of the pointing link and the depth of the pointed-to link is then 3-1=2.
The number of anchor text words corresponding to a link pair is the total number of words obtained by performing word segmentation on the anchor texts of the two links of the pair. For a link pair such as "computer maintenance->software fault", the anchor texts are "computer maintenance" and "software fault"; word segmentation yields "computer", "maintenance", "software", and "fault", so the number of anchor text words corresponding to this link pair is 4.
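The anchor text word count can be sketched as follows. The function name is hypothetical, and whitespace splitting stands in for word segmentation; this is an assumption that only suits space-delimited text, so a dedicated segmenter would have to be passed in as `segment` for Chinese anchor texts.

```python
def anchor_word_count(pointing_anchor, pointed_anchor, segment=str.split):
    """Total number of words in the anchor texts of both links of a pair."""
    return len(segment(pointing_anchor)) + len(segment(pointed_anchor))


word_count = anchor_word_count("computer maintenance", "software fault")
```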
Various prior-art machine learning methods, such as SVM (support vector machine), can be used to train the classification model and to classify link pairs with the trained model; the details are not repeated here.
After step S103 has been performed, the key pages of the website have been determined. Further, the present invention also includes computing the page authority of the key pages, wherein page authority is the basis on which a search engine ranks the key pages of the website when returning them as search results. Several well-known methods exist in the art for computing page authority; for example, U.S. Patent No. 6285999 discloses a method of computing page authority.
In addition, the key pages determined by the present invention can also be used to generate the skeleton of the website. The link relationships among the key pages reflect how the pages of a website are distributed, and a skeleton generated from these relationships makes it possible to classify the pages of the website by type. The skeleton usually forms a tree structure, and the pages on the same branch can be grouped into one class.
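One way to picture the skeleton is to treat each key link pair as a parent-to-child edge and to group pages by the top-level branch they hang from. The text does not specify how the skeleton is built, so this construction, the function names, and the choice of the homepage as the root are all assumptions made for illustration.

```python
from collections import defaultdict


def build_skeleton(key_link_pairs):
    """Map each pointing page to the set of pages it points to."""
    children = defaultdict(set)
    for parent, child in key_link_pairs:
        children[parent].add(child)
    return children


def pages_by_branch(children, root):
    """Group pages under each direct child of the root into one class."""
    branches = {}
    for top in sorted(children.get(root, ())):
        seen, stack = set(), [top]
        while stack:  # depth-first walk of this branch
            page = stack.pop()
            if page in seen:
                continue
            seen.add(page)
            stack.extend(children.get(page, ()))
        branches[top] = seen
    return branches


key_pairs = [("Home", "News"), ("News", "Tech"),
             ("Home", "Sports"), ("Sports", "Football")]
branches = pages_by_branch(build_skeleton(key_pairs), "Home")
```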
Referring to Fig. 4, which is a structural block diagram of Embodiment 1 of the device for mining key pages of a website according to the present invention, the device comprises a mining unit 201, a splitting unit 202, and a determining unit 203.
The mining unit 201 is configured to extract a navigation link string from each page of the website. The splitting unit 202 is configured to split each extracted navigation link string into link pairs. The determining unit 203 is configured to determine key link pairs from among the link pairs and to take the pages corresponding to the key link pairs as the key pages of the website.
Referring to Fig. 2, which is a schematic diagram of a navigation link string according to the present invention, a navigation link string is the string of links at the top of a page that are joined by ">" symbols.
Referring to Fig. 3, which is a schematic diagram of a page source file according to the present invention, the ">" symbol makes it possible to locate several adjacent hyperlink tags in the page source file. The mining unit 201 extracts the link addresses in these hyperlink tags to obtain the navigation link string of a page.
A link pair in the present invention consists of two links at adjacent positions in a navigation link string. For example, from a navigation link string of the form "A->B->C->D", three link pairs can be extracted: "A->B", "B->C", and "C->D".
After the splitting unit 202 has split the navigation link strings of all pages in the website, a set of link pairs is obtained. This set may contain repeated elements: for example, if the link pair "A->B" appears in the navigation link strings of multiple pages, it becomes a repeated element of the set after processing by the splitting unit 202.
In this embodiment, the determining unit 203 comprises a statistics unit 2031, which is configured to count the number of occurrences of each link pair and to take the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
The preset condition comprises: the number of occurrences is greater than a set value; or the link pair's rank by number of occurrences exceeds a preset proportion of all the link pairs.
For example, the statistics unit 2031 takes link pairs that occur more than 100 times as key link pairs. Alternatively, when there are 600 link pairs in total and the preset proportion is 70%, then since 600*70%=420, the link pairs ranked in the top 180 by number of occurrences (that is, those ranked above 70% of all the link pairs) are taken as key link pairs.
Referring to Fig. 5, which is a structural block diagram of Embodiment 2 of the device for mining key pages of a website according to the present invention, this embodiment differs from Embodiment 1 in that the determining unit 203 comprises a classification unit 2032, which is configured to classify each link pair with a pre-trained classification model 204 and to take the link pairs classified as important as key link pairs.
The classification feature parameters of the trained classification model 204 include the number of occurrences of the link pair. In addition, the classification feature parameters may further include at least one of the following: the out-degree of the page corresponding to the pointing link of the link pair, the depth of the pointing link of the link pair, the depth of the pointed-to link of the link pair, the difference between the depth of the pointing link and the depth of the pointed-to link of the link pair, and the number of anchor text words corresponding to the link pair.
The trained classification model in the present invention may be a model trained by a third party, or a model obtained in advance by a model training device. Referring to Fig. 6, which is a structural block diagram of an embodiment of the model training device according to the present invention:
As shown in Fig. 6, the model training device 301 comprises a sample acquisition unit 3011 and a training unit 3012. The sample acquisition unit 3011 is configured to obtain labeled link pair samples. The training unit 3012 is configured to extract the classification features of each sample and to use the samples with their classification features to train the corresponding classification feature parameters of the classification model, so as to determine the classification feature parameter range of important link pairs and that of non-important link pairs.
Referring to Fig. 7, which is a structural block diagram of Embodiment 3 of the device for mining key pages of a website according to the present invention, this embodiment further comprises a computing unit 205, which is configured to compute the page authority of the key pages, wherein page authority is the basis on which a search engine ranks the key pages when returning them to users. For an implementation of the computing unit 205, reference may be made to U.S. Patent No. 6285999; the details are not repeated here.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for mining key pages of a website, comprising:
extracting a navigation link string from each page of the website;
splitting each extracted navigation link string into link pairs, wherein each link pair consists of two links at adjacent positions in the navigation link string; and
determining key link pairs from among the link pairs, and taking the pages corresponding to the key link pairs as the key pages of the website.
2. The method according to claim 1, characterized in that the step of determining key link pairs from among the link pairs comprises:
counting the number of occurrences of each link pair, and taking the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
3. The method according to claim 2, characterized in that the preset condition comprises:
the number of occurrences is greater than a set value; or the link pair's rank by number of occurrences exceeds a preset proportion of all the link pairs.
4. The method according to claim 1, characterized in that the step of determining key link pairs from among the link pairs comprises:
classifying each link pair with a pre-trained classification model, and taking the link pairs classified as important as key link pairs, wherein the classification feature parameters of the classification model include the number of occurrences of the link pair.
5. The method according to claim 4, characterized in that the classification feature parameters of the classification model further include at least one of the following:
the out-degree of the page corresponding to the pointing link of the link pair, the depth of the pointing link of the link pair, the depth of the pointed-to link of the link pair, the difference between the depth of the pointing link and the depth of the pointed-to link of the link pair, and the number of anchor text words corresponding to the link pair.
6. The method according to claim 1, characterized in that the method further comprises:
computing the page authority of the key pages, wherein the page authority is the basis on which a search engine ranks the key pages when returning them as search results.
7. A device for mining key pages of a website, comprising:
a mining unit, configured to extract a navigation link string from each page of the website;
a splitting unit, configured to split each extracted navigation link string into link pairs, wherein each link pair consists of two links at adjacent positions in the navigation link string; and
a determining unit, configured to determine key link pairs from among the link pairs and to take the pages corresponding to the key link pairs as the key pages of the website.
8. The device according to claim 7, characterized in that the determining unit comprises:
a statistics unit, configured to count the number of occurrences of each link pair and to take the link pairs whose number of occurrences satisfies a preset condition as key link pairs.
9. The device according to claim 8, characterized in that the preset condition comprises:
the number of occurrences is greater than a set value; or the link pair's rank by number of occurrences exceeds a preset proportion of all the link pairs.
10. The device according to claim 7, characterized in that the determining unit comprises:
a classification unit, configured to classify each link pair with a pre-trained classification model and to take the link pairs classified as important as key link pairs, wherein the classification feature parameters of the classification model include the number of occurrences of the link pair.
11. The device according to claim 10, characterized in that the classification feature parameters of the classification model further include at least one of the following:
the out-degree of the page corresponding to the pointing link of the link pair, the depth of the pointing link of the link pair, the depth of the pointed-to link of the link pair, the difference between the depth of the pointing link and the depth of the pointed-to link of the link pair, and the number of anchor text words corresponding to the link pair.
12. The device according to claim 7, characterized in that the device further comprises:
a computing unit, configured to compute the page authority of the key pages, wherein the page authority is the basis on which a search engine ranks the key pages when returning them as search results.
CN201210380363.3A 2012-09-29 2012-09-29 Method and device for mining key pages of website Active CN103714093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210380363.3A CN103714093B (en) 2012-09-29 2012-09-29 Method and device for mining key pages of website

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210380363.3A CN103714093B (en) 2012-09-29 2012-09-29 Method and device for mining key pages of website

Publications (2)

Publication Number Publication Date
CN103714093A (en) 2014-04-09
CN103714093B (en) 2018-10-16

Family

ID=50407078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210380363.3A Active CN103714093B (en) 2012-09-29 2012-09-29 Method and device for mining key pages of website

Country Status (1)

Country Link
CN (1) CN103714093B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914550A (en) * 2014-04-11 2014-07-09 百度在线网络技术(北京)有限公司 Recommended content displaying method and recommended content displaying device
CN105243091A (en) * 2015-09-11 2016-01-13 晶赞广告(上海)有限公司 Hyperlink analysis based page semantic information extraction method and system
CN105608133A (en) * 2015-12-16 2016-05-25 北京神州绿盟信息安全科技股份有限公司 Key page determination method and device
CN106095979A (en) * 2016-06-20 2016-11-09 百度在线网络技术(北京)有限公司 URL merging treatment method and apparatus
CN106649337A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for identifying webpage column

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101601036A (en) * 2006-12-29 2009-12-09 诺基亚公司 Navigation spots on the Web page
CN102043805A (en) * 2009-10-19 2011-05-04 阿里巴巴集团控股有限公司 Method and device for generating Internet navigation page
CN102439586A (en) * 2009-04-14 2012-05-02 自由科学有限公司 Document navigation method
CN102663091A (en) * 2012-04-11 2012-09-12 广东华大集成技术有限责任公司 WEB application navigation management method and system thereof

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914550A (en) * 2014-04-11 2014-07-09 百度在线网络技术(北京)有限公司 Recommended content displaying method and recommended content displaying device
CN103914550B (en) * 2014-04-11 2017-08-18 百度在线网络技术(北京)有限公司 Method and apparatus for displaying recommended content
CN105243091A (en) * 2015-09-11 2016-01-13 晶赞广告(上海)有限公司 Hyperlink analysis based page semantic information extraction method and system
CN105243091B (en) * 2015-09-11 2018-11-13 晶赞广告(上海)有限公司 Page Semantic features extraction method and system based on Hypertext Link
CN106649337A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for identifying webpage column
CN105608133A (en) * 2015-12-16 2016-05-25 北京神州绿盟信息安全科技股份有限公司 Key page determination method and device
CN105608133B (en) * 2015-12-16 2019-07-02 北京神州绿盟信息安全科技股份有限公司 Method and device for determining a key page
CN106095979A (en) * 2016-06-20 2016-11-09 百度在线网络技术(北京)有限公司 URL merging treatment method and apparatus
CN106095979B (en) * 2016-06-20 2020-05-08 百度在线网络技术(北京)有限公司 URL merging processing method and device

Also Published As

Publication number Publication date
CN103714093B (en) 2018-10-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant