CN105608133A - Key page determination method and device - Google Patents

Key page determination method and device

Info

Publication number
CN105608133A
Authority
CN
China
Prior art keywords
link
effective
arbitrary
page
degree
Legal status
Granted
Application number
CN201510947063.2A
Other languages
Chinese (zh)
Other versions
CN105608133B (en)
Inventor
张龙
郭洋洋
李丹
Current Assignee
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Original Assignee
NSFOCUS Information Technology Co Ltd
Beijing NSFocus Information Security Technology Co Ltd
Application filed by NSFOCUS Information Technology Co Ltd and Beijing NSFocus Information Security Technology Co Ltd
Priority to CN201510947063.2A
Publication of CN105608133A
Application granted
Publication of CN105608133B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a key page determination method and device. The method comprises: for any website, obtaining all valid links in the website and the parent-child relationships among them; for each valid link, determining its criticality-related parameters according to the obtained parent-child relationships, and calculating its criticality from those parameters and their corresponding weights; and finally taking the pages corresponding to at least one valid link whose criticality is not less than a set threshold as the key pages of the website. By defining parameters associated with link importance, together with a weight for each parameter, the invention provides direct quantitative indicators for determining key pages. The criticality of every page in a website can therefore be calculated automatically and quantitatively, which makes the determination and selection of key pages more accurate and flexible, reduces the workload of configuring key pages manually, and improves the efficiency of determining key pages.

Description

Key page determination method and device
Technical field
The present invention relates to the field of the Internet, and in particular to a key page determination method and device.
Background art
The pages of a website can be divided, according to page type and level, into key pages, process pages, and result pages. Typically, the homepage and navigation pages are the key pages of a website; the pages that connect key pages, such as registration pages and registration guide pages, are process pages; and result pages are the end pages of a user action, such as "registration successful", "subscription successful", "domain name expired", and "no search results" pages.
The key pages of a website receive the highest exposure on that site and are also the entry points through which search engines bring in traffic. Services such as monitoring and scanning therefore need to track the access response speed of a website's key pages, whether they have been tampered with, and whether malicious code (web trojans) has been planted in them. Obtaining the key pages of a website is thus the foundation on which these services are built.
At present, the key pages of a website are generally determined either by manual user configuration or by the number of links in each page, for example by treating pages with more links as key pages. Manual configuration is accurate but requires human involvement, which makes it inflexible and degrades the user experience. Determining key pages by the number of in-page links is prone to misjudgment: an unimportant page that lists many download links or friend links, for example, is easily mistaken for a key page.
A new method for determining the key pages of a website is therefore needed, one that solves the low flexibility and frequent misjudgment of existing approaches.
Summary of the invention
Embodiments of the present invention provide a key page determination method and device, to solve the problems that existing ways of determining key pages have low flexibility and are prone to misjudgment.
The embodiment of the present invention provides a kind of definite method of the crucial page, and described method comprises:
For arbitrary website, obtain the father between all effective link and the described all effective links in described websiteSubrelation;
Effectively link for each getting, according to the set membership between the described all effective links that get,Determine described effective link respectively for characterizing the crucial degree relevant parameter of importance degree of described effective link, and according to determiningEach crucial degree relevant parameter and the corresponding weight of each crucial degree relevant parameter, adopt the mode of weighted sum, described in calculatingThe effectively crucial degree of link;
According to the crucial degree of the each effective link calculating, determine that corresponding crucial degree in described website is not less than to establishAt least one of determining threshold value effectively links, and using determine at least one effectively link distinguish the corresponding page as described stationThe crucial page of point.
Alternatively, for arbitrary effective link, the crucial degree relevant parameter of described effective link comprises: link density, chainConnect the degree of depth, percent continuity and average layout coefficient; , effectively link for each getting, described in gettingSet membership between all effective links, determine described effective link respectively for characterizing the importance degree of described effective linkKey degree relevant parameter, and according to each crucial degree relevant parameter and the corresponding weight of each crucial degree relevant parameter determined,Adopt the mode of weighted sum, calculate the crucial degree of described effective link, comprising:
Effectively link for each getting, according to the set membership between the described all effective links that get,Determine link density, the link degree of depth, percent continuity and the average layout coefficient of described effective link, and according to described in determiningThe effectively link density of link, the link degree of depth, percent continuity, average layout coefficient, and the link density of described effective link,The link degree of depth, percent continuity, average layout coefficient are distinguished corresponding weight, and the mode of employing weighted sum has described in calculatingThe crucial degree of effect link.
Optionally, for any valid link, the link density of the valid link is determined by the following formula:
$$\mathrm{Density}(i) = \frac{\mathrm{count}(i)}{\sum_{j=1}^{N} \mathrm{count}(j)}$$
where i is the index of the valid link among all valid links of the website; N is the total number of valid links in the website; i and N are positive integers with i ≤ N, and j runs over all N valid links; Density(i) is the link density of valid link i; and count(i) is the total number of times valid link i appears in the website.
Optionally, for any valid link, the link depth of the valid link is determined by the following formula:
$$\mathrm{Depth}(i) = \frac{1}{\mathrm{count}_i(\texttt{/}) + \mathrm{count}_i(\texttt{?})}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; count_i('/') is the number of times the separator '/' occurs in valid link i; and count_i('?') is the number of times the question mark '?' occurs in valid link i.
Optionally, for any valid link, the connectivity of the valid link is determined by the following formula:
$$\mathrm{Connectivity}(i) = \frac{\min(\mathrm{in}(i), \mathrm{out}(i))}{\max(\mathrm{in}(i), \mathrm{out}(i))}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; in(i) is the total number of times valid link i is referenced by other valid links in the website; and out(i) is the total number of times valid link i references other valid links in the website.
Likewise, optionally, for any valid link, the average layout coefficient of the valid link is determined by the following formula:
$$\mathrm{Layout}(i) = \frac{\sum_{k=1}^{M} \mathrm{layout}(i,k)}{M}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; M is the total number of valid pages in the website that reference valid link i; k is a positive integer not greater than M; and layout(i,k) is the layout coefficient of valid link i in valid page k, the k-th of the valid pages that reference it;
and where
$$\mathrm{layout}(i,k) = 1 - \frac{\mathrm{offset}_k(i)}{\sum_{j=1}^{T} \mathrm{offset}_k(j)}$$
where T is the total number of valid links in valid page k, which references valid link i; offset_k(i) is the spatial offset of valid link i in valid page k relative to the desired position of that page; and offset_k(j) is the spatial offset of the j-th valid link in valid page k relative to the same desired position.
Based on the same inventive concept, an embodiment of the present invention provides a key page determination device, comprising:
an acquiring unit, configured to obtain, for any website, all valid links in the website and the parent-child relationships among all the valid links;
a computing unit, configured to, for each obtained valid link, determine, according to the obtained parent-child relationships, the criticality-related parameters of the valid link, each of which characterizes the importance of the valid link, and calculate the criticality of the valid link as a weighted sum of the determined parameters and their corresponding weights; and
a determining unit, configured to determine, according to the calculated criticality of each valid link, at least one valid link in the website whose criticality is not less than a set threshold, and take the pages respectively corresponding to the determined valid links as the key pages of the website.
Optionally, for any valid link, the criticality-related parameters of the valid link include link density, link depth, connectivity, and average layout coefficient; in this case, the computing unit is specifically configured to:
for each obtained valid link, determine the link density, link depth, connectivity, and average layout coefficient of the valid link according to the obtained parent-child relationships, and calculate the criticality of the valid link as the weighted sum of these four parameters using their respective weights.
Optionally, the computing unit is specifically configured to determine, for any valid link, the link density of the valid link by the following formula:
$$\mathrm{Density}(i) = \frac{\mathrm{count}(i)}{\sum_{j=1}^{N} \mathrm{count}(j)}$$
where i is the index of the valid link among all valid links of the website; N is the total number of valid links in the website; i and N are positive integers with i ≤ N, and j runs over all N valid links; Density(i) is the link density of valid link i; and count(i) is the total number of times valid link i appears in the website.
Optionally, the computing unit is specifically configured to determine, for any valid link, the link depth of the valid link by the following formula:
$$\mathrm{Depth}(i) = \frac{1}{\mathrm{count}_i(\texttt{/}) + \mathrm{count}_i(\texttt{?})}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; count_i('/') is the number of times the separator '/' occurs in valid link i; and count_i('?') is the number of times the question mark '?' occurs in valid link i.
Optionally, the computing unit is specifically configured to determine, for any valid link, the connectivity of the valid link by the following formula:
$$\mathrm{Connectivity}(i) = \frac{\min(\mathrm{in}(i), \mathrm{out}(i))}{\max(\mathrm{in}(i), \mathrm{out}(i))}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; in(i) is the total number of times valid link i is referenced by other valid links in the website; and out(i) is the total number of times valid link i references other valid links in the website.
Likewise, optionally, the computing unit is specifically configured to determine, for any valid link, the average layout coefficient of the valid link by the following formula:
$$\mathrm{Layout}(i) = \frac{\sum_{k=1}^{M} \mathrm{layout}(i,k)}{M}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; M is the total number of valid pages in the website that reference valid link i; k is a positive integer not greater than M; and layout(i,k) is the layout coefficient of valid link i in valid page k, the k-th of the valid pages that reference it;
and where
$$\mathrm{layout}(i,k) = 1 - \frac{\mathrm{offset}_k(i)}{\sum_{j=1}^{T} \mathrm{offset}_k(j)}$$
where T is the total number of valid links in valid page k, which references valid link i; offset_k(i) is the spatial offset of valid link i in valid page k relative to the desired position of that page; and offset_k(j) is the spatial offset of the j-th valid link in valid page k relative to the same desired position.
Beneficial effects of the present invention:
Embodiments of the present invention provide a key page determination method and device. For any website, all valid links in the website and the parent-child relationships among them are obtained. For each obtained valid link, its criticality-related parameters, each characterizing the importance of the link, are determined according to those parent-child relationships, and its criticality is calculated as a weighted sum of the parameters and their corresponding weights. Finally, according to the calculated criticalities, at least one valid link whose criticality is not less than a set threshold is determined, and the corresponding pages are taken as the key pages of the website. In other words, by defining parameters related to the importance of page links, together with a weight for each parameter, the scheme provides direct quantitative indicators for determining key pages. The criticality of every page in a website can thus be calculated automatically and quantitatively, which makes the determination and selection of key pages more accurate and flexible, reduces the workload of manually configuring key pages, and improves the efficiency of determining key pages.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a schematic flowchart of the key page determination method according to Embodiment 1 of the present invention;
Figure 2 is a schematic structural diagram of the key page determination device according to Embodiment 2 of the present invention.
Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1:
Embodiment 1 of the present invention provides a key page determination method. Specifically, as shown in Figure 1, which is a schematic flowchart of the key page determination method according to Embodiment 1, the method may include the following steps:
S101: For any website, obtain all valid links in the website and the parent-child relationships among all the valid links.
S102: For each obtained valid link, determine, according to the obtained parent-child relationships, the criticality-related parameters of the valid link, each of which characterizes the importance of the valid link, and calculate the criticality of the valid link as a weighted sum of the determined parameters and their corresponding weights.
S103: According to the calculated criticality of each valid link, determine at least one valid link in the website whose criticality is not less than a set threshold, and take the pages respectively corresponding to the determined valid links as the key pages of the website.
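Steps S102 and S103 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: the parameter names, the example weights, and the threshold are hypothetical, because the patent does not fix concrete weight or threshold values, and the per-link parameter values are taken as already computed.

```python
from typing import Dict, List


def criticality(params: Dict[str, float], weights: Dict[str, float]) -> float:
    """Step S102: weighted sum of a link's criticality-related parameters."""
    return sum(weights[name] * value for name, value in params.items())


def select_key_pages(
    link_params: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Step S103: keep links whose criticality is not less than the threshold,
    ordered from most to least critical."""
    scores = {url: criticality(p, weights) for url, p in link_params.items()}
    key_links = [url for url, score in scores.items() if score >= threshold]
    return sorted(key_links, key=lambda url: scores[url], reverse=True)


# Hypothetical weights and parameter values, for illustration only.
weights = {"density": 0.3, "depth": 0.2, "connectivity": 0.3, "layout": 0.2}
link_params = {
    "/": {"density": 0.4, "depth": 1.0, "connectivity": 0.8, "layout": 0.9},
    "/news/list?page=2": {"density": 0.1, "depth": 0.33, "connectivity": 0.2, "layout": 0.3},
}
print(select_key_pages(link_params, weights, threshold=0.5))
```

Sorting the selected links by score is an extra convenience of this sketch; the method itself only requires the set of links at or above the threshold.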
In other words, by defining parameters related to the importance of page links (or, equivalently, of pages) together with a weight for each parameter, the scheme of this embodiment provides direct quantitative indicators for determining key pages. It can therefore calculate the criticality of each page in a website automatically and quantitatively, which makes the determination and selection of key pages more accurate and flexible, reduces the workload of manually configuring key pages, and improves the efficiency of determining them.
Each step of this embodiment is described in detail below.
Optionally, in step S101, to improve the efficiency and accuracy of obtaining valid links, a crawler can be used to crawl any given website and obtain all valid links in the website and the parent-child relationships among them. Specifically, a suitable initial link, such as an initial Uniform Resource Locator (URL), can be set for the website and used as the starting point of the crawl. During crawling, the valid links found can also be numbered sequentially in an auto-increment manner, which makes it easy to record the parent-child relationships among them.
It should be noted that for a website that does not require login, the suitable initial URL is usually the URL of the website's homepage; for a website that requires login, it is usually the URL of the first page the user is redirected to after logging in to the website.
It should also be noted that the parent-child relationship between valid links is the relationship of referencing and being referenced. For example, suppose page 1, corresponding to URL1, lists three valid links: URL2, URL3, and URL4. Then URL1 references URL2, URL3, and URL4, and URL1 is the parent link of each of them. Suppose also that page 2, corresponding to URL2, lists URL1 and URL4. Then URL2 references URL1 and URL4 and is the parent link of both. Because URL1 and URL2 reference each other, each is both a parent and a child of the other.
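The crawl described above (auto-incremented numbering of valid links plus recording of parent-child relationships) can be sketched over an in-memory model of a website. The `site` dictionary, which maps each page URL to the links listed on that page, and the rule that a link is valid only if it has a page of its own in `site`, are assumptions made for illustration; a real crawler would fetch pages over HTTP, start from the initial URL chosen as described above, and apply its own validity checks.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def crawl(site: Dict[str, List[str]], start_url: str) -> Tuple[Dict[str, int], Set[Tuple[int, int]]]:
    """Breadth-first crawl of a hypothetical in-memory site model.

    Returns:
      ids: valid link URL mapped to its number, assigned in auto-increment
           order of discovery (the start URL gets 1)
      edges: set of (parent_number, child_number) pairs, meaning the parent
             page references the child link
    """
    ids: Dict[str, int] = {start_url: 1}
    edges: Set[Tuple[int, int]] = set()
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in site:
                continue  # assumption: a link with no page of its own is not valid
            if link not in ids:
                ids[link] = len(ids) + 1  # auto-increment numbering
                queue.append(link)
            edges.add((ids[page], ids[link]))  # record the parent-child relationship
    return ids, edges


# The URL1..URL4 example from the text; URL5 stands in for a broken link.
site = {
    "URL1": ["URL2", "URL3", "URL4"],
    "URL2": ["URL1", "URL4"],
    "URL3": ["URL5"],
    "URL4": [],
}
ids, edges = crawl(site, "URL1")
print(ids)
print(sorted(edges))
```

Applied to the URL1 to URL4 example, the sketch numbers the links 1 to 4 and records both (1, 2) and (2, 1), reflecting that URL1 and URL2 are each other's parent and child.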
Further, after all valid links in the website are obtained, the criticality of each valid link can be determined in the manner provided by this embodiment, that is, by performing the operation described in step S102.
Optionally, for any valid link, the criticality-related parameters of the valid link include link density, link depth, connectivity, average layout coefficient, and so on. Correspondingly, step S102 (determining the criticality-related parameters of each obtained valid link from the parent-child relationships and calculating its criticality as a weighted sum of those parameters and their corresponding weights) comprises:
for each obtained valid link, determining the link density, link depth, connectivity, and average layout coefficient of the valid link according to the obtained parent-child relationships, and calculating the criticality of the valid link as the weighted sum of these four parameters using their respective weights.
For any valid link, its link density can be determined by the following formula:
$$\mathrm{Density}(i) = \frac{\mathrm{count}(i)}{\sum_{j=1}^{N} \mathrm{count}(j)}$$
where i is the index of the valid link among all valid links of the website; N is the total number of valid links in the website; i and N are positive integers with i ≤ N, and j runs over all N valid links; Density(i) is the link density of valid link i; and count(i) is the total number of times valid link i appears in the website.
That is, for any valid link in a website, its link density is defined as the ratio of the number of times the link appears during crawling to the total number of times all links of the website appear during crawling, so its value lies between 0 and 1. It can generally be assumed that the more often a page appears within a website, the more important it is. A larger link density therefore indicates a higher criticality and greater importance of the link.
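A minimal sketch of the link density formula, assuming the crawler has recorded every occurrence of every valid link as a flat list of URLs (a hypothetical input format):

```python
from collections import Counter
from typing import Dict, Iterable


def link_density(occurrences: Iterable[str]) -> Dict[str, float]:
    """Density(i) = count(i) / sum over all links j of count(j)."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    return {url: c / total for url, c in counts.items()}


# Hypothetical crawl log: "/" was encountered three times, the other links once each.
density = link_density(["/", "/about", "/", "/news", "/"])
print(density)
```

Because every value is a share of the same total, the densities of all links in a website sum to 1, which is why each individual value lies between 0 and 1.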
Optionally, for any valid link, its link depth can be determined by the following formula:
$$\mathrm{Depth}(i) = \frac{1}{\mathrm{count}_i(\texttt{/}) + \mathrm{count}_i(\texttt{?})}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; count_i('/') is the number of times the separator '/' occurs in valid link i; and count_i('?') is the number of times the question mark '?' occurs in valid link i.
It should be noted that a valid URL generally contains at most one '?', so count_i('?') is 0 or 1 for any valid link and can also be read as an indicator of whether the link carries parameters: a value of 1 means the link contains the parameter marker '?', and a value of 0 means it has no parameters. In addition, every valid URL normally contains at least one '/', with no upper limit on their number, so the number of '/' in a valid URL is a positive integer. The link depth of any valid link therefore also lies between 0 and 1, and a shallower URL (fewer separators) yields a larger value. A larger link depth value indicates a more important link with higher criticality.
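A minimal sketch of the link depth formula follows. Because the text does not say whether the "//" after the URL scheme is counted, this sketch assumes only the path and query are considered. Following the observation above, it also treats count_i('?') as 1 when the URL has a query string and 0 otherwise.

```python
from urllib.parse import urlsplit


def link_depth(url: str) -> float:
    """Depth(i) = 1 / (count_i('/') + count_i('?')), under the stated assumptions."""
    parts = urlsplit(url)
    path = parts.path or "/"  # treat "https://example.com" like "https://example.com/"
    slashes = path.count("/")  # the scheme's "//" is not part of the path
    question_marks = 1 if parts.query else 0  # at most one '?' per valid URL
    # Guard for relative links such as "index.html" that contain no separator at all.
    return 1.0 / max(slashes + question_marks, 1)


for u in ["https://example.com/", "https://example.com/news/list?page=2"]:
    print(u, link_depth(u))
```

The homepage gets the maximum depth value of 1, while a two-level path with a query string gets 1/3, matching the reading that shallower URLs are more critical.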
Optionally, for any valid link, its connectivity can be determined by the following formula:
$$\mathrm{Connectivity}(i) = \frac{\min(\mathrm{in}(i), \mathrm{out}(i))}{\max(\mathrm{in}(i), \mathrm{out}(i))}$$
where i is the index of the valid link among all valid links of the website, a positive integer not greater than the total number of valid links in the website; in(i) is the total number of times valid link i is referenced by other valid links in the website; and out(i) is the total number of times valid link i references other valid links in the website.
That is, for any valid link, its connectivity is defined as the ratio of the smaller to the larger of its fan-in (the total number of times it is referenced by other valid links in the website) and its fan-out (the total number of times it references other valid links in the website), so its value lies between 0 and 1. It is generally believed that the closer a page's fan-in/fan-out (or fan-out/fan-in) ratio is to 1, the more traffic it carries and the more important its position within the website. A larger connectivity value therefore indicates a more important link with higher criticality.
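The connectivity formula can be sketched from a list of (parent, child) reference pairs such as those recorded during the crawl; this input format is an assumption. Every link that appears in at least one pair has a nonzero fan-in or fan-out, so the denominator is never zero.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def connectivity(references: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Connectivity(i) = min(in(i), out(i)) / max(in(i), out(i))."""
    fan_in: Counter = Counter()
    fan_out: Counter = Counter()
    for parent, child in references:
        fan_out[parent] += 1  # parent references child
        fan_in[child] += 1    # child is referenced by parent
    result: Dict[str, float] = {}
    for url in set(fan_in) | set(fan_out):
        smaller = min(fan_in[url], fan_out[url])
        larger = max(fan_in[url], fan_out[url])  # >= 1 because url occurs in some pair
        result[url] = smaller / larger
    return result


# References from the URL1..URL4 example in step S101.
refs = [("URL1", "URL2"), ("URL1", "URL3"), ("URL1", "URL4"),
        ("URL2", "URL1"), ("URL2", "URL4")]
print(connectivity(refs))
```

In this example URL2 (fan-in 1, fan-out 2) scores 0.5 and URL1 (fan-in 1, fan-out 3) scores 1/3, while URL3 and URL4, which reference nothing, score 0.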
Likewise, optionally, for any effective link, the average layout coefficient of the link may be determined by the following formula:
$$\mathrm{Layout}(i) = \frac{\sum_{k=1}^{M} \mathrm{layout}(i,k)}{M}$$
where i is the number of the effective link among all effective links of the website, and M is the total number of effective pages in the website that reference the link; i is a positive integer not greater than the total number of effective links in the website; k is a positive integer not greater than M; and layout(i, k) is the layout coefficient of effective link i in the effective page numbered k among all effective pages that reference link i;
where $$\mathrm{layout}(i,k) = 1 - \frac{\mathrm{offset}_k(i)}{\sum_{j=1}^{T} \mathrm{offset}_k(j)}$$
where T is the total number of effective links in the effective page numbered k that references effective link i; offset_k(i) is the spatial offset of effective link i from the desired position of page k; and offset_k(j) is the spatial offset of the j-th effective link in page k from that same desired position.
It should be noted that the desired position of effective page k is the position of highest importance in that page. It can be set flexibly according to the design style of the website's pages. Preferably it is the top of the page or the centre of the page, but any other position on the page may also be used. The embodiments of the present invention impose no limitation in this respect.
In other words, the layout coefficient of an effective link in a given page is 1 minus a ratio: the link's spatial offset from the page's desired position, divided by the sum of the spatial offsets of all effective links in that page from the same position. Its value therefore lies in the range 0 to 1. The average layout coefficient of an effective link is the mean of its layout coefficients over all effective pages that reference it, so it also lies in the range 0 to 1. The closer a link is to the desired position in a page, the smaller the ratio of its offset to the sum of all offsets in that page. Its layout coefficient in that page is then larger, which in turn raises its average layout coefficient. Because a link closer to the desired position is more prominent in that page, a larger average layout coefficient indicates a more important position of the link within the website; this is not described further here.
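The two layout formulas can be sketched as follows. Each referencing page is represented as a dictionary that maps every effective link in the page to its spatial offset from the page's desired position. How offsets are measured is left open by the text. The function names, the sample data, and the fallback used when all offsets in a page are zero are all hypothetical.

```python
from typing import Dict, List


def layout_in_page(offsets: Dict[str, float], link: str) -> float:
    """layout(i, k) = 1 - offset_k(i) / sum_j offset_k(j) for one page k.

    `offsets` maps every effective link in page k to its spatial offset
    from the page's desired position (the measurement itself is assumed).
    """
    total = sum(offsets.values())
    if total == 0:
        # Every link sits exactly at the desired position; 1.0 is assumed here.
        return 1.0
    return 1.0 - offsets[link] / total


def average_layout(pages: List[Dict[str, float]], link: str) -> float:
    """Layout(i): mean of layout(i, k) over the M pages that reference `link`."""
    referencing = [page for page in pages if link in page]
    if not referencing:
        raise ValueError(f"no page references {link!r}")
    return sum(layout_in_page(p, link) for p in referencing) / len(referencing)


pages = [
    {"/a": 10.0, "/b": 30.0, "/c": 60.0},  # page 1: "/a" is near the desired position
    {"/a": 50.0, "/d": 50.0},              # page 2: "/a" is as far out as "/d"
    {"/b": 20.0, "/c": 20.0},              # page 3 does not reference "/a"
]
print(average_layout(pages, "/a"))  # (0.9 + 0.5) / 2
```

Pages that do not reference the link are excluded, so M counts only the referencing pages, as the definition requires.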
As can be seen from the above, for any effective link, each key-degree-related parameter (link density, link depth, connectivity, average layout coefficient, and so on) lies in the range 0 to 1, and a larger value indicates greater importance. Therefore, for any effective link i, its key degree Key(i) can be calculated by the following formula:
$$\mathrm{Key}(i) = W_1 \cdot \mathrm{Density}(i) + W_2 \cdot \mathrm{Depth}(i) + W_3 \cdot \mathrm{Connectivity}(i) + W_4 \cdot \mathrm{Layout}(i)$$
where Density(i), Depth(i), Connectivity(i) and Layout(i) are the link density, link depth, connectivity and average layout coefficient of effective link i, respectively. W1, W2, W3 and W4 are the weights corresponding to Density(i), Depth(i), Connectivity(i) and Layout(i), respectively. Each weight usually lies in the range 0 to 1, and the four weights usually sum to 1.
Further, it should be noted that the values of the weights corresponding to the key-degree-related parameters can be determined by learning from a large amount of sample data. Through extensive experiments and data analysis, W1 = 0.1, W2 = 0.1, W3 = 0.4 and W4 = 0.4 were obtained here. The weights may of course take other values; this embodiment imposes no limitation in this respect.
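Here is a minimal sketch of the weighted sum, using the weights reported above as defaults. The range check reflects the statement that each parameter lies between 0 and 1. The function and constant names are hypothetical.

```python
from typing import Sequence, Tuple

# W1..W4 as reported in the text; other values are equally admissible.
DEFAULT_WEIGHTS: Tuple[float, float, float, float] = (0.1, 0.1, 0.4, 0.4)


def key_degree(density: float, depth: float, connectivity: float,
               layout: float,
               weights: Sequence[float] = DEFAULT_WEIGHTS) -> float:
    """Key(i) = W1*Density(i) + W2*Depth(i) + W3*Connectivity(i) + W4*Layout(i)."""
    params = (density, depth, connectivity, layout)
    if len(weights) != len(params):
        raise ValueError("expected exactly four weights")
    if any(not 0.0 <= p <= 1.0 for p in params):
        raise ValueError("each parameter is expected to lie in [0, 1]")
    return sum(w * p for w, p in zip(weights, params))


print(key_degree(0.2, 0.5, 1.0, 0.7))
# Ignoring a parameter, as the text allows, amounts to giving it zero weight:
print(key_degree(0.2, 0.5, 1.0, 0.7, weights=(0.0, 0.0, 0.5, 0.5)))
```

With the default weights, connectivity and layout together account for 80% of the score. Density and depth act mainly as tie-breakers.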
In addition, it should be noted that the key degree of each effective link in a website need not use every parameter listed above. Parameters may be dropped as appropriate, for example by setting the weight of a parameter that need not be considered to 0. Alternatively, other parameters related to the key degree of effective links may be added through modelling. In either case, the weight of each parameter may be redetermined according to its contribution to the key degree. In other words, in the key page determination method provided by this embodiment, both parameters and weights can be combined and adjusted flexibly according to actual conditions. This gives the method a wide range of application and considerable flexibility, and leaves room for optimisation and extension by those skilled in the art.
Further, after the key degrees of all effective links in the website have been calculated as described above, the key page determination operation of step S103 is performed.
Optionally, in step S103, the effective links may be sorted by their calculated key degrees in descending (or ascending) order. The one or more effective links at the top (or bottom) of the ranking whose key degree is not less than a set threshold are taken as the effective links with the highest key degree. The pages corresponding to these effective links are then taken as the key pages of the website.
It should be noted that the set threshold in step S103 can be set flexibly according to the results calculated in step S102. One or more effective links whose key degree is not less than the threshold can thus be selected flexibly, yielding one or more pages of relatively high importance as the key pages of the website. This is not described further in this embodiment.
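Step S103 can be sketched as a sort followed by a threshold filter. Here the threshold is simply a caller-supplied parameter. The function name and the sample scores are hypothetical.

```python
from typing import Dict, List


def select_key_pages(key_degrees: Dict[str, float], threshold: float) -> List[str]:
    """Step S103 sketch: rank links by key degree (descending) and keep those
    whose key degree is not less than `threshold`."""
    ranked = sorted(key_degrees.items(), key=lambda item: item[1], reverse=True)
    return [link for link, score in ranked if score >= threshold]


scores = {"/about": 0.30, "/": 0.82, "/news": 0.55, "/login": 0.55}
print(select_key_pages(scores, threshold=0.5))
```

Python's `sorted` remains stable with `reverse=True`, so links with equal key degrees keep their original relative order.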
Embodiment 2:
Based on the same inventive concept, Embodiment 2 of the present invention provides a key page determination device. Fig. 2 is a schematic structural diagram of this device. As shown in Fig. 2, the device may include:
an acquiring unit 201, configured to obtain, for any website, all effective links in the website and the parent-child relationships between all the effective links;
a computing unit 202, configured to process each effective link obtained by the acquiring unit 201. Based on the parent-child relationships obtained by the acquiring unit 201, it determines the link's key-degree-related parameters, each characterising the link's importance. It then calculates the link's key degree as a weighted sum of the determined parameters and their corresponding weights;
a determining unit 203, configured to determine, according to the key degree of each effective link calculated by the computing unit 202, at least one effective link in the website whose key degree is not less than a set threshold, and to take the pages respectively corresponding to the determined effective links as the key pages of the website.
That is, the device provided by this embodiment defines parameters related to the importance of page links (or of pages) together with a weight for each parameter. This gives a direct quantitative indicator for determining key pages, so the key degree of each page in a website can be calculated automatically and quantitatively. Key pages are therefore determined and selected more accurately and flexibly. The workload of manually configuring key pages is also reduced, and key pages are determined more efficiently.
The functions of each unit in this embodiment of the present invention are described in detail below:
Optionally, to improve the efficiency and accuracy of obtaining effective links, the acquiring unit 201 may crawl a website with a web crawler to obtain all effective links in the website and the parent-child relationships between them. Specifically, a suitable starting link, such as a starting URL, may be set for the website before crawling it. During crawling, the acquiring unit 201 may also number the crawled effective links sequentially with an auto-incrementing counter, which makes it easy to record the parent-child relationships between them.
Further, after the acquiring unit 201 has obtained all effective links in the website, the computing unit 202 may determine the key degree of each effective link using the page link key degree determination approach provided in Embodiment 1.
Optionally, for any effective link, the key-degree-related parameters of the link include link density, link depth, connectivity, average layout coefficient, and so on. Accordingly, the computing unit 202 may be specifically configured to:
for each obtained effective link, determine its link density, link depth, connectivity and average layout coefficient according to the parent-child relationships between all obtained effective links, and calculate its key degree as a weighted sum of these four parameters and their respective weights.
The computing unit 202 may be specifically configured to determine, for any effective link, the link density of the link by the following formula:
$$\mathrm{Density}(i) = \frac{\mathrm{count}(i)}{\sum_{j=1}^{N} \mathrm{count}(j)}$$
where i is the number of the effective link among all effective links of the website; N is the total number of effective links in the website; i and N are positive integers, with i not greater than N; Density(i) is the link density of the link; and count(i) is the total number of times the link appears in the website.
That is, the link density of any effective link in a website is the number of times the link appears during crawling, divided by the total number of times all links in the website appear during crawling. Its value therefore lies in the range 0 to 1. It is generally considered that the more often a page appears in a website, the more important it is. Therefore, the larger the link density, the greater the key degree of the link and the stronger its importance; this is not described further here.
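Here is a minimal sketch of the link density computation, starting from occurrence counts keyed by link number. These are the kind of counts an acquisition step could record while crawling. The function name and sample counts are hypothetical.

```python
from typing import Dict


def link_density(counts: Dict[int, int]) -> Dict[int, float]:
    """Density(i) = count(i) / sum_j count(j) for every link number i."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no link occurrences were recorded")
    return {i: c / total for i, c in counts.items()}


# Occurrence counts keyed by link number, e.g. as recorded while crawling.
density = link_density({1: 2, 2: 1, 3: 2})
print(density)
```

By construction the densities of all links sum to 1. A single link's density therefore tends to shrink as the site grows, which is worth keeping in mind when choosing weights.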
Optionally, the computing unit 202 may be specifically configured to determine, for any effective link, the link depth of the link by the following formula:
$$\mathrm{Depth}(i) = \frac{1}{\mathrm{count}_i(\text{'/'}) + \mathrm{count}_i(\text{'?'})}$$
where i is the number of the effective link among all effective links of the website; i is a positive integer not greater than the total number of effective links in the website; count_i('/') is the number of times the separator "/" occurs in the link; and count_i('?') is the number of times the question mark occurs in the link.
It should be noted that an effective URL generally contains at most one "?", i.e. for any effective link count_i('?') is either 0 or 1. count_i('?') therefore also indicates whether the link carries parameters: a value of 1 means the link contains the parameter marker "?", and a value of 0 means it has none. In addition, every effective URL normally contains at least one "/", and the number of "/" characters is not limited. Because the number of "/" in an effective URL is generally a positive integer, the denominator of the depth formula is at least 1, so the link depth of any effective link lies in the range 0 to 1. The larger the value, the more important the effective link and the greater its key degree; this is not described further here.
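When links are recorded as absolute URLs, counting "/" over the raw string also picks up the "//" after the scheme. That adds 2 to every link's denominator. The following variant is a sketch under a different, explicitly assumed convention: it counts separators in the path and query only. It also assumes that a bare host is equivalent to "/", and that a trailing "?" with an empty query does not count. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def link_depth_path_only(url: str) -> float:
    """Depth(i) computed on the path and query of an absolute or relative URL.

    Assumptions: the scheme's "//" and the host are ignored, a bare host
    such as "https://example.com" is treated as "/", and a "?" counts
    only when a non-empty query string follows it.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    separators = path.count("/") + (1 if parts.query else 0)
    if separators == 0:
        raise ValueError(f"no '/' or query in URL: {url!r}")
    return 1.0 / separators


print(link_depth_path_only("https://example.com"))
print(link_depth_path_only("https://example.com/news/list?page=2"))
```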
Optionally, the computing unit 202 may be specifically configured to determine, for any effective link, the connectivity of the link by the following formula:
$$\mathrm{Connectivity}(i) = \frac{\min(\mathrm{in}(i), \mathrm{out}(i))}{\max(\mathrm{in}(i), \mathrm{out}(i))}$$
where i is the number of the effective link among all effective links of the website; i is a positive integer not greater than the total number of effective links in the website; in(i) is the total number of times the link is referenced by other effective links in the website; and out(i) is the total number of times the link references other effective links in the website.
In other words, the connectivity of an effective link is the ratio of the smaller to the larger of two counts: its fan-in (the total number of times it is referenced by other effective links in the website) and its fan-out (the total number of times it references other effective links in the website). Its value therefore lies in the range 0 to 1. It is generally accepted that the closer a page's fan-in/fan-out ratio (or fan-out/fan-in ratio) is to 1, the more traffic the page carries and the more important its position in the website. Hence, the larger the connectivity, the more important the effective link and the greater its key degree; this is not described further here.
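In the device, in(i) and out(i) have to be derived from the parent-child relationships that the acquiring unit 201 records. The following sketch does this from a list of (parent id, child id) pairs. Self-references are ignored because both counts refer to *other* effective links. Returning 0.0 for a link with neither fan-in nor fan-out is an assumed fallback, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def connectivity_from_edges(edges: Iterable[Tuple[int, int]],
                            link_ids: Iterable[int]) -> Dict[int, float]:
    """Derive in(i) and out(i) from (parent, child) reference records,
    then apply Connectivity(i) = min(in, out) / max(in, out)."""
    fan_in: Counter = Counter()
    fan_out: Counter = Counter()
    for parent, child in edges:
        if parent == child:  # a link referencing itself is not "another" link
            continue
        fan_out[parent] += 1
        fan_in[child] += 1
    result: Dict[int, float] = {}
    for i in link_ids:
        largest = max(fan_in[i], fan_out[i])
        # Assumed fallback of 0.0 when a link has neither fan-in nor fan-out.
        result[i] = min(fan_in[i], fan_out[i]) / largest if largest else 0.0
    return result


edges = [(1, 2), (1, 3), (2, 3), (2, 1), (3, 3)]
print(connectivity_from_edges(edges, [1, 2, 3]))
```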
Likewise, optionally, the computing unit 202 may be specifically configured to determine, for any effective link, the average layout coefficient of the link by the following formula:
$$\mathrm{Layout}(i) = \frac{\sum_{k=1}^{M} \mathrm{layout}(i,k)}{M}$$
where i is the number of the effective link among all effective links of the website, and M is the total number of effective pages in the website that reference the link; i is a positive integer not greater than the total number of effective links in the website; k is a positive integer not greater than M; and layout(i, k) is the layout coefficient of effective link i in the effective page numbered k among all effective pages that reference link i;
where $$\mathrm{layout}(i,k) = 1 - \frac{\mathrm{offset}_k(i)}{\sum_{j=1}^{T} \mathrm{offset}_k(j)}$$
where T is the total number of effective links in the effective page numbered k that references effective link i; offset_k(i) is the spatial offset of effective link i from the desired position of page k; and offset_k(j) is the spatial offset of the j-th effective link in page k from that same desired position.
It should be noted that the desired position of effective page k is the position of highest importance in that page. It can be set flexibly according to the design style of the website's pages. Preferably it is the top of the page or the centre of the page, but any other position on the page may also be used. The embodiments of the present invention impose no limitation in this respect.
In other words, the layout coefficient of an effective link in a given page is 1 minus a ratio: the link's spatial offset from the page's desired position, divided by the sum of the spatial offsets of all effective links in that page from the same position. Its value therefore lies in the range 0 to 1. The average layout coefficient of an effective link is the mean of its layout coefficients over all effective pages that reference it, so it also lies in the range 0 to 1. The closer a link is to the desired position in a page, the smaller the ratio of its offset to the sum of all offsets in that page. Its layout coefficient in that page is then larger, which in turn raises its average layout coefficient. Because a link closer to the desired position is more prominent in that page, a larger average layout coefficient indicates a more important position of the link within the website; this is not described further here.
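The text does not say how the spatial offset is measured. The following sketch assumes it is the Euclidean distance between a link's rendered (x, y) position and the page's desired position. By default the desired position is the top-left corner, and it can be moved, for example to the page centre, as the text permits. The function name, the coordinate convention and the all-zero fallback are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def layout_from_positions(positions: Dict[str, Point],
                          desired: Point = (0.0, 0.0)) -> Dict[str, float]:
    """layout(i, k) for every effective link in one page k.

    Assumption: the spatial offset is the Euclidean distance between the
    link's rendered position and the desired position (top-left by default).
    """
    offsets = {link: math.hypot(x - desired[0], y - desired[1])
               for link, (x, y) in positions.items()}
    total = sum(offsets.values())
    if total == 0:
        return {link: 1.0 for link in offsets}  # assumed fallback
    return {link: 1.0 - off / total for link, off in offsets.items()}


positions = {"/a": (0.0, 0.0), "/b": (3.0, 4.0), "/c": (6.0, 8.0)}
print(layout_from_positions(positions))                      # desired position: top-left
print(layout_from_positions(positions, desired=(3.0, 4.0)))  # desired position at "/b"
```

Moving the desired position changes which link scores highest. This is why the text leaves the choice to the page's design style.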
As can be seen from the above, for any effective link, each key-degree-related parameter (link density, link depth, connectivity, average layout coefficient, and so on) lies in the range 0 to 1, and a larger value indicates greater importance. Therefore, for any effective link i, the computing unit 202 can calculate its key degree Key(i) by the following formula:
$$\mathrm{Key}(i) = W_1 \cdot \mathrm{Density}(i) + W_2 \cdot \mathrm{Depth}(i) + W_3 \cdot \mathrm{Connectivity}(i) + W_4 \cdot \mathrm{Layout}(i)$$
where Density(i), Depth(i), Connectivity(i) and Layout(i) are the link density, link depth, connectivity and average layout coefficient of effective link i, respectively. W1, W2, W3 and W4 are the weights corresponding to Density(i), Depth(i), Connectivity(i) and Layout(i), respectively. Each weight is usually a non-negative value between 0 and 1, and the four weights usually sum to 1.
Further, it should be noted that the computing unit 202 may also be configured to determine the values of the weights corresponding to the key-degree-related parameters by learning from a large amount of sample data. Through extensive experiments and data analysis, W1 = 0.1, W2 = 0.1, W3 = 0.4 and W4 = 0.4 were obtained here. The weights may of course take other values; this embodiment imposes no limitation in this respect.
In addition, it should be noted that when the computing unit 202 calculates the key degree of each effective link in a website, it need not use every parameter listed above. Parameters may be dropped as appropriate, for example by setting the weight of a parameter that need not be considered to 0. Alternatively, other parameters related to the key degree of effective links may be added through modelling. In either case, the weight of each parameter may be redetermined according to its contribution to the key degree. In other words, in the key page determination device provided by this embodiment, both parameters and weights can be combined and adjusted flexibly according to actual conditions. This gives the device a wide range of application and considerable flexibility, and leaves room for optimisation and extension by those skilled in the art.
Further, after the computing unit 202 has calculated the key degrees of all effective links in the website, the determining unit 203 may perform the key page determination operation.
Optionally, the determining unit 203 may sort the effective links by their calculated key degrees in descending (or ascending) order. The one or more effective links at the top (or bottom) of the ranking whose key degree is not less than a set threshold are taken as the effective links with the highest key degree. The pages corresponding to these effective links are then taken as the key pages of the website.
It should be noted that the set threshold can be set flexibly according to the results calculated by the computing unit 202. One or more effective links whose key degree is not less than the threshold can thus be selected flexibly, yielding one or more pages of relatively high importance as the key pages of the website. This is not described further in this embodiment.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a device (apparatus) or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (apparatus) and computer program product according to embodiments of the present invention. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of such flows and/or blocks. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. When executed by that processor, the instructions produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (12)

1. A key page determination method, characterized in that the method comprises:
for any website, obtaining all effective links in the website and the parent-child relationships between all the effective links;
for each obtained effective link, determining, according to the parent-child relationships between all the obtained effective links, key-degree-related parameters of the effective link, each used to characterize the importance of the effective link, and calculating the key degree of the effective link as a weighted sum of the determined key-degree-related parameters and the weights corresponding to each of them;
determining, according to the calculated key degree of each effective link, at least one effective link in the website whose key degree is not less than a set threshold, and taking the pages respectively corresponding to the determined at least one effective link as the key pages of the website.
2. The method according to claim 1, characterized in that, for any effective link, the key-degree-related parameters of the effective link comprise link density, link depth, connectivity and average layout coefficient. The step of claim 1 that, for each obtained effective link, determines the key-degree-related parameters and calculates the key degree of the effective link as a weighted sum comprises:
for each obtained effective link, determining the link density, link depth, connectivity and average layout coefficient of the effective link according to the parent-child relationships between all the obtained effective links, and calculating the key degree of the effective link as a weighted sum of the determined link density, link depth, connectivity and average layout coefficient and their respective corresponding weights.
3. The method according to claim 2, characterized in that, for any effective link, the link density of the effective link is determined by the following formula:
$$\mathrm{Density}(i) = \frac{\mathrm{count}(i)}{\sum_{j=1}^{N} \mathrm{count}(j)}$$
where i is the number of the effective link among all effective links of the website; N is the total number of effective links in the website; i and N are positive integers, with i not greater than N; Density(i) is the link density of the effective link; and count(i) is the total number of times the effective link appears in the website.
4. The method according to claim 2, characterized in that, for any effective link, the link depth of the effective link is determined by the following formula:
$$\mathrm{Depth}(i) = \frac{1}{\mathrm{count}_i(\text{'/'}) + \mathrm{count}_i(\text{'?'})}$$
where i is the number of the effective link among all effective links of the website; i is a positive integer not greater than the total number of effective links in the website; count_i('/') is the number of times the separator "/" occurs in the effective link; and count_i('?') is the number of times the question mark occurs in the effective link.
5. The method according to claim 2, characterized in that, for any effective link, the connectivity of the effective link is determined by the following formula:
$$\mathrm{Connectivity}(i) = \frac{\min(\mathrm{in}(i), \mathrm{out}(i))}{\max(\mathrm{in}(i), \mathrm{out}(i))}$$
where i is the number of the effective link among all effective links of the website; i is a positive integer not greater than the total number of effective links in the website; in(i) is the total number of times the effective link is referenced by other effective links in the website; and out(i) is the total number of times the effective link references other effective links in the website.
6. The method according to claim 2, characterized in that, for any effective link, the average layout coefficient of the effective link is determined by the following formula:
$$\mathrm{Layout}(i) = \frac{\sum_{k=1}^{M} \mathrm{layout}(i,k)}{M}$$
where i is the number of the effective link among all effective links of the website, and M is the total number of effective pages in the website that reference the effective link; i is a positive integer not greater than the total number of effective links in the website; k is a positive integer not greater than M; and layout(i, k) is the layout coefficient of effective link i in the effective page numbered k among all effective pages that reference effective link i;
Wherein, $$\mathrm{layout}(i,k) = 1 - \frac{\mathrm{offset}_k(i)}{\sum_{j=1}^{T} \mathrm{offset}_k(j)};$$
Wherein, T is the total number of effective links in the effective page numbered k that references effective link i; offset_k(i) is the spatial position offset of effective link i, within that effective page numbered k, relative to the preset position of the page; offset_k(j) is the spatial position offset of the j-th effective link in the same effective page numbered k relative to the same preset position.
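The two layout formulas of claim 6 can be sketched together: layout(i, k) is computed per referencing page and then averaged over the M pages that reference link i. This is an illustration under assumptions, not the patentee's code. Each page is modelled as a hypothetical dictionary from link to its offset, and what the offset measures (for example pixel distance from the top-left corner) is not specified in the claim. The function names are hypothetical.

```python
from typing import Dict, List


def page_layout(offsets: Dict[str, float], link: str) -> float:
    """layout(i, k) = 1 - offset_k(i) / sum_{j=1..T} offset_k(j) on one page k.

    `offsets` maps each of the T effective links on page k to its spatial
    offset from the page's preset reference position (the unit, e.g.
    pixels from the top-left corner, is an assumption).
    """
    total = sum(offsets.values())
    if total == 0:
        raise ValueError("all offsets are zero; layout(i, k) is undefined")
    # Links closer to the reference position (smaller offset) score higher.
    return 1.0 - offsets[link] / total


def average_layout(pages: List[Dict[str, float]], link: str) -> float:
    """Layout(i): mean of layout(i, k) over the M pages that reference `link`."""
    referencing = [page for page in pages if link in page]
    if not referencing:
        raise ValueError(f"no page references {link!r}")
    return sum(page_layout(page, link) for page in referencing) / len(referencing)


if __name__ == "__main__":
    # Hypothetical offsets for three pages of a small site.
    pages = [{"/news": 10, "/about": 30},
             {"/news": 20, "/contact": 20},
             {"/about": 5}]
    print(average_layout(pages, "/news"), average_layout(pages, "/about"))
```

Read literally, the formula gives 0 to a link that is the only effective link on its page, because its offset equals the whole sum; the sketch reproduces that behaviour rather than special-casing it.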
7. A device for determining key pages, characterized in that the device comprises:
An acquiring unit, configured to obtain, for any website, all effective links in the website and the parent-child relationships among all of those effective links;
A computing unit, configured to: for each obtained effective link, determine, according to the parent-child relationships among all obtained effective links, the key-degree-related parameters of the effective link, which characterize the importance of the effective link; and calculate the key degree of the effective link as a weighted sum of the determined key-degree-related parameters, each multiplied by its corresponding weight;
A determining unit, configured to determine, according to the calculated key degree of each effective link, at least one effective link in the website whose key degree is not less than a set threshold, and to take the page corresponding to each determined effective link as a key page of the website.
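The determining unit of claim 7 amounts to a threshold filter over the computed key degrees. A minimal sketch follows, assuming key degrees are already available per link. The function name, the optional `page_of` link-to-page mapping, and the choice to order results by descending key degree are hypothetical; the claim fixes neither the threshold value nor an ordering.

```python
from typing import Dict, List, Optional


def determine_key_pages(key_degree: Dict[str, float],
                        threshold: float,
                        page_of: Optional[Dict[str, str]] = None) -> List[str]:
    """Determining unit: select effective links whose key degree is not less
    than `threshold` and return their corresponding pages as key pages.

    `page_of` is a hypothetical link-to-page mapping; links absent from it
    are treated as their own page. The claim does not fix a threshold value.
    """
    page_of = page_of or {}
    selected = [link for link, degree in key_degree.items() if degree >= threshold]
    # Order by descending key degree so the most important pages come first.
    selected.sort(key=lambda link: key_degree[link], reverse=True)
    pages: List[str] = []
    for link in selected:
        page = page_of.get(link, link)
        if page not in pages:  # several links may lead to the same page
            pages.append(page)
    return pages


if __name__ == "__main__":
    degrees = {"/": 0.9, "/news": 0.6, "/about": 0.3, "/news?x=1": 0.7}
    print(determine_key_pages(degrees, 0.5, {"/news?x=1": "/news"}))
```

Deduplicating pages matters when several effective links (for example a path with and without a query) resolve to the same page; whether the patent intends that is not stated.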
8. The device according to claim 7, characterized in that, for any effective link, the key-degree-related parameters of the effective link comprise: link density, link depth, connectivity and average layout coefficient; and the computing unit is specifically configured to:
for each obtained effective link, determine the link density, link depth, connectivity and average layout coefficient of the effective link according to the parent-child relationships among all obtained effective links, and calculate the key degree of the effective link as a weighted sum of its link density, link depth, connectivity and average layout coefficient, each multiplied by its corresponding weight.
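The weighted sum of claim 8 can be sketched as below. The class and function names are hypothetical, and the claims give no weight values, so the equal weights in the usage example are purely an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class KeyDegreeParams:
    """The four key-degree-related parameters of one effective link (claim 8)."""
    density: float       # Density(i)
    depth: float         # Depth(i)
    connectivity: float  # Connectivity(i)
    layout: float        # Layout(i), the average layout coefficient


def key_degree(p: KeyDegreeParams, weights: Dict[str, float]) -> float:
    """Weighted sum of the four parameters; the weight values are not given in
    the claims, so any `weights` passed in are the caller's assumption."""
    return (weights["density"] * p.density
            + weights["depth"] * p.depth
            + weights["connectivity"] * p.connectivity
            + weights["layout"] * p.layout)


if __name__ == "__main__":
    # Hypothetical equal weights, chosen only to illustrate the arithmetic.
    equal = {"density": 0.25, "depth": 0.25, "connectivity": 0.25, "layout": 0.25}
    print(key_degree(KeyDegreeParams(0.1, 0.5, 1.0, 0.75), equal))
```

Keeping the weights in a separate mapping makes it easy to tune them per website without touching the parameter computation.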
9. The device according to claim 8, characterized in that the computing unit is specifically configured to determine, for any effective link, the link density of that effective link by the following formula:
$$\mathrm{Density}(i) = \frac{\mathrm{Count}(i)}{\sum_{j=1}^{N} \mathrm{Count}(j)};$$
Wherein, i is the index of the given effective link among all effective links of the website; N is the total number of effective links in the website; i and N are positive integers, and the value of i is not greater than N; Density(i) is the link density of the given effective link; Count(i) is the total number of times the given effective link appears in the website.
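The link-density formula normalizes each link's occurrence count by the total over all N effective links, so the densities of a site sum to 1. A minimal sketch, assuming the crawler yields one entry per appearance of an effective link (a hypothetical input format; the function name is also hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable


def link_density(occurrences: Iterable[str]) -> Dict[str, float]:
    """Density(i) = Count(i) / sum_{j=1..N} Count(j).

    `occurrences` lists every appearance of an effective link across the
    site, one entry per appearance (an assumed input format).
    """
    counts = Counter(occurrences)  # Count(i) for each of the N links
    total = sum(counts.values())   # sum over all N effective links
    return {link: c / total for link, c in counts.items()}


if __name__ == "__main__":
    seen = ["/", "/news", "/news", "/about"]  # hypothetical crawl output
    print(link_density(seen))
```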
10. The device according to claim 8, characterized in that the computing unit is specifically configured to determine, for any effective link, the link depth of that effective link by the following formula:
$$\mathrm{Depth}(i) = \frac{1}{\mathrm{count}_i('/') + \mathrm{count}_i('?')};$$
Wherein, i is the index of the given effective link among all effective links of the website; i is a positive integer, and its value is not greater than the total number of effective links in the website; count_i('/') is the number of times the separator '/' appears in the given effective link, and count_i('?') is the number of times the question mark '?' appears in it.
11. The device according to claim 8, characterized in that the computing unit is specifically configured to determine, for any effective link, the connectivity of that effective link by the following formula:
$$\mathrm{Connectivity}(i) = \frac{\min(\mathrm{in}(i), \mathrm{out}(i))}{\max(\mathrm{in}(i), \mathrm{out}(i))};$$
Wherein, i is the index of the given effective link among all effective links of the website; i is a positive integer, and its value is not greater than the total number of effective links in the website; in(i) is the total number of times the given effective link is referenced by other effective links in the website, and out(i) is the total number of times the given effective link references other effective links in the website.
12. The device according to claim 8, characterized in that the computing unit is specifically configured to determine, for any effective link, the average layout coefficient of that effective link by the following formula:
$$\mathrm{Layout}(i) = \frac{\sum_{k=1}^{M} \mathrm{layout}(i,k)}{M};$$
Wherein, i is the index of the given effective link among all effective links of the website, and M is the total number of effective pages in the website that reference the given effective link; i is a positive integer, and its value is not greater than the total number of effective links in the website; k is a positive integer whose value is not greater than M; layout(i, k) is the layout coefficient of effective link i in the effective page numbered k among all effective pages that reference effective link i;
Wherein, $$\mathrm{layout}(i,k) = 1 - \frac{\mathrm{offset}_k(i)}{\sum_{j=1}^{T} \mathrm{offset}_k(j)};$$
Wherein, T is the total number of effective links in the effective page numbered k that references effective link i; offset_k(i) is the spatial position offset of effective link i, within that effective page numbered k, relative to the preset position of the page; offset_k(j) is the spatial position offset of the j-th effective link in the same effective page numbered k relative to the same preset position.
CN201510947063.2A 2015-12-16 2015-12-16 A kind of determination method and device of the key page Active CN105608133B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510947063.2A CN105608133B (en) 2015-12-16 2015-12-16 A kind of determination method and device of the key page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510947063.2A CN105608133B (en) 2015-12-16 2015-12-16 A kind of determination method and device of the key page

Publications (2)

Publication Number Publication Date
CN105608133A true CN105608133A (en) 2016-05-25
CN105608133B CN105608133B (en) 2019-07-02

Family

ID=55988073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510947063.2A Active CN105608133B (en) 2015-12-16 2015-12-16 A kind of determination method and device of the key page

Country Status (1)

Country Link
CN (1) CN105608133B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285999B1 (en) * 1997-01-10 2001-09-04 The Board Of Trustees Of The Leland Stanford Junior University Method for node ranking in a linked database
CN1996299A (en) * 2006-12-12 2007-07-11 孙斌 Ranking method for web page and web site
CN103714093A (en) * 2012-09-29 2014-04-09 北京百度网讯科技有限公司 Method and device for mining key pages of website

Also Published As

Publication number Publication date
CN105608133B (en) 2019-07-02

Similar Documents

Publication Publication Date Title
US11651259B2 (en) Neural architecture search for convolutional neural networks
Abbaspour SWAT calibration and uncertainty programs
US10643132B2 (en) Cardinality estimation using artificial neural networks
Bergmann et al. Incremental pattern matching in the VIATRA model transformation system
WO2018099084A1 (en) Method, device, chip and system for training neural network model
EP3602419B1 (en) Neural network optimizer search
CN107705183A (en) Recommendation method, apparatus, storage medium and the server of a kind of commodity
CN103116582B (en) A kind of information retrieval method and related system and device
CN106600344A (en) Method and apparatus for obtaining active user data of target product
CN110276442A (en) A kind of searching method and device of neural network framework
CN105389349A (en) Dictionary updating method and apparatus
CN108280058A (en) Relation extraction method and apparatus based on intensified learning
CN110832509A (en) Black box optimization using neural networks
CN109117380A (en) A kind of method for evaluating software quality, device, equipment and readable storage medium storing program for executing
CN104462399B (en) The processing method and processing device of search result
CN107491434A (en) Text snippet automatic generation method and device based on semantic dependency
CN108388509B (en) Software testing method, computer readable storage medium and terminal equipment
US10537801B2 (en) System and method for decision making in strategic environments
CN107368526A (en) A kind of data processing method and device
CN104268243B (en) A kind of position data processing method and processing device
CN108984735B (en) Label Word library updating method, apparatus and electronic equipment
CN105608133A (en) Key page determination method and device
CN106934007A (en) The method for pushing and device of related information
CN110298690A (en) Object class purpose period judgment method, device, server and readable storage medium storing program for executing
CN107957981A (en) A kind of ternary composite oil-displacing system takes effect a little definite method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 3rd Floor, Yitai Building, No. 4 Beiwa Road, Haidian District, Beijing 100089

Patentee after: NSFOCUS Technologies Group Co.,Ltd.

Patentee after: NSFOCUS TECHNOLOGIES Inc.

Address before: 3rd Floor, Yitai Building, No. 4 Beiwa Road, Haidian District, Beijing 100089

Patentee before: NSFOCUS INFORMATION TECHNOLOGY Co.,Ltd.

Patentee before: NSFOCUS TECHNOLOGIES Inc.