CN106156230A - Method and device for generating internal links - Google Patents

Method and device for generating internal links

Info

Publication number
CN106156230A
Authority
CN
China
Prior art keywords
page
link
internal link
website
internal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510202200.XA
Other languages
Chinese (zh)
Other versions
CN106156230B (en)
Inventor
黄华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510202200.XA
Publication of CN106156230A
Application granted
Publication of CN106156230B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations

Abstract

This application discloses a method and device for generating internal links, with the aim of distributing internal links reasonably. For example, the method may include: calculating the number of links that a first page receives from other pages in the website; determining, according to that number, whether the first page needs more internal links; if so, calculating the number of internal links the first page needs to add; selecting one or more second pages on which that number of internal links to the first page will be placed; and generating an internal-link recommendation result that places that number of internal links to the first page on the second pages.

Description

Method and device for generating internal links
Technical field
The present application relates to the Internet field, and in particular to a method and device for generating internal links.
Background
Internal links are links between pages under the same website domain. A well-designed internal link structure helps search engines index the pages of the site and raises the site's weight, yielding stable SEO results. For example, refer to the Alibaba website page shown in Fig. 1. In that page, internal links are placed in the form of keywords: the keywords "vehicle-mounted mp3 picture" and "plug-in card mp3 picture" shown on the page carry corresponding internal links. A user can click these keywords to open the pages they link to, and a search engine crawler can fetch those pages through the same links.
At present, internal links are mostly placed according to users' historical search behavior, popular keywords, or page relevance. As a result, internal links that score highly on these criteria are repeated many times while other internal links never get placed, so search engine crawlers repeatedly fetch the same pages and crawler resources are wasted.
Summary of the invention
In view of this, the purpose of the present application is to provide a method and device for generating internal links that distribute internal links reasonably.
According to a first aspect of the embodiments of the present application, a method for generating internal links is provided. For example, the method may include: calculating the number of links that a first page receives from other pages in the website; determining, according to that number, whether the first page needs more internal links; if so, calculating the number of internal links the first page needs to add; selecting one or more second pages on which to place that number of internal links to the first page; and generating an internal-link recommendation result that places that number of internal links to the first page on the second pages.
According to a second aspect of the embodiments of the present application, a device for generating internal links is provided. For example, the device may include: a placed-link calculating unit, which may be used to calculate the number of links the first page receives from other pages in the website; an internal-link increase judging unit, which may be used to determine, according to that number, whether the first page needs more internal links; an internal-link gap calculating unit, which may be used to calculate the number of internal links the first page needs to add if the judging unit determines that more are needed; a candidate page selecting unit, which may be used to select one or more second pages on which to place that number of internal links to the first page; and an internal-link generating unit, which may be used to generate an internal-link recommendation result that places that number of internal links to the first page on the second pages.
It can be seen that the present application has the following beneficial effects:
The embodiments calculate the number of links a first page receives from other pages in the website and use that number to decide whether the first page needs more internal links. For a first page with an internal-link gap, they calculate how many internal links it needs, select second pages on which to place that many links to it, and generate a recommendation result that places those links on the second pages. Because the recommendation result fills exactly the first page's internal-link gap, internal links are distributed more reasonably. This avoids the problem that internal links scoring highly on user behavior, search popularity, or page relevance are repeated heavily while other internal links never get placed, and so reduces wasted crawler resources.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a page on which internal links have been placed;
Fig. 2 is a schematic flowchart of a method for generating internal links according to one embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for generating internal links according to another embodiment of the present application;
Fig. 4 is a schematic flowchart of a method for generating internal links according to yet another embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for generating internal links according to still another embodiment of the present application;
Fig. 6 is a schematic flowchart of a method for generating internal links according to a further embodiment of the present application;
Fig. 7 is a schematic structural diagram of a device for generating internal links according to one embodiment of the present application.
Detailed description of the invention
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The method for generating internal links provided by the present application is described in detail below with reference to the following embodiments.
Embodiment one:
Refer to Fig. 2, a schematic flowchart of the method for generating internal links provided by this embodiment of the present application. As shown in Fig. 2, the method may include:
S210: Calculate the number of links the first page receives from other pages in the website.
In some possible implementations, each page in the website can be represented by its unique keyword, and the internal-link recommendation is computed over these keywords. A page's unique keyword can be obtained by parsing the page's URL.
Note that the first page described herein may be any one or more pages in the website. When the first page refers to multiple pages, the method provided by the embodiments can be applied to each first page separately to generate its internal-link recommendation result. For example, in some implementations the internal-link gap judgment and the internal-link recommendation are performed for all pages in the website; in that case the first page is every page in the site. In the implementation where internal links are shown on the page as keywords, for instance, the pages linked by all keywords of the website's search application can be crawled, and the number of times each page is linked by other pages in the website can then be calculated.
Building on the above implementation, to make it easy to count how many other pages link to a page, the URLs rendered on a page A can first be recorded as a result of the form [keyword A, URL1, URL2, ... URLn]. Each URL is then parsed into its keyword, producing multiple pairs of the form [keyword A, keyword B]; each pair means that page A carries an internal link to the page of keyword B. The pairs can then be counted. Note that the keywords in this implementation can be all keywords accepted by the search application; requests outside that keyword range are rejected. The embodiments do not restrict how pages are crawled; for example, pages may be fetched with an open-source HTTP client component such as HttpClient. URLs can be parsed into keywords by reversing the site's internal URL encoding strategy. For example, with the Alibaba website's internal URL encoding strategy, the URL http://www.1688.com/chanpin/-6D7033.html decodes to the keyword "mp3". Reversing the encoding strategy can be done in the usual way and is not described in detail here.
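The decoding step can be sketched in Python. The text only says that Alibaba uses an internal URL encoding strategy; this sketch assumes, purely as an illustration, that the keyword is stored as hex-encoded UTF-8 bytes in the last path segment. That assumption happens to fit the example, since 0x6D 0x70 0x33 are the ASCII codes of "mp3". The regular expression and the function name `url_to_keyword` are hypothetical.

```python
import re

# Hypothetical URL layout: ".../chanpin/-<hex bytes>.html", where the hex
# digits are assumed to be the UTF-8 bytes of the page keyword.
_KEYWORD_RE = re.compile(r"/chanpin/-([0-9A-Fa-f]+)\.html$")


def url_to_keyword(url):
    """Return the page keyword encoded in `url`, or None if it cannot be decoded."""
    match = _KEYWORD_RE.search(url)
    if match is None:
        return None  # not a keyword page; treated as out of scope
    try:
        # bytes.fromhex rejects odd-length input; decode rejects invalid UTF-8.
        return bytes.fromhex(match.group(1)).decode("utf-8")
    except ValueError:
        return None


print(url_to_keyword("http://www.1688.com/chanpin/-6D7033.html"))
```

Returning None for anything that does not decode mirrors the rule that requests outside the accepted keyword range are rejected.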
Accordingly, in this implementation the number of links that each page in the website receives from other pages in the website can be calculated with the following procedure:
A. Crawl the pages linked by all keywords;
B. For any unprocessed page A, parse the URL of page A itself to obtain the page's keyword A;
C. Extract URL1 to URLn of the links placed on page A and record them as a result of the form [keyword A, URL1, URL2, ... URLn];
D. For any URL in [keyword A, URL1, URL2, ... URLn] that has not yet produced a pair, parse it to obtain its keyword B and form the pair [keyword A, keyword B];
E. Based on the [keyword A, keyword B] pairs, add to the count of links that the page of keyword B receives from other pages in the website;
F. Check whether all of URL1 to URLn have produced pairs; if so, return to step B; if not, return to step D;
G. Once the pages of all keywords have been processed, the number of links that the page of each keyword receives from other pages in the website is obtained.
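Steps A to G amount to counting [keyword A, keyword B] pairs per target keyword. A minimal sketch, assuming crawling (step A) has already produced a mapping from each page's keyword to the URLs on that page, and that a URL decoder is passed in as a function. Skipping self-links follows from "other pages"; counting a repeated link once per occurrence is an assumption, and `count_inbound_links` is a hypothetical name.

```python
from collections import Counter


def count_inbound_links(records, url_to_keyword):
    """Count how many links each keyword page receives from other pages.

    records: dict mapping a source keyword (keyword A) to the list of URLs
             rendered on that page, i.e. the step C record.
    url_to_keyword: decoder returning a keyword, or None for out-of-scope URLs.
    """
    inbound = Counter()
    for source_kw, urls in records.items():      # step B: each page in turn
        for url in urls:                          # steps D and F: every URL on it
            target_kw = url_to_keyword(url)
            if target_kw is None or target_kw == source_kw:
                continue                          # out of scope, or a self-link
            inbound[target_kw] += 1               # step E: pair (A, B) counts for B
    return inbound                                # step G: counts for all keywords
```

Because a `Counter` returns 0 for missing keys, pages that nobody links to naturally show a count of zero.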
S220: Determine, according to the number of links the first page receives from other pages in the website, whether the first page needs more internal links.
S230: If so, calculate the number of internal links the first page needs to add, and select one or more second pages on which to place that number of internal links to the first page.
Note that a second page can carry an internal link to one first page, internal links to several different first pages, or several internal links to the same first page. How many distinct second pages are chosen can therefore be decided by actual needs, as long as enough are chosen to place the required number of internal links to the first page. The selection range for second pages can likewise be set as the implementation requires. For example, the second page can be any page. As another example, the second page can be a page that has a free internal-link position and does not yet carry an internal link to the first page. Here each page can be preset with a number of free internal-link positions: each time an internal link is placed on the page, its free-position count decreases by one and its placed-link count increases by one, and once the free-position count reaches zero, no further internal links may be placed on the page. As yet another example, in the implementation where internal links are shown on the page as keywords, a second page for the first page can be required to satisfy both conditions above and, in addition, to contain the first page's keyword.
Thus, by setting a number of free internal-link positions for each page to cap the internal links on that page, the uneven distribution caused by concentrating internal links on one page can be avoided. The conditions mentioned above are only some possible implementations of the embodiments, and the present application does not limit them.
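The free-position bookkeeping in S230 can be sketched as follows. It applies the stricter example conditions (a free position, no existing link to the first page, and the first page's keyword present in the page text). The `Page` class and `place_links` function are hypothetical names, and scanning the candidates in the given order is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    keyword: str
    free_slots: int                               # preset free internal-link positions
    text: str = ""                                # page content, for the keyword test
    links_to: set = field(default_factory=set)    # first pages already linked from here


def place_links(first_kw, gap, candidates):
    """Reserve positions on up to `gap` candidate pages for links to `first_kw`."""
    chosen = []
    for page in candidates:
        if len(chosen) == gap:
            break
        if page.keyword == first_kw or page.free_slots <= 0:
            continue                              # the page itself, or no free position
        if first_kw in page.links_to or first_kw not in page.text:
            continue                              # already links to it, or lacks its keyword
        page.free_slots -= 1                      # one free position used
        page.links_to.add(first_kw)               # placed-link count grows by one
        chosen.append(page.keyword)
    return chosen
```

Once a page's `free_slots` reaches zero it is skipped for every later first page, which is what caps the number of internal links on any single page.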
S240: Generate an internal-link recommendation result that places that number of internal links to the first page on the second pages.
Thus, the method provided by this embodiment uses the placed-link counts to give pages with an internal-link gap the required number of internal links, so that internal links are distributed more reasonably. This avoids heavy repetition of internal links that score highly on user behavior, search popularity, or page relevance while other internal links never get placed, and reduces wasted crawler resources.
Embodiment two:
In some possible implementations, a first page may receive too few internal links yet still draw traffic well, for example by being crawled repeatedly by search engines or visited repeatedly by users; such a first page does not need further internal-link recommendations. Therefore, this embodiment decides whether the first page needs more internal links based on both the number of links it receives from other pages in the website and its traffic-drawing ability.
For example, refer to Fig. 3, a schematic flowchart of the method for generating internal links provided by this embodiment of the present application. As shown in Fig. 3, the method may include:
S310: Calculate the number of links the first page receives from other pages in the website.
S320: Determine whether the first page needs more internal links according to the number of links it receives from other pages in the website, together with the search engine crawl count and/or the user visit count of the first page.
For example, it can be determined whether the number of links the first page receives from other pages in the website is below the average internal-link placement and the search engine crawl count of the first page is below a preset crawl threshold. If both hold, the first page needs more internal links; otherwise it does not. Using the average internal-link placement as the bar is only one possible implementation; the condition on the number of links under which the first page needs more internal links can be set according to actual needs, and the present application does not limit it.
As another example, it can be determined whether the number of links the first page receives from other pages in the website is below the average internal-link placement and the user visit count of the first page is below a preset visit threshold; if both hold, the first page needs more internal links.
As yet another example, it can be determined whether the number of links the first page receives from other pages in the website is below the average internal-link placement, the search engine crawl count of the first page is below the preset crawl threshold, and the user visit count of the first page is below the preset visit threshold; if all three hold, the first page needs more internal links.
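The three example rules in S320 differ only in which traffic signals are checked, so they can share one predicate. In this sketch a threshold or count left as `None` is simply not checked. The gap formula in `link_gap` (round the average up, then subtract the current count) is a hypothetical choice; the text does not say how the number of internal links to add is computed.

```python
import math


def needs_more_links(inbound, avg_inbound,
                     crawl_count=None, crawl_threshold=None,
                     visit_count=None, visit_threshold=None):
    """S320: True if the page is below the average and every checked signal is low."""
    if inbound >= avg_inbound:
        return False                      # already at or above the average placement
    if (crawl_threshold is not None and crawl_count is not None
            and crawl_count >= crawl_threshold):
        return False                      # crawled often: draws traffic well enough
    if (visit_threshold is not None and visit_count is not None
            and visit_count >= visit_threshold):
        return False                      # visited often: draws traffic well enough
    return True


def link_gap(inbound, avg_inbound):
    """Hypothetical gap rule: links needed to reach the rounded-up average."""
    return max(0, math.ceil(avg_inbound) - inbound)
```

Calling `needs_more_links` with only the first two arguments gives the count-only variant described later in this embodiment.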
S330: If so, calculate the number of internal links the first page needs to add, and select one or more second pages on which to place that number of internal links to the first page.
S340: Generate an internal-link recommendation result that places that number of internal links to the first page on the second pages.
Note that in the above implementations, how the search engine crawl count and the user visit count of the first page are measured can be chosen according to actual needs, and the present application does not limit it. For example, in some implementations both can be counted by analyzing the website's logs, which may include a search engine crawler log and a user access log. A crawler visit is in fact a special kind of user visit, but most sites can tell the two apart: entries in the user access log are recorded by asynchronous JavaScript triggered when a browser opens the page, while entries in the crawler log are recorded when a crawler fetches the page, so the two are clearly separable. Specifically, in some implementations the search engine crawl count of the first page is obtained by reading the crawler log and counting the crawler fetches of the first page, and the user visit count of the first page is obtained by reading the user access log and counting the user visits to the first page.
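Because both logs are counted the same way, one routine can produce the crawl count and the visit count. The log line layout assumed here (whitespace-separated, requested URL in the last field) is hypothetical; real crawler logs and JavaScript-beacon access logs will each need their own parser, and the URL decoder is passed in as a function.

```python
from collections import Counter


def count_by_page(log_lines, url_to_keyword):
    """Count log entries per page keyword.

    Assumes each line is whitespace-separated with the requested URL last.
    Run it on the crawler log for crawl counts and on the user access log
    for visit counts.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue                          # skip blank lines
        keyword = url_to_keyword(fields[-1])
        if keyword is not None:               # ignore URLs that are not keyword pages
            counts[keyword] += 1
    return counts
```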
After the search engine crawl count and/or user visit count of the first page has been calculated, the number of links the first page receives from other pages in the website, its search engine crawl count, and/or its user visit count can be merged, so that the need for more internal links is judged from several dimensions at once: internal-link count, user traffic, and search engine crawl count. Because a large website receives heavy traffic every day and the logs to analyze are large, a Hadoop MapReduce distributed computing cluster can be used to count, for the first page, the links from other pages in the website, the search engine crawl count, and the user visit count. The map nodes can use the page keyword as the key of the MapReduce computation, so that the reduce nodes can merge the statistics of these different dimensions.
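The merge keyed by page keyword can be illustrated with a single-process simulation of the map and reduce phases. This is plain Python, not the Hadoop API; the dimension names and function names are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("inbound", "crawl", "visit")


def map_phase(dimension, keywords):
    """Map: emit (page keyword, (dimension, 1)) for each counted event."""
    for kw in keywords:
        yield kw, (dimension, 1)


def reduce_phase(pairs):
    """Reduce: group by page keyword and sum each dimension separately."""
    totals = defaultdict(lambda: dict.fromkeys(DIMENSIONS, 0))
    for kw, (dimension, n) in pairs:
        totals[kw][dimension] += n
    return dict(totals)


events = (list(map_phase("inbound", ["mp3", "mp3"]))
          + list(map_phase("crawl", ["mp3"]))
          + list(map_phase("visit", ["car"])))
print(reduce_phase(events))
```

Keying every record by the page keyword is what lets one reducer see all three dimensions for the same page; in a real cluster the shuffle performs the grouping that `reduce_phase` does here by itself.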
Because search engine optimization is a dynamic process, internal links need to be re-recommended periodically from the latest data. Therefore, in some implementations, the crawler behavior and user access behavior within each time period can be analyzed at the end of that period to obtain the search engine crawl count and user visit count for the period. Correspondingly, the method of the embodiments can, once per period, recompute the internal-link recommendation result from that period's crawl and visit counts to form a new recommendation result. Overall this forms a closed loop and achieves continuous optimization.
Note that judging whether the first page needs more internal links in combination with its traffic-drawing ability is only one possible implementation. In other implementations, the judgment can be based solely on the number of links the first page receives from other pages in the website; for example, if that number is below the average internal-link placement, the first page needs more internal links. In the implementation that combines traffic-drawing ability, however, that ability is computed from several kinds of raw data, such as the search engine crawler log and the user access log, and used as a criterion for adding internal links. Pages that already draw traffic well are thus not given unnecessary extra internal links, pages that draw traffic poorly get more chances of internal-link placement, and crawler resources are fully used.
Embodiment three:
In some possible implementations, some websites divide pages into categories according to the industry of the page content, and pages under related categories are highly relevant to one another. Related categories can therefore serve as the basis for choosing second pages, to improve the relevance of internal-link recommendations.
For example, refer to Fig. 4, a schematic flowchart of the method for generating internal links provided by this embodiment of the present application. As shown in Fig. 4, the method may include:
S410: Calculate the number of links the first page receives from other pages in the website.
S420: Determine, according to the number of links the first page receives from other pages in the website, whether the first page needs more internal links.
S430: If so, calculate the number of internal links the first page needs to add, and select, from the categories related to the first page's category, second pages on which to place that number of internal links to the first page.
The related categories of the first page's category may include the first page's category itself and/or other categories whose degree of association with the first page's category reaches a certain level.
For example, the second pages on which to place that number of internal links to the first page can be selected from the first page's own category.
S440: Generate an internal-link recommendation result that places that number of internal links to the first page on the second pages.
Note that some categories in the website may be hierarchically related while others may be independent of one another; the embodiments do not limit this.
In some possible implementations, the levels of a website's categories can be determined from their hierarchical relations to form a category tree. For example, assume that some website's category tree has five levels: top, level 1, level 2, level 3, and bottom leaf category. The page for the keyword "mp3", for instance, can fall under the "consumer goods" category, where "consumer goods" is the parent category of "digital" and "digital" is a child category of "consumer goods". The closer a category is to the leaves, the more relevant the pages under it are to one another; under the top category, the relevance among all pages is lowest.
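The category tree can be represented as a child-to-parent map, from which the leaf-to-top path lists categories in decreasing order of relevance. The sketch trims the five-level example to three levels; only "consumer goods" and "digital" come from the text, and the leaf category "mp3 players" and the names `PARENT` and `path_to_top` are hypothetical.

```python
# Hypothetical category tree as child -> parent; the top category maps to None.
PARENT = {
    "consumer goods": None,
    "digital": "consumer goods",
    "mp3 players": "digital",
}


def path_to_top(leaf, parent=PARENT):
    """Categories from `leaf` up to the top, most relevant (nearest) first."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


print(path_to_top("mp3 players"))
```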
Building on the above implementation, selecting second pages from the categories related to the first page's category can follow the tree structure between categories: starting from the first page's leaf category, move up level by level and choose second pages at each level, so that more relevant pages are preferred.
In some possible implementations, when second pages are chosen level by level, there may not be enough pages that can host an internal link to the first page (for example, too few pages with free internal-link positions). As a result, by the time the top category is reached, all second pages selected may still be too few to place the required number of internal links to the first page. In that case, the remaining second pages can be chosen at random, so that enough second pages are obtained to place the required number of internal links to the first page.
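Combining the level-by-level climb with the random fallback gives roughly the following selection routine. `can_host` stands in for whatever eligibility test is used (for example, a free position and no existing link to the first page); all names are hypothetical, and drawing the fallback from all remaining eligible pages in the site is an assumption.

```python
import random


def pick_second_pages(first_kw, gap, leaf, parent, pages_by_category,
                      all_pages, can_host, rng=random):
    """Pick up to `gap` second pages: nearest categories first, then at random."""
    chosen, seen = [], set()

    def eligible(page):
        return page not in seen and page != first_kw and can_host(page, first_kw)

    node = leaf                                    # start at the first page's leaf category
    while node is not None and len(chosen) < gap:  # climb until the gap is filled or past top
        for page in pages_by_category.get(node, []):
            if len(chosen) == gap:
                break
            if eligible(page):
                chosen.append(page)
                seen.add(page)                     # ancestors may list the same page again
        node = parent.get(node)                    # move up to the parent category
    if len(chosen) < gap:                          # still short after the top category
        rest = [p for p in all_pages if eligible(p)]
        chosen.extend(rng.sample(rest, min(gap - len(chosen), len(rest))))
    return chosen
```

Passing a seeded `random.Random` instance as `rng` makes the fallback reproducible, which can help when comparing recommendation runs.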
Embodiment four:
In some possible implementations, to support high-performance data services and let external web systems display internal links efficiently, internal-link recommendation results are stored by putting them into a cache.
For example, refer to Fig. 5, a schematic flowchart of the method for generating internal links provided by this embodiment of the present application. As shown in Fig. 5, the method may include:
S510: Calculate the number of links the first page receives from other pages in the website.
S520: Determine, according to the number of links the first page receives from other pages in the website, whether the first page needs more internal links.
S530: If so, calculate the number of internal links the first page needs to add, select one or more second pages on which to place that number of internal links to the first page, and generate key-value pairs, where each key-value pair describes the one-to-one correspondence between the first keyword of the first page and the second keyword of a second page.
S540: Store the key-value pairs in a cache.
S550: When a page request is received, read the key-value pairs from the cache in real time, and render the link to the first page corresponding to the first keyword described by a key-value pair into the second page corresponding to the second keyword described by that pair.
In this embodiment, internal links are generated by offline computation and stored in a KV cache. When a page is generated and requests its recommended internal links, the recommendation result is already in the cache, so the offline-computed result can be read directly from the cache and shown in the page, completing the final internal-link placement. This improves the response efficiency of real-time web requests.
In combination with the implementation mentioned in Embodiment two, in which internal links are re-recommended periodically from the latest data, internal links can be redistributed simply by overwriting the whole KV cache with the key-value pairs computed for the latest time period.
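The offline-compute, online-read split and the full overwrite per period can be sketched with an in-memory dictionary standing in for the KV cache. Keying entries by the second page's keyword, so that a page request can look up the links to render, is an assumption, as are the class and function names, the `url_for` helper, and the anchor markup.

```python
import html


class LinkCache:
    """In-memory stand-in for the KV cache: second-page keyword -> first-page keywords."""

    def __init__(self):
        self._data = {}

    def replace_all(self, pairs):
        """S540 plus the periodic refresh: overwrite the cache with a new batch."""
        new_data = {}
        for first_kw, second_kw in pairs:
            new_data.setdefault(second_kw, []).append(first_kw)
        self._data = new_data        # swap the whole mapping rather than editing it in place

    def links_for(self, second_kw):
        return list(self._data.get(second_kw, []))


def render_links(cache, page_kw, url_for):
    """S550: at request time, read cached pairs and build the anchors to render."""
    return [
        f'<a href="{html.escape(url_for(kw))}">{html.escape(kw)}</a>'
        for kw in cache.links_for(page_kw)
    ]
```

Building the new mapping first and then assigning it in one step means a request sees either the previous period's links or the new ones, never a half-written mix.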
Note that the runtime environment of the method for generating internal links provided by the embodiments is not limited. For example, the method can run in a Java and Hadoop environment on 64-bit servers running the Linux operating system.
Embodiment five:
Below, a possible implementation of the embodiments of the present application is described in detail in combination with the embodiments above. For example, referring to Fig. 6, which is a schematic flowchart of a possible internal-link generation method provided by the embodiments of the present application, the method may include the following steps:
S610: Count the number of links pointing to the first page from other pages within the website.
S620: Based on the number of links pointing to the first page from other pages within the website, together with the search-engine crawl volume of the first page and/or the user visit volume of the first page, determine whether the first page needs additional internal links.
S630: If so, calculate the number of internal links the first page needs to gain.
S630.1: Take the leaf category to which the first page belongs as the current category.
For example, assume the category hierarchy has five levels, numbered 0 through 4. The top-level category is at level 0 and the bottom (leaf) categories are at level 4, so the current category is initialized to level 4.
S630.2: Determine whether the number of internal links the first page still needs equals zero.
That is, determine whether the internal-link deficit of the first page is zero. If the deficit is zero, there is no need to continue the procedure.
S630.3: If it is not zero, determine whether the current category has reached the top-level category.
For example, in the five-level hierarchy above, determine whether the current category is at level 0.
S630.4: If not, select from the current category second pages that have free internal-link slots and on which no internal link to the first page has yet been placed, place internal links to the first page on the selected second pages, and subtract the number of placed links from the number of internal links the first page needs. Then, following the tree structure between categories, if the current category has a parent category, update the current category to its parent and return to step S630.2, i.e., determining whether the number of internal links the first page needs equals zero.
For example, if the current category is at level 4, the updated current category is at level 3.
S630.5: If so (the top-level category has been reached), randomly select, according to the number of internal links the first page still needs, second pages on which to place the corresponding number of internal links to the first page.
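Steps S630.1 to S630.5 amount to walking up the category tree from the first page's leaf category. The Python sketch below is one possible reading of that loop, not the applicant's code. All inputs are hypothetical: `parent` maps every category to its parent with the top-level category mapped to `None`; `pages_under` lists the pages in each category's subtree; `free_slots` holds each page's remaining internal-link slots; `links_on` records which first pages each host page already links to; and the random fallback is assumed to draw from every page that still has a free slot.

```python
import random
from typing import Dict, List, Optional, Set


def select_second_pages(
    first_page: str,
    deficit: int,
    leaf_category: str,
    parent: Dict[str, Optional[str]],
    pages_under: Dict[str, List[str]],
    free_slots: Dict[str, int],
    links_on: Dict[str, Set[str]],
    rng: random.Random,
) -> List[str]:
    """Choose host ("second") pages for `deficit` new internal links to first_page.

    free_slots and links_on are updated in place so that later calls for
    other first pages see the slots already consumed.
    """
    chosen: List[str] = []

    def eligible(page: str) -> bool:
        # A host needs a free slot and must not already link to first_page.
        return (page != first_page
                and free_slots.get(page, 0) > 0
                and first_page not in links_on.get(page, set()))

    def place(page: str) -> None:
        free_slots[page] -= 1
        links_on.setdefault(page, set()).add(first_page)
        chosen.append(page)

    current = leaf_category                          # S630.1
    while deficit > 0:                               # S630.2
        if parent.get(current) is None:              # S630.3: top level reached
            pool = [p for p in free_slots if eligible(p)]
            for page in rng.sample(pool, min(deficit, len(pool))):  # S630.5
                place(page)
            break
        for page in pages_under.get(current, []):    # S630.4
            if deficit == 0:
                break
            if eligible(page):
                place(page)
                deficit -= 1
        current = parent[current]                    # move up one level
    return chosen


if __name__ == "__main__":
    parent = {"all": None, "women": "all", "dresses": "women", "men": "all"}
    pages_under = {"dresses": ["d1", "d2"], "women": ["d1", "d2", "w1"], "men": ["m1"]}
    free_slots = {"d1": 1, "d2": 0, "w1": 2, "m1": 1, "target": 3}
    links_on: Dict[str, Set[str]] = {}
    print(select_second_pages("target", 3, "dresses", parent, pages_under,
                              free_slots, links_on, random.Random(0)))
    # prints: ['d1', 'w1', 'm1']
```

Because `free_slots` and `links_on` are shared across calls, processing one first page after another spreads the limited slots over many pages instead of letting a few pages absorb them. As in the described flow, the top-level category is not scanned before the random fallback.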
S640: Generate key-value pairs, where each key-value pair describes a one-to-one correspondence between a first keyword of the first page and a second keyword of a second page.
S650: Store the key-value pairs in a cache.
S660: When a page request is received, read the key-value pairs from the cache in real time, and render the link of the first page corresponding to the first keyword described by a key-value pair into the second page corresponding to the second keyword described by that key-value pair.
Below, the internal-link generation device provided by the present application is described in detail in combination with the following embodiments.
Referring to Fig. 7, which is a schematic structural diagram of the internal-link generation device provided by the embodiments of the present application, the device may include, as shown in Fig. 7:
An internal-link count computing unit 710, configured to count the number of links pointing to the first page from other pages within the website.
An internal-link increase determining unit 720, configured to determine, based on the number of links pointing to the first page from other pages within the website, whether the first page needs additional internal links.
An internal-link deficit computing unit 730, configured to calculate the number of internal links the first page needs to gain if the internal-link increase determining unit determines that additional internal links are needed.
A candidate page selecting unit 740, configured to select one or more second pages on which to place the corresponding number of internal links to the first page.
An internal-link generating unit 750, configured to generate internal-link recommendation results that place the corresponding number of internal links to the first page on the second pages.
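The counting performed by unit 710 (step S610) can be illustrated with a short Python sketch. The input format is an assumption: `link_graph` maps each page URL to the in-site URLs it links to. Self-links are ignored and repeated links from the same source page are counted once; the source does not specify either rule.

```python
from collections import Counter
from typing import Dict, List


def count_inbound_links(link_graph: Dict[str, List[str]]) -> Counter:
    """Return, for each page, how many other in-site pages link to it."""
    inbound: Counter = Counter()
    for source, targets in link_graph.items():
        # set() collapses repeated links from the same source page.
        for target in set(targets):
            if target != source:  # ignore self-links
                inbound[target] += 1
    return inbound


if __name__ == "__main__":
    site = {
        "/home": ["/dresses", "/shirts", "/dresses"],
        "/dresses": ["/home", "/dresses"],
        "/shirts": ["/dresses"],
    }
    counts = count_inbound_links(site)
    print(counts["/dresses"], counts["/shirts"], counts["/home"])  # prints: 2 1 1
```

A page that no other page links to is simply absent from the `Counter`, and looking it up returns 0, which is exactly the case the later steps treat as most in need of links.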
It can be seen that the internal-link generation device provided by the embodiments of the present application can, based on the internal-link placement volume, place a corresponding number of internal links to pages that have an internal-link deficit, so that internal links are distributed more reasonably. This avoids the problem of links to pages with high user activity, search popularity, or page relevance being repeated extensively while links to other pages never get a chance to be placed, and it reduces wasted crawler resources.
In some possible embodiments, the internal-link increase determining unit 720 may be configured to determine whether the first page needs additional internal links based on the number of links pointing to the first page from other pages within the website and the search-engine crawl volume of the first page. Alternatively, unit 720 may base the determination on that number and the user visit volume of the first page. Alternatively, unit 720 may base the determination on that number, the search-engine crawl volume of the first page, and the user visit volume of the first page.
In some possible embodiments, the internal-link increase determining unit 720 may be configured to determine whether the number of links pointing to the first page from other pages within the website does not reach the average internal-link placement volume, the search-engine crawl volume of the first page does not reach a preset crawl volume, and the user visit volume of the first page does not reach a preset visit volume; if all three conditions hold, it determines that the first page needs additional internal links.
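The three-way test of unit 720 might look like the following Python sketch. Hedged assumptions: the "average internal-link placement volume" is taken to be the mean inbound-link count over the site's pages, "does not reach" is read as strictly less than, and `link_deficit` is a hypothetical rule for unit 730 (top the page up to the site average), since the source does not state how the deficit is computed.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageStats:
    inbound_links: int  # links from other in-site pages (unit 710)
    crawl_count: int    # search-engine crawler fetches (unit 760)
    visit_count: int    # user visits (unit 761)


def average_inbound(pages: Iterable[PageStats]) -> float:
    """Assumed definition of the average internal-link placement volume."""
    values = [p.inbound_links for p in pages]
    return sum(values) / len(values)


def needs_more_links(p: PageStats, avg_inbound: float,
                     preset_crawls: int, preset_visits: int) -> bool:
    """Unit 720: all three signals must fall short of their thresholds."""
    return (p.inbound_links < avg_inbound
            and p.crawl_count < preset_crawls
            and p.visit_count < preset_visits)


def link_deficit(p: PageStats, avg_inbound: float) -> int:
    """Hypothetical unit-730 rule: top the page up to the rounded-up average."""
    return max(0, math.ceil(avg_inbound) - p.inbound_links)


if __name__ == "__main__":
    pages = {"/a": PageStats(1, 0, 2), "/b": PageStats(5, 10, 50), "/c": PageStats(3, 1, 1)}
    avg = average_inbound(pages.values())  # 3.0
    for url, stats in pages.items():
        if needs_more_links(stats, avg, preset_crawls=5, preset_visits=10):
            print(url, link_deficit(stats, avg))  # only "/a 2" is printed
```

Requiring all three signals to be low means a page that search engines already crawl often, or that users already visit often, is not given extra links even if few in-site pages point to it.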
In some possible embodiments, the device may further include: a search-engine crawl volume statistics unit 760, configured to obtain search-engine crawler logs and count the number of crawler fetches of the first page to obtain the search-engine crawl volume of the first page; and/or a user visit volume statistics unit 761, configured to obtain user access logs and count the number of user visits to the first page to obtain the user visit volume of the first page.
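Units 760 and 761 reduce raw logs to per-page counts. In the Python sketch below, the log layout (tab-separated lines with the requested URL in the second column) is hypothetical; real crawler and access logs would need their own parser. The same helper serves both units: run it over the crawler log for the crawl volume and over the user access log for the visit volume. It counts requests; counting distinct visitors instead would be an equally plausible reading of "user visit volume".

```python
from collections import Counter
from typing import Iterable


def count_by_url(log_lines: Iterable[str], url_column: int = 1, sep: str = "\t") -> Counter:
    """Count log records per requested URL; malformed lines are skipped."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > url_column:
            counts[fields[url_column]] += 1
    return counts


if __name__ == "__main__":
    crawler_log = [
        "2015-04-24T10:00:00\t/p/1\tBaiduspider\n",
        "2015-04-24T10:05:00\t/p/1\tGooglebot\n",
        "2015-04-24T11:00:00\t/p/2\tGooglebot\n",
    ]
    crawl_volume = count_by_url(crawler_log)
    print(crawl_volume["/p/1"], crawl_volume["/p/3"])  # prints: 2 0
```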
In some possible embodiments, the internal-link generating unit 750 may be configured to generate key-value pairs, where each key-value pair describes a one-to-one correspondence between a first keyword of the first page and a second keyword of a second page. The device may further include a key-value pair caching unit 751, configured to store the key-value pairs in a cache, and an internal-link placing unit 752, configured to, when a page request is received, read the key-value pairs from the cache in real time and render the link of the first page corresponding to the first keyword described by a key-value pair into the second page corresponding to the second keyword described by that key-value pair.
In some possible embodiments, a second page may be a page that has free internal-link slots and on which no internal link to the first page has yet been placed.
In some possible embodiments, the candidate page selecting unit 740 may be configured to select, from categories related to the category to which the first page belongs, second pages on which to place the corresponding number of internal links to the first page.
In some possible embodiments, the candidate page selecting unit 740 may be configured to follow the tree structure between categories, starting from the leaf category to which the first page belongs and moving upward level by level, selecting second pages from each category. In other possible embodiments combined with this one, the candidate page selecting unit 740 may also be configured so that, if the top-level category is reached and all second pages selected in the categories below the top-level category are insufficient to place the required number of internal links to the first page, the second pages for the shortfall are selected at random.
It should be noted that the search-engine crawl volume statistics unit 760, the user visit volume statistics unit 761, the key-value pair caching unit 751, and the internal-link placing unit 752 described in the embodiments of the present application are drawn with dashed lines in Fig. 7 to indicate that they are not essential units of the internal-link generation device provided by the embodiments of the present application.
For convenience of description, the device above is described as divided into units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the description of the embodiments above, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described progressively; for parts that are the same or similar between embodiments, refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
It should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The foregoing are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A method for generating internal links, characterized by comprising:
counting the number of links pointing to a first page from other pages within a website;
determining, based on the number of links pointing to the first page from other pages within the website, whether the first page needs additional internal links;
if so, calculating the number of internal links the first page needs to gain, selecting one or more second pages on which to place the corresponding number of internal links to the first page, and generating internal-link recommendation results that place the corresponding number of internal links to the first page on the second pages.
2. The method according to claim 1, characterized in that determining, based on the number of links pointing to the first page from other pages within the website, whether the first page needs additional internal links specifically comprises:
determining whether the first page needs additional internal links based on the number of links pointing to the first page from other pages within the website and the search-engine crawl volume of the first page;
or,
determining whether the first page needs additional internal links based on the number of links pointing to the first page from other pages within the website and the user visit volume of the first page;
or,
determining whether the first page needs additional internal links based on the number of links pointing to the first page from other pages within the website, the search-engine crawl volume of the first page, and the user visit volume of the first page.
3. The method according to claim 2, characterized in that determining whether the first page needs additional internal links based on the number of links pointing to the first page from other pages within the website, the search-engine crawl volume of the first page, and the user visit volume of the first page specifically comprises:
determining whether the number of links pointing to the first page from other pages within the website does not reach the average internal-link placement volume, the search-engine crawl volume of the first page does not reach a preset crawl volume, and the user visit volume of the first page does not reach a preset visit volume;
if so, determining that the first page needs additional internal links.
4. The method according to claim 2, characterized by further comprising:
obtaining search-engine crawler logs and counting the number of crawler fetches of the first page to obtain the search-engine crawl volume of the first page;
and/or,
obtaining user access logs and counting the number of user visits to the first page to obtain the user visit volume of the first page.
5. The method according to claim 1, characterized in that generating internal-link recommendation results that place the corresponding number of internal links to the first page on the second pages comprises:
generating key-value pairs, wherein each key-value pair describes a one-to-one correspondence between a first keyword of the first page and a second keyword of a second page;
and the method further comprises: storing the key-value pairs in a cache; and, when a page request is received, reading the key-value pairs from the cache in real time and rendering the link of the first page corresponding to the first keyword described by a key-value pair into the second page corresponding to the second keyword described by that key-value pair.
6. The method according to claim 1, characterized in that the second page is a page that has free internal-link slots and on which no internal link to the first page has yet been placed.
7. The method according to any one of claims 1-6, characterized in that selecting second pages on which to place the corresponding number of internal links to the first page comprises: selecting, from categories related to the category to which the first page belongs, second pages on which to place the corresponding number of internal links to the first page.
8. The method according to claim 7, characterized in that selecting, from categories related to the category to which the first page belongs, second pages on which to place the corresponding number of internal links to the first page specifically comprises: following the tree structure between categories, starting from the leaf category to which the first page belongs and moving upward level by level, selecting second pages from each category.
9. The method according to claim 8, characterized in that selecting second pages on which to place the corresponding number of internal links to the first page comprises:
if the top-level category is reached and all second pages selected in the categories below the top-level category are insufficient to place the corresponding number of internal links to the first page, selecting the second pages for the shortfall at random.
10. A device for placing internal links, characterized by comprising:
an internal-link count computing unit, configured to count the number of links pointing to a first page from other pages within a website;
an internal-link increase determining unit, configured to determine, based on the number of links pointing to the first page from other pages within the website, whether the first page needs additional internal links;
an internal-link deficit computing unit, configured to calculate, if the internal-link increase determining unit determines that additional internal links are needed, the number of internal links the first page needs to gain;
a candidate page selecting unit, configured to select one or more second pages on which to place the corresponding number of internal links to the first page;
an internal-link generating unit, configured to generate internal-link recommendation results that place the corresponding number of internal links to the first page on the second pages.
CN201510202200.XA, priority date 2015-04-24, filing date 2015-04-24: Method and device for generating internal links (Active; granted as CN106156230B)


Publications (2)

Publication number: publication date
CN106156230A: 2016-11-23
CN106156230B: 2019-11-08

Family ID: 57346363





Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant