CN106156230B - Method and device for generating internal links - Google Patents

Publication number
CN106156230B
Authority
CN
China
Prior art keywords
page
link
internal link
website
internal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510202200.XA
Other languages
Chinese (zh)
Other versions
CN106156230A (en)
Inventor
黄华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510202200.XA
Publication of CN106156230A
Application granted
Publication of CN106156230B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations

Abstract

This application discloses a method and device for generating internal links, with the aim of distributing internal links reasonably. For example, the method may include: counting the number of links to a first page from other pages in the website; judging, according to that number, whether the first page needs more internal links; if so, calculating the number of internal links the first page needs to gain; selecting one or more second pages on which the corresponding number of internal links to the first page will be placed; and generating an internal link recommendation result for placing the corresponding number of internal links to the first page on the second pages.

Description

Method and device for generating internal links
Technical field
This application relates to the Internet field, and in particular to a method and device for generating internal links.
Background
Internal links are links between pages under the same website domain name. Building a website's internal links sensibly can improve search engines' indexing of the site's pages and the site's weight, achieving a stable SEO effect. For example, see the page of the Alibaba website shown in Fig. 1. In that page, internal links are displayed as keywords on the page where they are placed. The keywords shown in the page, such as "vehicle-mounted mp3 picture" and "plug-in card mp3 picture", each carry a corresponding internal link. A user can click these keywords to enter the pages they link to, and a search engine crawler can crawl the pages they link to.
At present, internal links are mostly placed according to users' historical search behavior, popular keywords, or page relevance. However, this causes internal links that score highly on those criteria to be repeated in large numbers while other internal links never get a chance to be placed. Search engine crawlers therefore crawl a large number of the same pages repeatedly, which wastes crawler resources.
Summary of the invention
In view of this, the purpose of this application is to provide a method and device for generating internal links, so as to distribute internal links reasonably.
In a first aspect of the embodiments of this application, a method for generating internal links is provided. For example, the method may include: counting the number of links to a first page from other pages in the website; judging, according to that number, whether the first page needs more internal links; if so, calculating the number of internal links the first page needs to gain; selecting one or more second pages on which the corresponding number of internal links to the first page will be placed; and generating an internal link recommendation result for placing the corresponding number of internal links to the first page on the second pages.
In a second aspect of the embodiments of this application, a device for generating internal links is provided. For example, the device may include: a placed internal link computing unit, which can be used to count the number of links to a first page from other pages in the website; an internal link increase judging unit, which can be used to judge, according to that number, whether the first page needs more internal links; an internal link gap computing unit, which can be used to calculate the number of internal links the first page needs to gain if the internal link increase judging unit determines that more are needed; a candidate page selection unit, which can be used to select one or more second pages on which the corresponding number of internal links to the first page will be placed; and an internal link generation unit, which can be used to generate an internal link recommendation result for placing the corresponding number of internal links to the first page on the second pages.
It can be seen that this application has the following beneficial effects:
The embodiments of this application count the number of links to a first page from other pages in the website and judge from that number whether the first page needs more internal links. For a first page with an internal link gap, the number of internal links it needs to gain can then be calculated, second pages on which that number of internal links will be placed can be selected, and an internal link recommendation result can be generated for placing them on the second pages. Because the generated recommendation result can be used to place, on the second pages, a number of internal links to the first page that matches its gap, internal links are distributed more reasonably. This avoids the problem that internal links highly related to user behavior, search popularity, or page relevance are repeated in large numbers while other internal links never get a chance to be placed, and reduces the waste of crawler resources.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a page on which internal links have been placed;
Fig. 2 is a flow diagram of the method for generating internal links provided by one embodiment of this application;
Fig. 3 is a flow diagram of the method for generating internal links provided by another embodiment of this application;
Fig. 4 is a flow diagram of the method for generating internal links provided by yet another embodiment of this application;
Fig. 5 is a flow diagram of the method for generating internal links provided by a further embodiment of this application;
Fig. 6 is a flow diagram of the method for generating internal links provided by still another embodiment of this application;
Fig. 7 is a structural diagram of the device for generating internal links provided by one embodiment of this application.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of this application.
The method for generating internal links provided by this application is described in detail below with reference to the following embodiments.
Embodiment one:
Referring to Fig. 2, a flow diagram of the method for generating internal links provided by an embodiment of this application is shown. As shown in Fig. 2, the method may include:
S210: Count the number of links to a first page from other pages in the website.
In some possible implementations, each page in the website can be represented by its corresponding unique keyword for the internal link recommendation calculation. The unique keyword of a page can be obtained by parsing the page's URL.
It should be noted that the first page described here can be any one or more pages in the website. When the first page refers to multiple pages, the method provided by the embodiments can be applied to each first page separately to generate an internal link recommendation result for it. For example, in some possible implementations, the internal link gap judgment and the internal link recommendation can be performed for all pages in the website, in which case the first pages are all pages in the site. For example, in the implementation where internal links are displayed as keywords on the pages where they are placed, the pages linked by all keywords of the website's search application can be crawled, and the number of links to each of those pages from other pages in the website can then be counted.
In this implementation, to make it easier to count how many other pages link to a page, the URLs placed on the same page A can first be recorded in the format [keyword A, URL1, URL2 ... URLn]. Each URL is then parsed into its corresponding keyword, forming multiple pairs in the format [keyword A, keyword B]; each pair indicates that the page of keyword A carries an internal link to the page of keyword B. Subsequent statistics can be computed from these pairs. It should be noted that "all keywords" in this implementation means all keywords that the search application accepts; accesses outside this keyword range are rejected. In addition, the embodiments do not restrict how pages are crawled; for example, an open-source HTTP client component such as httpclient can be used. URLs can be parsed into keywords by reversing the internal URL encoding strategy. For example, reversing the URL encoding strategy used inside the Alibaba website parses "http://www.1688.com/chanpin/-6D7033.html" into the keyword "mp3". The reverse parsing can be implemented in a conventional way and is not detailed here.
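As an illustration only: the example URL above is consistent with the path segment being the hexadecimal byte values of the keyword (6D, 70 and 33 are the ASCII codes of "m", "p" and "3"). The text does not disclose the actual encoding strategy, so the following Python sketch rests on assumptions: hex-encoded bytes, a `/chanpin/-<HEX>.html` path shape inferred from that single example, and UTF-8 as the default character set. The function name `url_to_keyword` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed path shape, inferred from the one example URL in the text.
_KEYWORD_PATH = re.compile(r"^/chanpin/-([0-9A-Fa-f]+)\.html$")


def url_to_keyword(url: str, charset: str = "utf-8") -> Optional[str]:
    """Return the keyword encoded in the URL path, or None if it does not match."""
    match = _KEYWORD_PATH.match(urlparse(url).path)
    if match is None:
        return None
    hex_digits = match.group(1)
    if len(hex_digits) % 2 != 0:
        return None  # an odd number of hex digits cannot be whole bytes
    try:
        # Assumption: the hex digits are the keyword's bytes in `charset`.
        return bytes.fromhex(hex_digits).decode(charset)
    except UnicodeDecodeError:
        return None


print(url_to_keyword("http://www.1688.com/chanpin/-6D7033.html"))  # mp3
```

Returning `None` for URLs that do not match lets a caller skip links that do not point at keyword pages.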
Based on the above description, this implementation can count, for each page in the website, the number of links to it from other pages in the website through the following process:
A. Crawl the pages linked by all keywords;
B. For any unprocessed page A, parse the URL of page A itself to obtain the page's keyword A;
C. Extract URL1 to URLn of the links placed on page A, and record them in the format [keyword A, URL1, URL2 ... URLn];
D. For any URL in [keyword A, URL1, URL2 ... URLn] for which no pair has yet been generated, parse the URL to obtain its keyword B, and form the pair [keyword A, keyword B];
E. According to the number of [keyword A, keyword B] pairs, accumulate the number of links to the page of keyword B from other pages in the website;
F. Judge whether pairs have been generated for all of URL1 to URLn; if so, return to step B; if not, return to step D;
G. Once the pages of all keywords have been processed, the number of links to each of those pages from other pages in the website is obtained.
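Steps A to G above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: crawling (step A) is assumed to have already produced a mapping from each page URL to the URLs of the links on that page, the keyword parser is passed in as a parameter, and self-links are skipped because the text counts links from other pages. The names `count_inbound_links` and `parse_keyword` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def count_inbound_links(
    crawled: Dict[str, List[str]],
    parse_keyword: Callable[[str], Optional[str]],
) -> Counter:
    """Count, per keyword page, the links pointing to it from other pages."""
    inbound: Counter = Counter()
    for page_url, out_urls in crawled.items():
        keyword_a = parse_keyword(page_url)      # step B
        if keyword_a is None:
            continue
        inbound[keyword_a] += 0                  # make pages with no inbound links visible
        for url in out_urls:                     # steps C, D and F
            keyword_b = parse_keyword(url)
            if keyword_b is None or keyword_b == keyword_a:
                continue                         # skip unparsable URLs and self-links
            inbound[keyword_b] += 1              # step E: one [keyword A, keyword B] pair
    return inbound                               # step G


# Toy data: the "keyword" is simply the last path segment.
crawled_pages = {
    "site/a": ["site/b", "site/c"],
    "site/b": ["site/c"],
    "site/c": ["site/c", "site/a"],
}
counts = count_inbound_links(crawled_pages, lambda url: url.rsplit("/", 1)[-1])
print(dict(counts))
```

Pages that are crawled but never linked to appear with a count of zero, which is what the later gap judgment needs.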
S220: Judge, according to the number of links to the first page from other pages in the website, whether the first page needs more internal links.
S230: If so, calculate the number of internal links the first page needs to gain, and select one or more second pages on which the corresponding number of internal links to the first page will be placed.
It should be noted that a second page can carry one internal link to one first page, internal links to several different first pages, or several internal links to the same first page. How many different second pages to select can therefore be decided according to actual needs, as long as they are enough to place the corresponding number of internal links to the first page. In addition, the selection range of second pages can be set according to implementation needs. For example, the second page can be any page. As another example, the second page can be a page that has a free internal link slot and does not yet carry an internal link to the first page. Each page can be preset with a number of free internal link slots; each time an internal link is placed on the page, its free slot count decreases by one and its count of placed internal links increases by one; when the free slot count reaches zero, no further internal links may be placed on the page. As a further example, in the implementation where internal links are displayed as keywords on the page, the second page can, in addition to meeting the two conditions above, be a page whose content contains the keyword of the first page.
It can be seen that controlling the number of internal links on a page by setting its number of free internal link slots prevents too many internal links from being placed on one page and causing an uneven distribution. It should be understood that the conditions above are only some possible implementations of the embodiments, and this application is not limited to them.
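The slot bookkeeping described above can be sketched as a small Python class. It is an illustrative sketch, assuming the variant in which a second page must have a free slot and must not already carry a link to the same first page; the class name `PageSlots` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PageSlots:
    """Tracks how many internal links a candidate second page can still carry."""

    keyword: str
    free_slots: int
    placed_targets: Set[str] = field(default_factory=set)

    def can_place(self, target: str) -> bool:
        # Eligible only with a free slot, no existing link to the target,
        # and never for a link from a page to itself.
        return (
            self.free_slots > 0
            and target not in self.placed_targets
            and target != self.keyword
        )

    def place(self, target: str) -> bool:
        if not self.can_place(target):
            return False
        self.free_slots -= 1              # one slot used up
        self.placed_targets.add(target)   # the placed count is len(placed_targets)
        return True


page = PageSlots(keyword="car mp3", free_slots=1)
print(page.place("mp3"), page.place("mp4"), page.free_slots)
```

Once `free_slots` reaches zero, every further `place` call is refused, which is how the per-page cap keeps the distribution even.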
S240: Generate an internal link recommendation result for placing the corresponding number of internal links to the first page on the second pages.
It can be seen that, with the method provided by the embodiments, pages with an internal link gap can receive the corresponding number of internal links based on how many internal links have already been placed, so that internal links are distributed more reasonably. This avoids the problem that internal links highly related to user behavior, search popularity, or page relevance are repeated in large numbers while other internal links never get a chance to be placed, and reduces the waste of crawler resources.
Embodiment two:
In some possible implementations, it is considered that even if the first page has too few internal links, its traffic-drawing ability may be strong, for example because it is crawled many times by search engines or visited many times by users, in which case the first page does not need further internal link recommendations. Therefore, in this implementation, whether the first page needs more internal links is judged according to the number of links to it from other pages in the website combined with its traffic-drawing ability.
For example, referring to Fig. 3, a flow diagram of the method for generating internal links provided by an embodiment of this application is shown. As shown in Fig. 3, the method may include:
S310: Count the number of links to a first page from other pages in the website.
S320: Judge whether the first page needs more internal links according to the number of links to it from other pages in the website, together with the search engine crawl count corresponding to the first page and/or the user visit count corresponding to the first page.
For example, it can be judged whether the number of links to the first page from other pages in the website is below the average internal link placement count and the search engine crawl count of the first page is below a preset crawl count; if so, the first page is determined to need more internal links; if not, it is determined not to need them. It should be understood that comparing the number of links to the first page against the average internal link placement count is only one possible implementation. The condition on that number under which the first page needs more internal links can be set according to actual needs, and this application does not limit it.
As another example, it can be judged whether the number of links to the first page from other pages in the website is below the average internal link placement count and the user visit count of the first page is below a preset visit count; if so, the first page is determined to need more internal links.
As a further example, it can be judged whether the number of links to the first page from other pages in the website is below the average internal link placement count, the search engine crawl count of the first page is below the preset crawl count, and the user visit count of the first page is below the preset visit count; if so, the first page is determined to need more internal links.
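The three example conditions above differ only in which traffic signals are included, so they can be sketched as one Python predicate with optional arguments. This is a sketch under assumptions: "below" is read as strictly less than, and the function and parameter names are hypothetical.

```python
from typing import Optional


def needs_more_links(
    inbound_links: int,
    average_inbound: float,
    crawl_count: Optional[int] = None,
    crawl_threshold: Optional[int] = None,
    visit_count: Optional[int] = None,
    visit_threshold: Optional[int] = None,
) -> bool:
    """Return True if the page should receive more internal links."""
    # The page must be below the average internal link placement count.
    if inbound_links >= average_inbound:
        return False
    # Optional signal: search engine crawl count against a preset crawl count.
    if crawl_count is not None and crawl_threshold is not None:
        if crawl_count >= crawl_threshold:
            return False
    # Optional signal: user visit count against a preset visit count.
    if visit_count is not None and visit_threshold is not None:
        if visit_count >= visit_threshold:
            return False
    return True


# Few inbound links, but crawled often: no extra links are recommended.
print(needs_more_links(2, 5.0, crawl_count=100, crawl_threshold=50))
```

Leaving a signal as `None` drops it from the judgment, which also covers the variant that uses the inbound link count alone.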
S330: If so, calculate the number of internal links the first page needs to gain, and select one or more second pages on which the corresponding number of internal links to the first page will be placed.
S340: Generate an internal link recommendation result for placing the corresponding number of internal links to the first page on the second pages.
It should be noted that, in the implementations above, the search engine crawl count and the user visit count corresponding to the first page can be computed by whatever statistical method suits actual needs, and this application does not limit it. For example, in some possible implementations, the crawl count and the visit count can be obtained by analyzing website logs, which may include search engine crawler logs and user access logs. A crawler access is in fact a special kind of user access, but websites generally distinguish the two. A user access log entry is recorded by asynchronous JS triggered when a browser opens the page, whereas a search engine crawler log entry is recorded when the crawler fetches the page, so the two can be clearly told apart. Specifically, in some possible implementations, the search engine crawl count of the first page is obtained by the following steps: obtain the search engine crawler logs and count the crawler fetches of the first page. The user visit count of the first page can be obtained by the following steps: obtain the user access logs and count the user visits to the first page.
After the search engine crawl count and/or the user visit count of the first page has been calculated, the number of links to the first page from other pages in the website can be aggregated with the corresponding crawl count and/or visit count, so that whether the first page needs more internal links can then be judged across several dimensions: internal link count, user visit traffic, and search engine crawl count. Considering that a large website has a high daily visit volume and a large volume of logs to analyze, a Hadoop Map-Reduce distributed computing cluster can be used to count the number of links to the first page from other pages in the website, the search engine crawl count, and the user visit count. The Map nodes can use the page keyword as the key of the Map-Reduce computation, so that the Reduce nodes can aggregate the statistics of these different dimensions.
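The keyed aggregation described above can be illustrated with a single-process Python sketch that mimics the map, shuffle, and reduce phases. It does not use the Hadoop API; the record shape `(source, keyword)`, the three source labels, and all function names are assumptions made for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed record shape: (source, page keyword), where source is one of
# "inbound" (link pair), "crawl" (crawler log line), "visit" (user log line).
Record = Tuple[str, str]


def map_record(record: Record) -> Iterator[Tuple[str, Tuple[str, int]]]:
    """Map phase: emit the page keyword as the key."""
    source, keyword = record
    yield keyword, (source, 1)


def reduce_keyword(values: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: merge all dimensions for one keyword."""
    totals = {"inbound": 0, "crawl": 0, "visit": 0}
    for source, amount in values:
        totals[source] += amount
    return totals


def run_job(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)   # stands in for the shuffle step
    return {key: reduce_keyword(values) for key, values in grouped.items()}


summary = run_job([
    ("inbound", "mp3"), ("crawl", "mp3"), ("crawl", "mp3"), ("visit", "mp4"),
])
print(summary)
```

Because every dimension is keyed by the same page keyword, one reduce call sees the link count, crawl count, and visit count of a page together.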
Because search engine optimization is a dynamic process, internal links need to be re-recommended at regular intervals according to the latest data. Therefore, in some possible implementations, crawler behavior and user access behavior within a time period can be analyzed periodically to count the search engine crawl count and the user visit count within that period. Correspondingly, the method can recompute the internal link recommendation results at regular intervals using the crawl count and the visit count counted within the period, forming new recommendation results. The whole forms a closed loop and achieves continuous optimization.
It should be noted that judging whether the first page needs more internal links in combination with its traffic-drawing ability is only one possible implementation. In other possible implementations, the judgment can be made solely from the number of links to the first page from other pages in the website. For example, it can be judged whether that number is below the average internal link placement count; if so, the first page is determined to need more internal links. It should nevertheless be appreciated that the implementation combining traffic-drawing ability aggregates multiple sources of raw data, such as search engine crawler logs and user access logs, to compute a page's traffic-drawing ability and uses it as a basis for deciding whether to add internal links. This avoids adding unnecessary internal link placements for pages with strong traffic-drawing ability and gives pages with weak traffic-drawing ability more chances of placement, making full use of crawler resources.
Embodiment three:
In some possible implementations, it is considered that some websites assign pages to categories according to the industry attributes of their content, and pages under related categories are more relevant to each other. Category relevance can therefore be used as the basis for selecting second pages, improving the relevance of internal link recommendations.
For example, referring to Fig. 4, a flow diagram of the method for generating internal links provided by an embodiment of this application is shown. As shown in Fig. 4, the method may include:
S410: Count the number of links to a first page from other pages in the website.
S420: Judge, according to the number of links to the first page from other pages in the website, whether the first page needs more internal links.
S430: If so, calculate the number of internal links the first page needs to gain, and select, from the categories related to the category of the first page, the second pages on which the corresponding number of internal links to the first page will be placed.
The categories related to the category of the first page may include the category of the first page itself and/or other categories whose relevance to the category of the first page reaches a certain level.
For example, the second pages on which the corresponding number of internal links to the first page will be placed can be selected from the category of the first page itself.
S440: Generate an internal link recommendation result for placing the corresponding number of internal links to the first page on the second pages.
It should be noted that some categories in a website may have a parent-child relationship while others may be independent of each other; the embodiments of this application do not limit this.
In some possible implementations, levels can be determined among the categories of a website according to their parent-child relationships, forming a category tree. For example, suppose the category tree of a website has five levels: top level, first level, second level, third level, and bottom-level leaf categories. For example, the page of the keyword "mp3" can be placed under the "consumer goods" category, where "consumer goods" is the upper-level category of "digital" and "digital" is a lower-level category of "consumer goods". It can be seen that the closer a category is to the leaf level, the more relevant the pages under it are to each other, and that the relevance among all pages under a top-level category is lowest.
In combination with the implementation above, selecting the second pages from the categories related to the category of the first page can specifically follow the tree structure of the categories: starting from the leaf category of the first page, second pages are selected level by level upward, so that more relevant pages are selected first.
In some possible implementations, when second pages are selected level by level, there may be too few pages with free slots able to carry an internal link to the first page, so that by the time the top-level category is reached, all the second pages selected below it are still insufficient to place the corresponding number of internal links to the first page. In that case, the remaining second pages can be selected at random, so as to obtain enough second pages to place the corresponding number of internal links to the first page.
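The level-by-level selection with a random fallback can be sketched in Python as follows. This is an illustration under assumptions rather than the patented procedure: `pages_under` is assumed to list every page in a category's subtree, each second page is assumed to carry at most one link to the first page, slot checks are omitted, and all names are hypothetical.

```python
import random
from typing import Dict, List


def select_second_pages(
    first_page: str,
    leaf_category: str,
    needed: int,
    parent_of: Dict[str, str],
    pages_under: Dict[str, List[str]],
    all_pages: List[str],
    top_category: str,
    rng: random.Random,
) -> List[str]:
    """Pick second pages from the leaf category upward, then fill at random."""
    chosen: List[str] = []
    category = leaf_category
    # Walk up the tree, stopping once the top-level category is reached.
    while category is not None and category != top_category and len(chosen) < needed:
        for page in pages_under.get(category, []):
            if page != first_page and page not in chosen:
                chosen.append(page)
                if len(chosen) == needed:
                    break
        category = parent_of.get(category)
    # Fallback: fill the remaining gap with randomly chosen pages.
    if len(chosen) < needed:
        rest = [p for p in all_pages if p != first_page and p not in chosen]
        rng.shuffle(rest)
        chosen.extend(rest[: needed - len(chosen)])
    return chosen


parents = {"mp3 players": "digital", "digital": "consumer goods"}
under = {
    "mp3 players": ["mp3", "car mp3"],
    "digital": ["mp3", "car mp3", "camera"],
}
site = ["mp3", "car mp3", "camera", "shoes", "hat"]
picked = select_second_pages(
    "mp3", "mp3 players", 3, parents, under, site, "consumer goods", random.Random(0)
)
print(picked)
```

The first two picks come from the most closely related categories; only the last one is drawn at random from the rest of the site.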
Embodiment four:
In some possible implementations, to support a high-performance data service and let the external web system display internal links efficiently, the internal link recommendation results are stored in a cache.
For example, referring to Fig. 5, a flow diagram of the method for generating internal links provided by an embodiment of this application is shown. As shown in Fig. 5, the method may include:
S510: Count the number of links to a first page from other pages in the website.
S520: Judge, according to the number of links to the first page from other pages in the website, whether the first page needs more internal links.
S530: If so, calculate the number of internal links the first page needs to gain, select one or more second pages on which the corresponding number of internal links to the first page will be placed, and generate key-value pairs, where each key-value pair describes the one-to-one correspondence between a first keyword of the first page and a second keyword of a second page.
S540: Store the key-value pairs in a cache.
S550: When a page request is received, read the key-value pair from the cache in real time, and place the link of the first page corresponding to the first keyword described by the key-value pair on the second page corresponding to the second keyword described by the key-value pair.
In this implementation, internal links are generated by offline computation and stored in a KV cache. When a page is generated and requests recommended internal links, the recommendation results already exist in the cache, so the offline-computed results can be read directly from the cache and displayed on the page. This completes the final internal link placement and improves the response efficiency of real-time web requests.
Combined with the implementation in Embodiment two of re-recommending internal links at regular intervals from the latest data, the key-value pairs computed for the latest time period can completely overwrite those in the KV cache, thereby redistributing internal links.
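The offline-compute, cache, and read-on-request flow can be sketched with an in-memory Python dictionary standing in for the KV cache. The text does not say which keyword serves as the key; this sketch assumes the second page's keyword is the key, since lookups happen when a second page is requested. The names `build_cache_entries` and `LinkCache` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_cache_entries(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Offline step: group (first keyword, second keyword) pairs by second keyword."""
    entries: Dict[str, List[str]] = {}
    for first_keyword, second_keyword in pairs:
        entries.setdefault(second_keyword, []).append(first_keyword)
    return entries


class LinkCache:
    """In-memory stand-in for the KV cache holding recommendation results."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def replace_all(self, entries: Dict[str, List[str]]) -> None:
        # Periodic refresh: the newest results fully overwrite the old ones.
        self._store = {key: list(value) for key, value in entries.items()}

    def links_for(self, second_keyword: str) -> List[str]:
        # Request time: read the precomputed links for the requested page.
        return list(self._store.get(second_keyword, []))


cache = LinkCache()
cache.replace_all(build_cache_entries([("mp3", "car mp3"), ("mp4", "car mp3")]))
print(cache.links_for("car mp3"))
```

Replacing the whole store on each refresh matches the full-overwrite behavior described above: links recommended in an earlier period disappear unless they are recommended again.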
It should be noted that unlimited using the running environment of the method for chain in generation provided by the embodiments of the present application.For example, The embodiment of the present application the method can be run on is transported using java, hadoop of 64 servers of linux operating system etc. Under row environment.
Embodiment five:
Below, one possible implementation of the embodiments of the present application is described in detail in combination with the embodiments above. For example, Fig. 6 is a possible flow diagram of the method for generating internal links provided by the embodiments of the present application. As shown in Fig. 6, the method may include:
S610: calculate the number of links to a first page from other pages in the website.
S620: according to the number of links to the first page from other pages in the website, together with the search engine crawl volume corresponding to the first page and/or the user visit volume corresponding to the first page, judge whether the first page needs additional internal links.
S630: if so, calculate the number of internal links that the first page needs to add.
S630.1: take the leaf category to which the first page belongs as the current category.
For example, assuming the category hierarchy has five levels (level 0, level 1, level 2, level 3, and level 4), the top category is at level 0 and the bottom leaf category is at level 4, so the current category can be assigned level 4.
S630.2: judge whether the number of internal links that the first page needs to add is equal to zero.
That is, judge whether the internal link gap of the first page is equal to zero. It can be understood that if the internal link gap of the first page is zero, the process does not need to continue.
S630.3: if it is not equal to zero, judge whether the current category has reached the top category.
For example, with the five-level category hierarchy above, it can be judged whether the current category is at level 0.
S630.4: if not, select from the current category second pages that have a free internal link position and in which the internal link of the first page has not yet been placed, and deduct from the number of internal links that the first page needs to add the number of internal links to be placed in the currently selected second pages. Then, following the tree structure between categories, if the current category has a parent category, update the current category to that parent category and return to step S630.2, the step of judging whether the number of internal links that the first page needs to add is equal to zero.
For example, if the current category is at level 4, the updated current category is at level 3.
S630.5: if so, randomly select, according to the number of internal links that the first page still needs to add, second pages in which the corresponding number of internal links are placed for the first page.
S640: generate a key-value pair, where the key-value pair describes a one-to-one correspondence between a first keyword of the first page and a second keyword of the second page.
S650: store the key-value pair in a cache.
S660: when a page request is received, read the key-value pair from the cache in real time, and place the link of the first page corresponding to the first keyword described by the key-value pair into the second page corresponding to the second keyword described by the key-value pair.
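The category walk in steps S630.1 to S630.5 can be sketched as below. This is only an illustrative reading of the steps, under assumptions that the text does not spell out: the `Page` and `Category` types, the `free_slots` and `linked_keys` fields, and the rule that the random fallback draws from all remaining pages of the site are hypothetical. The gap (the number of internal links still needed) is taken as an input, since its calculation is described elsewhere.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Page:
    key: str
    free_slots: int = 0                              # free internal-link positions
    linked_keys: set = field(default_factory=set)    # pages already linked from here


@dataclass
class Category:
    name: str
    parent: "Category | None" = None                 # the top category has no parent
    pages: list = field(default_factory=list)


def select_second_pages(first, leaf, gap, all_pages, rng=None):
    """Walk from the leaf category upward, then fall back to random choice."""
    rng = rng or random.Random()
    chosen, chosen_keys = [], set()
    current = leaf                                   # S630.1
    # S630.2 / S630.3: stop when the gap is closed or the top category is reached.
    while gap > 0 and current.parent is not None:
        for page in current.pages:                   # S630.4
            if gap == 0:
                break
            eligible = (
                page.key != first.key
                and page.key not in chosen_keys
                and page.free_slots > 0
                and first.key not in page.linked_keys
            )
            if eligible:
                chosen.append(page)
                chosen_keys.add(page.key)
                gap -= 1
        current = current.parent                     # move up one level
    if gap > 0:                                      # S630.5: random fallback
        pool = [p for p in all_pages
                if p.key != first.key and p.key not in chosen_keys]
        chosen.extend(rng.sample(pool, min(gap, len(pool))))
    return chosen
```

Pages chosen lower in the tree share a closer category with the first page, so walking upward from the leaf prefers the most related pages and uses random selection only for the remainder.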
Below, the device for generating internal links provided by the present application is described in detail in combination with the following embodiments.
Fig. 7 is a schematic structural diagram of the device for generating internal links provided by the embodiments of the present application. As shown in Fig. 7, the device may include:
A placed-internal-link calculation unit 710, which may be used to calculate the number of links to a first page from other pages in the website. An internal-link addition judging unit 720, which may be used to judge, according to the number of links to the first page from other pages in the website, whether the first page needs additional internal links. An internal-link gap calculation unit 730, which may be used to calculate the number of internal links that the first page needs to add if the internal-link addition judging unit determines that they are needed. A candidate page selection unit 740, which may be used to select one or more second pages in which the corresponding number of internal links are placed for the first page. An internal-link generation unit 750, which may be used to generate an internal link recommendation result for placing the corresponding number of internal links for the first page in the second pages.
It can be seen that, with the device for generating internal links provided by the embodiments of the present application, pages that have an internal link gap can be given the corresponding number of internal links based on the internal link placement volume, so that internal links are distributed more reasonably. This avoids the problem that internal links with high user-behavior, search-popularity, or page relevance are placed repeatedly in large numbers while other internal links never get a chance to be placed, and it reduces the waste of crawler resources.
In some possible implementations, the internal-link addition judging unit 720 may be used to judge whether the first page needs additional internal links according to the number of links to the first page from other pages in the website and the search engine crawl volume corresponding to the first page. Alternatively, the internal-link addition judging unit 720 may be used to make this judgment according to the number of links to the first page from other pages in the website and the user visit volume corresponding to the first page. Alternatively, the internal-link addition judging unit 720 may be used to make this judgment according to the number of links to the first page from other pages in the website, the search engine crawl volume corresponding to the first page, and the user visit volume corresponding to the first page.
In some possible implementations, the internal-link addition judging unit 720 may be used to judge whether the number of links to the first page from other pages in the website has not reached the average internal link placement volume, the search engine crawl volume corresponding to the first page has not reached a preset crawl volume, and the user visit volume corresponding to the first page has not reached a preset visit volume; if so, it determines that the first page needs additional internal links.
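The three-condition judgment can be written as a single predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and reading "has not reached" as a strict less-than comparison is an assumption.

```python
def needs_more_internal_links(inbound_links, avg_placement,
                              crawl_volume, preset_crawl_volume,
                              visit_volume, preset_visit_volume):
    """Return True only when all three measures are below their thresholds."""
    return (
        inbound_links < avg_placement            # linked less than average
        and crawl_volume < preset_crawl_volume   # rarely crawled
        and visit_volume < preset_visit_volume   # rarely visited
    )


# A page with few inbound links that is seldom crawled and seldom visited.
print(needs_more_internal_links(2, 10, 1, 5, 3, 20))
# A page that is already visited often does not qualify.
print(needs_more_internal_links(2, 10, 1, 5, 50, 20))
```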
In some possible implementations, the device may also include: a search engine crawl volume statistics unit 760, which may be used to obtain the search engine crawler log and count the number of crawler fetches of the first page to obtain the search engine crawl volume corresponding to the first page; and/or a user visit volume statistics unit 761, which may be used to obtain the user access log and count the number of user visits to the first page to obtain the user visit volume corresponding to the first page.
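Both statistics reduce to counting log entries per page. A minimal sketch follows, assuming a hypothetical log format of one tab-separated `timestamp<TAB>url` entry per line; the real crawler and access log formats are not given in the text, so the parsing here is purely illustrative.

```python
from collections import Counter


def count_per_page(log_lines):
    """Count log entries per URL, skipping lines that do not parse."""
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        # Hypothetical format: field 0 is a timestamp, field 1 is the URL.
        if len(parts) >= 2 and parts[1]:
            counts[parts[1]] += 1
    return counts


crawler_log = [
    "2015-04-24T10:00:00\t/item/1",
    "2015-04-24T10:05:00\t/item/2",
    "2015-04-24T11:00:00\t/item/1",
]
crawl_volume = count_per_page(crawler_log)
print(crawl_volume["/item/1"])
```

The same function can be applied to the user access log to obtain the user visit volume, since a `Counter` returns zero for pages that never appear in a log.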
In some possible implementations, the internal-link generation unit 750 may be used to generate a key-value pair, where the key-value pair describes a one-to-one correspondence between a first keyword of the first page and a second keyword of the second page. The device may also include a key-value pair cache unit 751, which may be used to store the key-value pair in a cache, and an internal-link placement unit 752, which may be used to read the key-value pair from the cache in real time when a page request is received and to place the link of the first page corresponding to the first keyword described by the key-value pair into the second page corresponding to the second keyword described by the key-value pair.
In some possible implementations, the second page may be a page that has a free internal link position and in which the internal link of the first page has not yet been placed.
In some possible implementations, the candidate page selection unit 740 may be used to select, from categories related to the category to which the first page belongs, second pages in which the corresponding number of internal links are placed for the first page.
In some possible implementations, the candidate page selection unit 740 may be used to follow the tree structure between categories and, starting from the leaf category to which the first page belongs, choose second pages level by level upward. In other possible implementations combined with this one, the candidate page selection unit 740 may also be used to choose the remaining second pages at random if the top category is reached and all second pages selected before the top category are not enough to place the corresponding number of internal links for the first page.
It should be noted that the search engine crawl volume statistics unit 760, the user visit volume statistics unit 761, the key-value pair cache unit 751, and the internal-link placement unit 752 are drawn with dashed lines in Fig. 7 to indicate that these units are not essential units of the device for generating internal links provided by the embodiments of the present application.
For convenience of description, the device above is described in terms of units divided by function. Of course, when implementing the present application, the functions of the units may be realized in one or more pieces of software and/or hardware.
As can be seen from the description of the embodiments above, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. Identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The above are only preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the scope of protection of the present application.

Claims (10)

1. A method for generating internal links, characterized by comprising:
calculating the number of links to a first page from other pages in the website;
judging, according to the number of links to the first page from other pages in the website, whether the first page needs additional internal links;
if so, calculating the number of internal links that the first page needs to add, selecting one or more second pages in which the corresponding number of internal links are placed for the first page, and generating an internal link recommendation result for placing the corresponding number of internal links for the first page in the second pages;
wherein judging, according to the number of links to the first page from other pages in the website, whether the first page needs additional internal links comprises: judging whether the number of links to the first page from other pages in the website has not reached the average internal link placement volume, and the search engine crawl volume corresponding to the first page has not reached a preset crawl volume.
2. The method according to claim 1, characterized in that judging, according to the number of links to the first page from other pages in the website, whether the first page needs additional internal links is specifically:
judging, according to the number of links to the first page from other pages in the website and the user visit volume corresponding to the first page, whether the first page needs additional internal links;
or,
judging, according to the number of links to the first page from other pages in the website, the search engine crawl volume corresponding to the first page, and the user visit volume corresponding to the first page, whether the first page needs additional internal links.
3. The method according to claim 2, characterized in that judging, according to the number of links to the first page from other pages in the website, the search engine crawl volume corresponding to the first page, and the user visit volume corresponding to the first page, whether the first page needs additional internal links is specifically:
judging whether the number of links to the first page from other pages in the website has not reached the average internal link placement volume, the search engine crawl volume corresponding to the first page has not reached a preset crawl volume, and the user visit volume corresponding to the first page has not reached a preset visit volume;
if so, determining that the first page needs additional internal links.
4. The method according to claim 2, characterized by further comprising:
obtaining the search engine crawler log, and counting the number of crawler fetches of the first page to obtain the search engine crawl volume corresponding to the first page;
and/or
obtaining the user access log, and counting the number of user visits to the first page to obtain the user visit volume corresponding to the first page.
5. The method according to claim 1, characterized in that generating the internal link recommendation result for placing the corresponding number of internal links for the first page in the second pages comprises:
generating a key-value pair, wherein the key-value pair describes a one-to-one correspondence between a first keyword of the first page and a second keyword of the second page;
and further comprising: storing the key-value pair in a cache; when a page request is received, reading the key-value pair from the cache in real time, and placing the link of the first page corresponding to the first keyword described by the key-value pair into the second page corresponding to the second keyword described by the key-value pair.
6. The method according to claim 1, characterized in that the second page is a page that has a free internal link position and in which the internal link of the first page has not yet been placed.
7. The method according to any one of claims 1-6, characterized in that selecting the second pages in which the corresponding number of internal links are placed for the first page comprises: selecting, from categories related to the category to which the first page belongs, the second pages in which the corresponding number of internal links are placed for the first page.
8. The method according to claim 7, characterized in that selecting, from categories related to the category to which the first page belongs, the second pages in which the corresponding number of internal links are placed for the first page specifically comprises: following the tree structure between categories and, starting from the leaf category to which the first page belongs, choosing second pages level by level upward.
9. The method according to claim 8, characterized in that selecting the second pages in which the corresponding number of internal links are placed for the first page comprises:
if the top category is reached and all second pages selected before the top category are not enough to place the corresponding number of internal links for the first page, choosing the remaining second pages at random.
10. A device for placing internal links, characterized by comprising:
a placed-internal-link calculation unit, used to calculate the number of links to a first page from other pages in the website;
an internal-link addition judging unit, used to judge, according to the number of links to the first page from other pages in the website, whether the first page needs additional internal links;
an internal-link gap calculation unit, used to calculate the number of internal links that the first page needs to add if the internal-link addition judging unit determines that they are needed;
a candidate page selection unit, used to select one or more second pages in which the corresponding number of internal links are placed for the first page;
an internal-link generation unit, used to generate an internal link recommendation result for placing the corresponding number of internal links for the first page in the second pages;
wherein the internal-link addition judging unit is specifically used to judge, according to the number of links to the first page from other pages in the website, whether the first page needs additional internal links, comprising: judging whether the number of links to the first page from other pages in the website has not reached the average internal link placement volume, and the search engine crawl volume corresponding to the first page has not reached a preset crawl volume.
CN201510202200.XA 2015-04-24 2015-04-24 Method and device for generating internal links Active CN106156230B (en)

Publications (2)

Publication Number Publication Date
CN106156230A CN106156230A (en) 2016-11-23
CN106156230B true CN106156230B (en) 2019-11-08

Family

ID=57346363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510202200.XA Active CN106156230B (en) 2015-04-24 2015-04-24 Method and device for generating internal links

Country Status (1)

Country Link
CN (1) CN106156230B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant