Summary of the Invention
The technical problem to be solved by the present invention is to provide a website navigation method and system that address two shortcomings of existing navigation sites: too many levels of classification, and the difficulty of filtering and extracting information quickly and effectively from a large volume of data.
To solve the above technical problem, the present invention provides a website navigation method, comprising:
Providing at least one sorting rule for the user to select;
Periodically reading a list of websites;
Sorting the websites in the list of websites according to each sorting rule;
Displaying, by first-level category, the ranking result that corresponds to the sorting rule selected by the user or set by default.
Where the sorting rule is to sort by the degree of change in website homepage content, the following sorting steps are performed: fetching the homepage content of the website; filtering the HTML tags out of the content and splitting the remaining content into segments; comparing the result with the most recently saved data, calculating a rate of change, and saving the result; and sorting the websites by rate of change from high to low.
Where the sorting rule is to sort by the PageRank (PR) value calculated by Google, the following sorting steps are performed: looking up the PR value of each website by its address in the list of websites; and sorting the websites by PR value from high to low.
Where the sorting rule is to sort by Alexa rank, the following sorting steps are performed: looking up the Alexa rank value of each website by its address in the list of websites; and sorting the websites by Alexa rank value from low to high.
Preferably, only the ranking results within a preset range are displayed.
The present invention also provides a website navigation system, comprising:
A storage unit, used to save the list of websites and the ranking results;
A data-reading unit, used to periodically read the list of websites;
A sorting unit, used to provide at least one sorting rule for the user to select, and to sort the websites in the list of websites according to each sorting rule;
A display unit, used to display, by first-level category, the ranking result that corresponds to the sorting rule selected by the user or set by default.
Where the sorting rule is to sort by the degree of change in website homepage content, the sorting unit performs the following sorting steps: fetching the homepage content of the website; filtering the HTML tags out of the content and splitting the remaining content into segments; comparing the result with the most recent data saved in the storage unit, calculating a rate of change, and saving the result in the storage unit; and sorting the websites by rate of change from high to low.
Where the sorting rule is to sort by the PageRank (PR) value calculated by Google, the sorting unit performs the following sorting steps: looking up the PR value of each website by its address in the list of websites; and sorting the websites by PR value from high to low.
Where the sorting rule is to sort by Alexa rank, the sorting unit performs the following sorting steps: looking up the Alexa rank value of each website by its address in the list of websites; and sorting the websites by Alexa rank value from low to high.
Preferably, the display unit displays only the ranking results within a preset range.
Compared with the prior art, the present invention has the following advantages:
First, the present invention periodically calculates a score for each website in a navigation category according to a given rule, and then displays the most suitable websites at the front according to their scores. Because the arrangement of the websites changes periodically, and each sorting rule represents a particular browsing need of the user, only a first-level navigation directory needs to be provided. Without adding classification levels, the websites the user wants to visit can be shown directly at the front, so the user does not have to work through multiple levels of classification. In particular, websites whose scores change greatly can be recommended to the user on the basis of their actual operating situation. Moreover, because each first-level category covers a relatively broad range, problems caused by users understanding the classification of a website differently are avoided. The method of the invention gives the user a better experience: the user can choose a website quickly while browsing the first-level directory of the navigation site.
Second, the invention provides several sorting modes for the user to choose from, meeting a variety of user needs. The sorting modes are not limited to those described in the present invention (sorting by the degree of homepage content change, by the PageRank (PR) value calculated by Google, and by Alexa rank); other sorting rules can be added according to users' different needs to give them more choices.
Third, when the sorted websites are displayed to the user, a different threshold is set for each sorting mode, and only websites whose scores fall within the threshold range are shown. The user thus receives meaningful website recommendations under every sorting mode, while websites outside the threshold range, which offer little value as choices, are omitted from the display.
Detailed Description of the Embodiments
To make the above objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The core idea of the present invention is as follows: only a first-level navigation directory is provided, together with several sorting modes. Each mode calculates a score for every website according to a given rule. After sorting, the websites the user wants to visit are pushed to the front of each first-level category, and websites whose scores fall outside the threshold range are not displayed.
The present invention proposes a new kind of website navigation site. Fig. 1 is a flowchart of a user using the website navigation site of the present invention.
Step 101: the user visits the site and selects a category to view. The homepage of the navigation site lists the categories of websites under the first-level directory. The first-level directory classifies all the websites shown on the page by their broadest category, for example entertainment, news, or finance. After entering the homepage, the user selects the kind of content to view. To see the full range of websites listed in a category, the user can click the category keyword to enter the first-level directory. For example, after clicking "entertainment", the page shows links to all the entertainment websites that the navigation site includes.
Step 102: the user judges whether the content of the current page suits his or her viewing habits. When the user visits the navigation site for the first time, the websites under the first-level directory are arranged according to the system default. The invention provides three arrangement modes: by the degree of homepage content change, by an evaluation of the "importance" of the website, and by the traffic of the website. If the arrangement shown on the page meets the user's browsing needs, step 103 is executed; otherwise, step 104 is executed.
Step 103: the user clicks a website link to jump to the selected website and view more content. If the user wishes to choose a website by the degree of homepage content change, by the evaluated "importance" of the website, or by page traffic, and the current arrangement meets that need, the user simply clicks a link and can quickly pick, from among many websites, the one whose homepage has changed the most, the most valuable one, or the one with the highest traffic. When the websites under a first-level category exceed one page, the user can also click the next-page link to continue viewing.
Step 104: the user selects another arrangement mode. If the current arrangement cannot meet the user's browsing needs, the user can select another arrangement mode from a drop-down list. For example, if the current page is sorted by website traffic and the user wants to see websites whose homepage content has changed more, the user can choose from the arrangement modes that the navigation site provides. The arrangement modes need not be offered through a drop-down list; several ranking buttons or links can instead be placed directly on the page.
Step 105: the system displays the ranking result to the user according to the arrangement mode the user selected. In the example above, the system presents the websites arranged by the degree of homepage content change. After step 105 the flow returns to step 102, where the user checks whether the newly displayed page meets his or her browsing purpose, so the user can change the sorting mode repeatedly. A page that meets the user's needs directly lists links to many websites with similar content, and the user can choose the website to visit from the first-level navigation directory alone.
The above describes the front-end operation when a user visits the website navigation site. For the data processing in the back end of the site, Fig. 2 is a system processing flowchart of the website navigation method of the present invention.
Step 201: the list of websites to be processed is read periodically. The list of websites contains the addresses of all the websites that the navigation site includes. The back-end system reads the list periodically (usually once a day) and updates the scores of the listed websites.
Step 202: determine whether the PR values of the listed websites have been updated. PR (PageRank) is a method that the Google search engine uses to evaluate the "importance" of a web page. After combining all other factors, such as the Title tag and the Keywords tag, Google uses PR to adjust the results so that pages with greater "importance" rank higher in the search results, thereby improving the relevance and quality of the results. Accordingly, the higher the score (the PR value) that Google calculates for a website from its many indicators, the more valuable the website is taken to be. One of the sorting modes that the invention offers the user sorts by the PR value calculated by Google. If someone has already updated a website's PR value by other means before the system's automatic update, the system determines that the PR value has been updated and continues with step 205. If the PR value has not yet been updated that day, step 203 is executed and the system updates the PR value automatically.
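The check in step 202 (and the equivalent checks in steps 205 and 208) can be sketched as a date comparison. This is a minimal illustration rather than the implementation described in the invention; the function name `needs_update` and the idea of storing the date of the last successful update are assumptions made for the example.

```python
from datetime import date
from typing import Optional


def needs_update(last_success: Optional[date], today: date) -> bool:
    """Return True when no successful update has been recorded for today.

    `last_success` is the (hypothetical) stored date of the most recent
    successful update, or None if the score has never been updated.
    """
    return last_success is None or last_success < today
```

A back-end job would call this once per score type and skip the lookup when it returns False.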
Step 203: query the PR value by website address. On a site that provides such queries, for example the online Google PageRank (PR value) lookup at http://www.123cha.com/google_pagerank/, entering the URL of a website returns the PR value of that website. As noted above, the higher the score found, the more valuable the website. Every day the system looks up the PR value of each website included in the list of websites.
Step 204: if the query succeeds, the latest PR value is recorded in the database and the update status is set to success. If the query fails because of a system fault, a network interruption, or a similar cause, the update status is set to failure. When a query fails, it can be retried several times (for example, three times); if it still does not succeed, the query is abandoned and the status is recorded as failure. After step 204 is completed, step 205 is executed.
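The retry behaviour of step 204 (which steps 207 and 214 repeat) can be sketched as follows. This is an illustrative sketch under stated assumptions: `query_with_retry` and `flaky_lookup` are hypothetical names, the lookup is passed in as a plain function, and "three times" is read here as three further attempts after the first failure.

```python
from typing import Callable, Optional, Tuple


def query_with_retry(
    query: Callable[[str], float],
    url: str,
    max_retries: int = 3,
) -> Tuple[Optional[float], str]:
    """Call query(url); on failure, retry up to max_retries more times.

    Returns (value, "success") or (None, "failure"), mirroring the
    update status that the back end records in the database.
    """
    for _ in range(1 + max_retries):
        try:
            return query(url), "success"
        except Exception:
            # Any fault (system failure, network interruption) triggers a retry.
            continue
    return None, "failure"


# Hypothetical lookup that fails twice before returning a PR value.
calls = {"n": 0}


def flaky_lookup(url: str) -> float:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network interrupted")
    return 6.0


result = query_with_retry(flaky_lookup, "http://example.com")
```

Passing the lookup in as a parameter lets the same helper serve the PR query, the Alexa query, and the homepage fetch.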
Step 205: determine whether the Alexa rank values of the listed websites have been updated. Another sorting mode that the invention offers the user sorts by Alexa rank. Alexa is a well-known website that publishes a worldwide ranking of websites and also provides search and a classified directory. The Alexa rank is a frequently cited indicator of a website's traffic. Unlike the PR value, the smaller the Alexa rank value, the greater the traffic of the corresponding website. In the present invention, if someone has already updated a website's Alexa rank by other means before the system's automatic update, the system determines that the Alexa rank has been updated and continues with step 208. If the Alexa rank has not yet been updated that day, step 206 is executed and the system updates it automatically.
Step 206: query the Alexa rank value by website address. As before, on a site that provides such queries, for example the online lookup at http://www.123cha.com/alexa/, entering the URL of a website returns the Alexa rank value of that website. Every day the system looks up the Alexa rank value of each website included in the list of websites.
Step 207: as before, if the query succeeds, the latest Alexa rank value is recorded in the database and the update status is set to success. If the query fails because of a system fault, a network interruption, or a similar cause, the update status is set to failure. When a query fails, it can be retried several times (for example, three times); if it still does not succeed, the query is abandoned and the status is recorded as failure. After step 207 is completed, step 208 is executed.
Step 208: determine whether the degree of homepage change of the listed websites has already been calculated. If it has, step 215 is executed; if not, step 209 is executed. The third sorting mode provided by the invention sorts by the degree of change in website homepage content, and the method calculates each website's homepage change every day.
Step 209: read the homepage content. The system can use an existing web crawling tool to fetch the homepage content of each website every day. For a system framework based on the Java language, it can also use the programming interfaces that Java provides to access the homepage directly by passing its URL and obtain the returned homepage code. The system then determines whether the page was fetched successfully: if so, step 210 is executed; otherwise, step 214 is executed.
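A direct fetch of the kind step 209 describes might look like the sketch below. The text mentions Java's interfaces; this example uses Python's standard `urllib.request` instead, purely for illustration. The function name `fetch_homepage`, the ten-second timeout, and the UTF-8 fallback are assumptions, not details from the invention.

```python
import http.client
import urllib.request
from typing import Optional


def fetch_homepage(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the homepage source as text, or None if the fetch fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Use the charset declared in the response headers, if any.
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        # Network errors, malformed URLs, unknown charsets: treat as a failed fetch.
        return None
```

A return value of None corresponds to the failure branch that leads to step 214; any other value is passed on to step 210.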
Step 210: filter out the HTML tags. After the returned homepage code is obtained, a predefined set of HTML tags is used to find and remove all HTML code, so that only the content remains.
Step 211: remove the whitespace from the content that remains after the HTML tags are filtered out, and split the content into segments at punctuation marks.
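Steps 210 and 211 can be sketched together as below. The text describes a predefined HTML tag set; this example substitutes a simple regular expression that strips anything shaped like a tag, which is an assumption made for brevity (it does not remove the text inside script or style elements). The name `split_content` and the particular list of punctuation marks are also illustrative.

```python
import re
from typing import List

TAG_RE = re.compile(r"<[^>]+>")                    # anything that looks like an HTML tag
SPACE_RE = re.compile(r"\s+")                      # spaces, tabs, line breaks
PUNCT_RE = re.compile(r"[,.;:!?，。；：！？、]+")   # ASCII and Chinese punctuation


def split_content(html: str) -> List[str]:
    """Strip tags and whitespace, then split the text into segments."""
    text = TAG_RE.sub("", html)
    text = SPACE_RE.sub("", text)
    # Drop the empty strings that splitting leaves at the ends.
    return [segment for segment in PUNCT_RE.split(text) if segment]
```

Each returned segment plays the role of one "line" in the comparison of step 212. Removing all whitespace suits Chinese text; for space-delimited languages it merges adjacent words, which is harmless for change detection as long as both days are processed the same way.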
Step 212: read the most recent data, compare, and calculate the rate of change. The most recent data is the homepage content saved at the last successful update. For example, if yesterday's update succeeded, the most recent data is yesterday's data; if yesterday's update failed, the most recent data is the data from the successful update of the day before. First, each segment of the split content is looked up in the saved most recent content. If it is found, that segment has not changed; if it is not found, that segment is new or modified. Content that has been deleted relative to the most recent data is not counted. Then, from the result of this line-by-line comparison, the number of changed lines is divided by today's total number of lines to give today's rate of change.
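The calculation in step 212 amounts to the ratio (changed lines today) / (total lines today). A minimal sketch, assuming the segments from step 211 are held as lists of strings; the name `change_rate` and the choice to return 0.0 for an empty page are assumptions.

```python
from typing import Sequence


def change_rate(today: Sequence[str], previous: Sequence[str]) -> float:
    """Fraction of today's segments that do not appear in the previous data.

    Segments deleted since the previous snapshot are not counted,
    as the description specifies.
    """
    if not today:
        return 0.0  # assumption: an empty page counts as unchanged
    seen = set(previous)
    changed = sum(1 for segment in today if segment not in seen)
    return changed / len(today)
```

For instance, if the previous snapshot is `["a", "b", "c"]` and today's is `["a", "x", "c", "y"]`, two of today's four segments are new, giving a rate of 0.5; the deleted segment `"b"` has no effect.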
Step 213: save the filtered and split homepage content for use in the next comparison.
Step 214: as before, if the above operations succeed, the latest rate of change is recorded in the database and the update status is set to success. If the operation fails because of a system fault, a network interruption, or a similar cause, the update status is set to failure. When an operation fails, it can be retried several times (for example, three times); if it still does not succeed, it is abandoned and the status is recorded as failure.
By providing several sorting modes for the user to choose from, the invention meets a variety of user needs. The sorting modes are not limited to those described above; other sorting rules can be added according to users' different needs to give them more choices.
Step 215: sort by the scores obtained, according to each sorting mode. After the back-end system has finished updating the website scores as described above, it hands the sorting operation to the database. When the user selects a sorting mode, the system accesses the database, and the database returns the website ranking for that sorting mode, based on the updated scores, for display to the user. To save system resources, the database sorts only once a day and saves the ranking results; when a user visits, the saved ranking is displayed directly.
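The once-a-day sorting described in step 215 can be pictured as a small cache keyed by date. The description assigns this work to the database; the in-memory class below is only a sketch of the same idea, and the name `DailyRanking` and its interface are assumptions.

```python
from datetime import date
from typing import Callable, List, Optional


class DailyRanking:
    """Compute a ranking at most once per day and serve the saved result."""

    def __init__(self, compute: Callable[[], List[str]]) -> None:
        self._compute = compute          # the expensive sort
        self._day: Optional[date] = None
        self._ranking: List[str] = []

    def get(self, today: date) -> List[str]:
        # Re-sort only when the saved ranking belongs to an earlier day.
        if self._day != today:
            self._ranking = self._compute()
            self._day = today
        return self._ranking
```

Every user request during the same day then reads the saved list instead of triggering a new sort.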
When sorting by the PR value calculated by Google, the websites under each first-level category are arranged by PR value from high to low, with the website with the largest PR value first. When sorting by Alexa rank, the websites under each first-level category are arranged by Alexa rank value from low to high, with the website with the smallest Alexa rank value first. When sorting by the degree of homepage content change, the websites under each first-level category are arranged by rate of change from high to low, with the website with the largest rate of change first.
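The three orderings above can be sketched as one function that groups websites by first-level category and sorts each group in the direction its mode requires. The `Site` record, the mode names, and the sample values are all hypothetical and exist only to make the example runnable.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Site:
    name: str
    category: str       # first-level category, e.g. "news"
    pr: int             # PR value: higher is better
    alexa_rank: int     # Alexa rank value: lower is better
    change_rate: float  # homepage rate of change: higher is better


# mode -> (score extractor, sort descending?)
SORT_KEYS = {
    "pr": (lambda s: s.pr, True),
    "alexa": (lambda s: s.alexa_rank, False),
    "change": (lambda s: s.change_rate, True),
}


def rank_by_category(sites: List[Site], mode: str) -> Dict[str, List[str]]:
    """Return the website names of each category, ordered for the given mode."""
    key, descending = SORT_KEYS[mode]
    grouped: Dict[str, List[Site]] = defaultdict(list)
    for site in sites:
        grouped[site.category].append(site)
    return {
        category: [s.name for s in sorted(members, key=key, reverse=descending)]
        for category, members in grouped.items()
    }


# Made-up sample data.
sites = [
    Site("A", "news", pr=5, alexa_rank=80, change_rate=0.10),
    Site("B", "news", pr=7, alexa_rank=120, change_rate=0.40),
    Site("C", "entertainment", pr=4, alexa_rank=20, change_rate=0.25),
]
```

With this sample, the "news" category is ordered B, A under the PR and change-rate modes and A, B under the Alexa mode. Adding a further sorting rule only requires one more entry in `SORT_KEYS`.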
The periodically updated ranking results are displayed under the first-level directory according to the user's selection or the default setting. If a category lists too many websites, those ranked near the end are of little value and would also require multiple pages to display. A threshold is therefore usually set so that only meaningful website recommendations are given to the user, and websites ranked beyond the threshold are not displayed. For example, for the PR value the threshold can be set to 2, so that websites with a PR value below 2 are not displayed; for the Alexa rank value the threshold can be set to 50, so that websites ranked after 50 are not displayed.
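The threshold filtering can be sketched using the two example values given above (PR value of at least 2, Alexa rank value of at most 50). The constant and function names are hypothetical, and each input is assumed to be an already sorted list of (name, score) pairs.

```python
from typing import List, Tuple

PR_MIN = 2       # websites with a PR value below 2 are hidden
ALEXA_MAX = 50   # websites ranked after 50 are hidden


def visible_by_pr(ranked: List[Tuple[str, int]]) -> List[str]:
    """Keep only the websites whose PR value reaches the threshold."""
    return [name for name, pr in ranked if pr >= PR_MIN]


def visible_by_alexa(ranked: List[Tuple[str, int]]) -> List[str]:
    """Keep only the websites whose Alexa rank value is within the threshold."""
    return [name for name, rank in ranked if rank <= ALEXA_MAX]
```

Because the filters preserve the input order, they can be applied after sorting without disturbing the ranking.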
As the above shows, because the arrangement of the websites changes periodically in the present invention, and each sorting rule represents a particular browsing need of the user, only a first-level navigation directory needs to be provided. Without adding classification levels, the websites the user wants to visit can be shown directly at the front, so the user does not have to work through multiple levels of classification. In particular, websites whose scores change greatly can be recommended to the user on the basis of their actual operating situation. Moreover, because each first-level category covers a relatively broad range, problems caused by users understanding the classification of a website differently are avoided. The method of the invention gives the user a better experience: the user can choose a website quickly while browsing the first-level directory of the navigation site.
The present invention also provides a website navigation system. Fig. 3 is a structural diagram of the website navigation system of the present invention. The system comprises a storage unit 301, a data-reading unit 302, a sorting unit 303, and a display unit 304.
The storage unit 301 is used to save the list of websites, the score and update status of each website, the ranking results, and other data. The list of websites contains the addresses of all the websites that the navigation site includes; the data-reading unit 302 reads the list periodically so that the scores of the listed websites can be updated. The score of each website is calculated according to a given rule and is used to sort the websites by score, with the most suitable websites placed first. The update status indicates whether the periodic update of a score in the list of websites succeeded, and has two states: success and failure. The ranking results are the arrangements of the websites in the list after they have been sorted under the different sorting modes.
The data-reading unit 302 is used to periodically read the list of websites in the storage unit 301. The list is usually read once a day, after which the sorting unit 303 is triggered to sort the addresses that were read. The arrangement of the websites in the present invention therefore changes periodically.
The sorting unit 303 is used to sort the websites included in the list of websites according to the sorting rules offered to the user. The invention provides three arrangement modes: sorting by the PR value calculated by Google, sorting by Alexa rank, and sorting by the degree of homepage content change. For the first two modes, the PR value or the Alexa rank value can be obtained directly by querying a lookup site with the URL of the website. The PR values are then arranged from high to low and the Alexa rank values from low to high, because a higher PR value indicates a more valuable website, whereas a smaller Alexa rank value indicates higher traffic.
For the third sorting mode, an existing web crawling tool is first used to fetch the homepage content of each website periodically; for a system framework based on the Java language, the programming interfaces that Java provides can also be used to access the homepage directly by passing its URL and obtain the returned homepage code. A predefined set of HTML tags is then used to find and remove all HTML code, leaving only the content. The whitespace is removed from the remaining content, and the content is split into segments at punctuation marks. The split content is compared line by line with the most recent data to obtain the number of lines changed today, which is then divided by today's total number of lines to give today's rate of change. After the rate of change of the homepage has been calculated, the filtered and split content is also saved in the storage unit 301 for use in the next comparison. When sorting, the websites are arranged by rate of change from high to low: the larger the rate of change, the nearer the front.
The sorting unit 303 periodically updates the scores and re-sorts the websites for each of the three sorting modes, and saves the ranking results in the storage unit 301, so that the latest ranking results of the day can be displayed to the user. After each score update finishes, the latest score and the update status are recorded in the storage unit 301. All of the sorting is carried out within the first-level directory of the navigation site, that is, a ranking for the entertainment category, a ranking for the sports category, and so on. Because the arrangement of the websites changes periodically, and each sorting rule represents a particular browsing need of the user, the present invention needs only a first-level navigation directory. Without adding classification levels, the websites the user wants to visit can be placed at the front, giving the user a better experience. In addition, the sorting modes are not limited to those described in the present invention; other sorting rules can be added according to users' different needs to give them more choices.
The display unit 304 is used to display the corresponding ranking result produced by the sorting unit 303, according to the sorting rule selected by the user or set by default. After the user visits the navigation site and views a category, one ranking result is first displayed by first-level category according to the default setting; the user can then choose among the sorting modes that the system provides, according to his or her browsing purpose, in order to select a website. A different threshold is also set in the ranking results for each sorting mode. Because the websites ranked near the end are of little value and would also require multiple pages to display, usually only meaningful website recommendations are given to the user, and websites ranked beyond the threshold are not displayed.
The website navigation method and system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.