CN103514232A - Web site resource management method and device - Google Patents


Info

Publication number
CN103514232A
Authority
CN
China
Prior art keywords
page
web website
crumbs
browse path
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210227112.1A
Other languages
Chinese (zh)
Inventor
王正华
李伟刚
薛晶晶
王佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210227112.1A
Publication of CN103514232A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention provides a web site resource management method and device. The method comprises the following steps: obtaining a navigation tree structure of a web site; obtaining a breadcrumb browse path structure of the web site; obtaining a URL hierarchical relationship of the web site; and generating an index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship. Because the index browse path is generated from the breadcrumb browse path structure and the URL hierarchical relationship, a user can conveniently browse the web site on a high-end device.

Description

Web site resource management method and device
Technical field
The present invention relates to the technical field of web app conversion ("web appization"), and in particular to a web site resource management method and device.
Background
With the arrival of the web 2.0 era, the links between PC web sites have become more complex. A web app (web application) is an application that uses Web and Web browser technology to complete one or more tasks across a network, and it usually requires a Web browser. Web app conversion turns traditional web pages into a form that is convenient to browse on high-end devices, so that browsing a web page on a high-end smart device feels similar to using a native app (native application, i.e. a local application).
Web app conversion technology is divided into page app and site app. Page app is mainly a page structuring technique: it analyzes a single page and reconstructs the web page so that its presentation suits high-end browsing devices. Site app mainly reconstructs a PC web site so that the site works well online on high-end devices.
Current site app construction techniques have at least the following shortcomings:
(1) a resource structure graph cannot be mined and built offline, which makes it inconvenient for users to browse on high-end devices;
(2) there is no guarantee that every page on a browse path can be structured by page app.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, a first object of the present invention is to propose a web site resource management method.
A second object of the present invention is to propose a web site resource management device.
To achieve the above objects, a web site resource management method according to an embodiment of the first aspect of the invention comprises the following steps: obtaining the navigation tree structure of the web site; obtaining the breadcrumb browse path structure of the web site; obtaining the URL hierarchical relationship of the web site; and generating the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
According to the web site resource management method of the embodiment of the present invention, the index browse path is generated from the breadcrumb browse path structure and the URL hierarchical relationship of the web site, which makes it convenient for users to browse the web site on high-end devices.
To achieve the above objects, a web site resource management device according to an embodiment of the second aspect of the invention comprises: a first acquisition module for obtaining the navigation tree structure of the web site; a second acquisition module for obtaining the breadcrumb browse path structure of the web site; a third acquisition module for obtaining the URL hierarchical relationship of the web site; and a generation module for generating the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
According to the web site resource management device of the embodiment of the present invention, the three acquisition modules obtain the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship of the web site, and the generation module then generates the index browse path of the web site, which makes it convenient for users to browse the web site on high-end devices.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flowchart of a web site resource management method according to an embodiment of the invention;
Fig. 2 is a flowchart of a web site resource management method according to another embodiment of the invention;
Fig. 3 is a flowchart of a web site resource management method according to yet another embodiment of the invention;
Fig. 4 is a structural diagram of a web site resource management device according to an embodiment of the invention;
Fig. 5 is a structural diagram of a web site resource management device according to another embodiment of the invention; and
Fig. 6 is a structural diagram of a web site resource management device according to yet another embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
These and other aspects of the embodiments of the invention will become clear from the following description and drawings. The description and drawings specifically disclose some particular implementations to show some of the ways in which the principles of the embodiments can be carried out, but it should be understood that the scope of the embodiments is not limited by them. On the contrary, the embodiments of the invention include all changes, modifications and equivalents that fall within the spirit and scope of the appended claims.
The web site resource management method according to the embodiments of the present invention is described below with reference to the drawings.
The web site resource management method comprises the following steps: obtaining the navigation tree structure of the web site; obtaining the breadcrumb browse path structure of the web site; obtaining the URL hierarchical relationship of the web site; and generating the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
Fig. 1 is a flowchart of the web site resource management method of one embodiment of the invention.
As shown in Fig. 1, the web site resource management method according to the embodiment of the present invention comprises the following steps.
Step S101: obtain the navigation tree structure of the web site.
Specifically, the link targets in the navigation block are first computed starting from the home page of the web site; directed mining is then performed from the home page along the link targets in the navigation block to generate the navigation tree structure.
More specifically, starting from the home page of the web site, the link targets in the navigation block are computed from the page location that each navigation link in the block points to. Directed mining is then performed from the home page according to those link targets to find the concrete pages that the navigation block points to, and information extracted from the content of the mined pages is used to build the navigation tree structure.
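The directed mining described in step S101 can be illustrated with a minimal sketch. The patent does not say how the navigation block is recognized or how deep the mining goes, so several things here are assumptions made only for illustration: the navigation block is taken to be an HTML `<nav>` element, the helper names (`NavLinkParser`, `nav_links`, `build_nav_tree`) and the depth limit are hypothetical, and the `PAGES` dictionary stands in for real page fetches.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NavLinkParser(HTMLParser):
    """Collects href values of <a> tags that appear inside a <nav> element.

    Assumption: the navigation block is marked up as <nav>; real sites
    would need a site-specific or learned detector.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <nav> elements we are currently inside
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.depth > 0:
            self.depth -= 1


def nav_links(base_url, html):
    """Return absolute URLs of the links found in the navigation block."""
    parser = NavLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def build_nav_tree(home_url, fetch, max_depth=2):
    """Follow only navigation-block links, starting from the home page."""
    seen = {home_url}

    def expand(url, depth):
        node = {"url": url, "children": []}
        if depth >= max_depth:
            return node
        for target in nav_links(url, fetch(url)):
            if target not in seen:
                seen.add(target)
                node["children"].append(expand(target, depth + 1))
        return node

    return expand(home_url, 0)


# Hypothetical in-memory site standing in for real HTTP fetches.
PAGES = {
    "http://example.com/": '<nav><a href="/news/">News</a>'
                           '<a href="/video/">Video</a></nav><a href="/ad">ad</a>',
    "http://example.com/news/": '<nav><a href="/news/sports/">Sports</a></nav>',
    "http://example.com/video/": "<p>no navigation block</p>",
    "http://example.com/news/sports/": "",
}
tree = build_nav_tree("http://example.com/", lambda url: PAGES.get(url, ""))
```

In this sketch the `/ad` link is ignored because it lies outside the navigation block, which is the sense in which the mining is "directed".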
Step S102: obtain the breadcrumb browse path structure of the web site.
Specifically, resource pages of the web site are first mined from user search logs to compute breadcrumbs; the breadcrumb browse path structure is then generated from the breadcrumbs.
More specifically, the user search logs are mined and, based on the user access log records, the underlying resource pages of the web site are mined to compute breadcrumbs; at the same time, breadcrumbs are extracted from user browsing records. The breadcrumb browse path structure is then built from these breadcrumb records.
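As a rough illustration of step S102, the sketch below merges breadcrumb trails into one nested structure. The log format is an assumption: each record is taken to be a pair of resource page URL and the breadcrumb trail associated with that page, and the names `LOG` and `build_breadcrumb_structure` are hypothetical. How breadcrumbs are actually computed from the logs is not specified in the source.

```python
from collections import defaultdict

# Hypothetical log records: (resource page URL, breadcrumb trail for that page).
LOG = [
    ("http://example.com/novel/123/1.html", ["Home", "Novels", "Fantasy"]),
    ("http://example.com/novel/456/9.html", ["Home", "Novels", "Romance"]),
    ("http://example.com/news/a1.html", ["Home", "News"]),
    ("http://example.com/novel/123/2.html", ["Home", "Novels", "Fantasy"]),
]


def build_breadcrumb_structure(records):
    """Merge breadcrumb trails into a trie; each node lists the pages under it."""

    def new_node():
        return {"children": defaultdict(new_node), "pages": []}

    root = new_node()
    for url, trail in records:
        node = root
        for label in trail:
            node = node["children"][label]  # creates the level on first use
        node["pages"].append(url)
    return root


crumbs = build_breadcrumb_structure(LOG)
```

A trie is used here only because trails that share a prefix (for example "Home > Novels") then share the same upper levels; any equivalent tree representation would do.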
Step S103: obtain the URL hierarchical relationship of the web site.
Specifically, a URL hierarchy analysis is performed on the web site to obtain its URL hierarchical relationship.
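One simple reading of the URL hierarchy analysis in step S103 is to split each URL into its host and path segments and nest them. The sketch below assumes exactly that; the function names `url_levels` and `build_url_hierarchy` and the sample URLs are hypothetical, and the source does not state that the analysis is limited to path segments.

```python
from urllib.parse import urlparse


def url_levels(url):
    """Split a URL into hierarchy levels: host first, then each path segment."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return [parsed.netloc] + segments


def build_url_hierarchy(urls):
    """Nest URLs into a dict of dicts keyed by level."""
    root = {}
    for url in urls:
        node = root
        for level in url_levels(url):
            node = node.setdefault(level, {})
    return root


hierarchy = build_url_hierarchy([
    "http://example.com/novel/123/1.html",
    "http://example.com/novel/123/2.html",
    "http://example.com/news/a1.html",
])
```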
Step S104: generate the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
Specifically, the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship are used to analyze the paths that reach each resource page in the web site and the hierarchical relationships between pages, and the index browse path of the web site is generated from this analysis.
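The source does not say how the three inputs are combined in step S104. Purely as an assumed illustration, the sketch below picks one path per resource page with a fixed priority: a breadcrumb trail if one exists, otherwise a navigation tree path, otherwise a path derived from the URL levels. The priority order, the function name `index_browse_path`, the "Home" root label and the sample data are all hypothetical.

```python
from urllib.parse import urlparse


def index_browse_path(page_url, breadcrumb_paths, nav_paths):
    """Return (path, source) for one resource page.

    Assumed priority: breadcrumb trail, then navigation-tree path,
    then a path derived from the URL hierarchy.
    """
    if page_url in breadcrumb_paths:
        return list(breadcrumb_paths[page_url]), "breadcrumb"
    if page_url in nav_paths:
        return list(nav_paths[page_url]), "navigation"
    segments = [s for s in urlparse(page_url).path.split("/") if s]
    # Drop the final segment (the page itself) and keep its parent levels.
    return ["Home"] + segments[:-1], "url"


# Hypothetical inputs: paths already extracted from the other structures.
BREADCRUMB_PATHS = {"http://example.com/novel/123/1.html": ["Home", "Novels", "Fantasy"]}
NAV_PATHS = {"http://example.com/news/a1.html": ["Home", "News"]}

index = {
    url: index_browse_path(url, BREADCRUMB_PATHS, NAV_PATHS)
    for url in [
        "http://example.com/novel/123/1.html",
        "http://example.com/news/a1.html",
        "http://example.com/video/clips/7.html",
    ]
}
```

Recording which structure supplied each path makes it easier to audit pages that fall through to the URL-derived fallback.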
According to the web site resource management method of the embodiment of the present invention, the index browse path is generated from the breadcrumb browse path structure and the URL hierarchical relationship of the web site, which makes it convenient for users to browse the web site on high-end devices.
Fig. 2 is a flowchart of the web site resource management method of another embodiment of the present invention.
As shown in Fig. 2, the web site resource management method according to the embodiment of the present invention comprises the following steps.
Step S201: obtain the navigation tree structure of the web site.
Specifically, the link targets in the navigation block are first computed starting from the home page of the web site; directed mining is then performed from the home page along the link targets in the navigation block to generate the navigation tree structure.
More specifically, starting from the home page of the web site, the link targets in the navigation block are computed from the page location that each navigation link in the block points to. Directed mining is then performed from the home page according to those link targets to find the concrete pages that the navigation block points to, and information extracted from the content of the mined pages is used to build the navigation tree structure.
Step S202: obtain the breadcrumb browse path structure of the web site.
Specifically, resource pages of the web site are first mined from user search logs to compute breadcrumbs; the breadcrumb browse path structure is then generated from the breadcrumbs.
More specifically, the user search logs are mined and, based on the user access log records, the underlying resource pages of the web site are mined to compute breadcrumbs; at the same time, breadcrumbs are extracted from user browsing records. The breadcrumb browse path structure is then built from these breadcrumb records.
Step S203: obtain the URL hierarchical relationship of the web site.
Specifically, a URL hierarchy analysis is performed on the web site to obtain its URL hierarchical relationship.
Step S204: generate the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
Specifically, the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship are used to analyze the paths that reach each resource page in the web site and the hierarchical relationships between pages, and the index browse path of the web site is generated from this analysis.
Step S205: identify the structuring rate of the page types pointed to by the outgoing links of the nodes on the index browse path.
Specifically, outgoing-link mining is first performed on the nodes on the index browse path; all pages of each page type pointed to by the outgoing links are then examined to determine whether each page has a predetermined structured type; finally, the structuring rate of the page type is determined as the percentage of all pages of that page type that have a predetermined structured type.
More specifically, the pages pointed to by the outgoing links of the nodes on the generated index browse path are mined to obtain the page type that each outgoing link points to. Structured-type identification is then performed on all pages belonging to that page type to find the pages that can be structured. Finally, the structuring rate of the page type is the number of pages that can be structured expressed as a percentage of all pages of that page type.
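The structuring rate in step S205 is a per-type percentage, which can be sketched as follows. The input format is an assumption: each page reached through a node's outgoing ("rear chain") links is represented as a pair of page type and a flag saying whether it was recognized as structurable. The function name `structuring_rates`, the type labels and the sample counts are hypothetical.

```python
from collections import defaultdict


def structuring_rates(pages):
    """pages: iterable of (page_type, can_be_structured) pairs.

    Returns {page_type: percentage of pages of that type that can be structured}.
    """
    totals = defaultdict(int)
    structurable = defaultdict(int)
    for page_type, ok in pages:
        totals[page_type] += 1
        if ok:
            structurable[page_type] += 1
    return {t: 100.0 * structurable[t] / totals[t] for t in totals}


# Hypothetical identification results for pages behind outgoing links.
SAMPLE = (
    [("novel_reading_page", True)] * 9
    + [("novel_reading_page", False)]
    + [("news_content_page", True), ("news_content_page", False)]
)
rates = structuring_rates(SAMPLE)
```

With this sample, nine of ten novel reading pages are structurable and one of two news content pages is, giving rates of 90% and 50% respectively.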
In one embodiment of the invention, the page types include novel reading pages, novel cover pages, novel list pages, headline pages, news content pages, video title pages and video viewing pages.
According to the web site resource management method of the embodiment of the present invention, judging the structuring rate of the pages behind the outgoing links of the nodes on the generated index browse path makes it easier to filter page types and to determine which pages can be structured, which improves the user experience.
Fig. 3 is a flowchart of the web site resource management method of yet another embodiment of the present invention.
As shown in Fig. 3, the web site resource management method according to the embodiment of the present invention comprises the following steps.
Step S301: obtain the navigation tree structure of the web site.
Specifically, the link targets in the navigation block are first computed starting from the home page of the web site; directed mining is then performed from the home page along the link targets in the navigation block to generate the navigation tree structure.
More specifically, starting from the home page of the web site, the link targets in the navigation block are computed from the page location that each navigation link in the block points to. Directed mining is then performed from the home page according to those link targets to find the concrete pages that the navigation block points to, and information extracted from the content of the mined pages is used to build the navigation tree structure.
Step S302: obtain the breadcrumb browse path structure of the web site.
Specifically, resource pages of the web site are first mined from user search logs to compute breadcrumbs; the breadcrumb browse path structure is then generated from the breadcrumbs.
More specifically, the user search logs are mined and, based on the user access log records, the underlying resource pages of the web site are mined to compute breadcrumbs; at the same time, breadcrumbs are extracted from user browsing records. The breadcrumb browse path structure is then built from these breadcrumb records.
Step S303: obtain the URL hierarchical relationship of the web site.
Specifically, a URL hierarchy analysis is performed on the web site to obtain its URL hierarchical relationship.
Step S304: generate the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
Specifically, the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship are used to analyze the paths that reach each resource page in the web site and the hierarchical relationships between pages, and the index browse path of the web site is generated from this analysis.
Step S305: identify the structuring rate of the page types pointed to by the outgoing links of the nodes on the index browse path.
Specifically, outgoing-link mining is first performed on the nodes on the index browse path; all pages of each page type pointed to by the outgoing links are then examined to determine whether each page has a predetermined structured type; finally, the structuring rate of the page type is determined as the percentage of all pages of that page type that have a predetermined structured type.
More specifically, the pages pointed to by the outgoing links of the nodes on the generated index browse path are mined to obtain the page type that each outgoing link points to. Structured-type identification is then performed on all pages belonging to that page type to find the pages that can be structured. Finally, the structuring rate of the page type is the number of pages that can be structured expressed as a percentage of all pages of that page type.
Step S306: filter the page types to remove those whose structuring rate is below a predetermined threshold.
Specifically, the structuring rate of each page type pointed to by the outgoing links of the nodes is compared with the predetermined threshold. Page types whose structuring rate is below the threshold are marked for removal, and page types whose structuring rate is above the threshold are marked for retention. The page types are then filtered according to these marks, and those marked for removal are filtered out.
In one embodiment of the invention, the predetermined threshold is 80%.
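The marking and filtering in step S306 can be sketched in a few lines. The 80% threshold comes from the embodiment above; everything else is assumed for illustration, including the function names, the "retain"/"remove" labels and the sample rates. The source only covers rates below and above the threshold, so treating a rate exactly equal to the threshold as retained is an assumption of this sketch.

```python
PREDETERMINED_THRESHOLD = 80.0  # percent, from the embodiment


def mark_page_types(rates, threshold=PREDETERMINED_THRESHOLD):
    """Mark each page type as 'retain' or 'remove' by its structuring rate.

    Assumption: a rate exactly equal to the threshold is retained.
    """
    return {t: ("retain" if r >= threshold else "remove") for t, r in rates.items()}


def filter_page_types(rates, threshold=PREDETERMINED_THRESHOLD):
    """Return the page types that survive filtering, in input order."""
    marks = mark_page_types(rates, threshold)
    return [t for t in rates if marks[t] == "retain"]


# Hypothetical structuring rates (percent) per page type.
RATES = {
    "novel_reading_page": 90.0,
    "news_content_page": 50.0,
    "video_viewing_page": 80.0,
}
marks = mark_page_types(RATES)
kept = filter_page_types(RATES)
```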
In one embodiment of the invention, the page types include novel reading pages, novel cover pages, novel list pages, headline pages, news content pages, video title pages and video viewing pages.
According to the web site resource management method of the embodiment of the present invention, filtering the page types removes page types with a low structuring rate and page types that cannot be processed. This facilitates page structuring, makes the site structure clearer and improves the user experience.
The web site resource management device according to the embodiments of the present invention is described below with reference to the drawings.
The web site resource management device comprises: a first acquisition module for obtaining the navigation tree structure of the web site; a second acquisition module for obtaining the breadcrumb browse path structure of the web site; a third acquisition module for obtaining the URL hierarchical relationship of the web site; and a generation module for generating the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
Fig. 4 is a structural diagram of the web site resource management device of one embodiment of the invention.
As shown in Fig. 4, the web site resource management device according to the embodiment of the present invention comprises: a first acquisition module 110, a second acquisition module 120, a third acquisition module 130 and a generation module 140.
Specifically, the first acquisition module 110 is used to obtain the navigation tree structure of the web site; the second acquisition module 120 is used to obtain the breadcrumb browse path structure of the web site; the third acquisition module 130 is used to obtain the URL hierarchical relationship of the web site; and the generation module 140 is used to generate the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
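The module arrangement of Fig. 4 can be pictured as three independent acquisition steps feeding one generation step. The sketch below is only one assumed way to wire this up in software: the class name `SiteResourceManager`, the use of plain callables for modules 110 to 140 and the stub implementations are all hypothetical, and the source does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SiteResourceManager:
    """Wires three acquisition modules into one generation module."""

    first_acquisition: Callable[[str], Any]      # module 110: navigation tree structure
    second_acquisition: Callable[[str], Any]     # module 120: breadcrumb browse path structure
    third_acquisition: Callable[[str], Any]      # module 130: URL hierarchical relationship
    generation: Callable[[Any, Any, Any], Any]   # module 140: index browse path

    def run(self, site_url: str) -> Any:
        nav_tree = self.first_acquisition(site_url)
        breadcrumbs = self.second_acquisition(site_url)
        url_hierarchy = self.third_acquisition(site_url)
        return self.generation(nav_tree, breadcrumbs, url_hierarchy)


# Stub modules standing in for real implementations.
manager = SiteResourceManager(
    first_acquisition=lambda site: {"url": site, "children": []},
    second_acquisition=lambda site: {"Home": {}},
    third_acquisition=lambda site: {"example.com": {}},
    generation=lambda nav, crumbs, urls: {
        "nav_root": nav["url"],
        "breadcrumb_roots": sorted(crumbs),
        "hosts": sorted(urls),
    },
)
result = manager.run("http://example.com/")
```

Keeping each module behind a callable makes it straightforward to add further stages, such as identification or labeling modules, without changing the existing ones.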
More specifically, the first acquisition module 110 is used to: compute the link targets in the navigation block starting from the home page of the web site; and perform directed mining from the home page along the link targets in the navigation block to generate the navigation tree structure. The second acquisition module 120 is used to: mine resource pages from the logs of the web site to compute breadcrumbs; and generate the breadcrumb browse path structure from the breadcrumbs.
For example, starting from the home page of the web site, the first acquisition module 110 computes the link targets in the navigation block from the page location that each navigation link in the block points to. It then performs directed mining from the home page according to those link targets to find the concrete pages that the navigation block points to, and uses information extracted from the content of the mined pages to build the navigation tree structure. The second acquisition module 120 mines the user search logs and, based on the user access log records, mines the underlying resource pages of the web site to compute breadcrumbs; at the same time it extracts breadcrumbs from user browsing records, and it builds the breadcrumb browse path structure from these breadcrumb records.
According to the web site resource management device of the embodiment of the present invention, the three acquisition modules obtain the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship of the web site, and the generation module then generates the index browse path of the web site, which makes it convenient for users to browse the web site on high-end devices.
Fig. 5 is a structural diagram of the web site resource management device of another embodiment of the present invention.
As shown in Fig. 5, the web site resource management device according to the embodiment of the present invention comprises: a first acquisition module 110, a second acquisition module 120, a third acquisition module 130, a generation module 140 and an identification module 150.
Specifically, the first acquisition module 110 is used to obtain the navigation tree structure of the web site; the second acquisition module 120 is used to obtain the breadcrumb browse path structure of the web site; the third acquisition module 130 is used to obtain the URL hierarchical relationship of the web site; and the generation module 140 is used to generate the index browse path of the web site according to the navigation tree structure, the breadcrumb browse path structure and the URL hierarchical relationship.
The identification module 150 is used to identify the page types pointed to by the outgoing links of the nodes on the index browse path.
More specifically, the first acquisition module 110 is used to: compute the link targets in the navigation block starting from the home page of the web site; and perform directed mining from the home page along the link targets in the navigation block to generate the navigation tree structure. The second acquisition module 120 is used to: mine resource pages from the logs of the web site to compute breadcrumbs; and generate the breadcrumb browse path structure from the breadcrumbs.
The identification module 150 is used to: perform outgoing-link mining on the nodes on the index browse path; examine all pages of each page type pointed to by the outgoing links to determine whether each page has a predetermined structured type; and determine the structuring rate of the page type as the percentage of all pages of that page type that have a predetermined structured type.
For example, starting from the home page of the web site, the first acquisition module 110 computes the link targets in the navigation block from the page location that each navigation link in the block points to. It then performs directed mining from the home page according to those link targets to find the concrete pages that the navigation block points to, and uses information extracted from the content of the mined pages to build the navigation tree structure. The second acquisition module 120 mines the user search logs and, based on the user access log records, mines the underlying resource pages of the web site to compute breadcrumbs; at the same time it extracts breadcrumbs from user browsing records, and it builds the breadcrumb browse path structure from these breadcrumb records. The identification module 150 mines the pages pointed to by the outgoing links of the nodes on the generated index browse path to obtain the page type that each outgoing link points to, performs structured-type identification on all pages belonging to that page type to find the pages that can be structured, and finally determines the structuring rate of the page type as the number of pages that can be structured expressed as a percentage of all pages of that page type.
In one embodiment of the invention, the page types include novel reading pages, novel cover pages, novel list pages, headline pages, news content pages, video title pages and video viewing pages.
According to the web site resource management device of the embodiment of the present invention, judging the structuring rate of the pages behind the outgoing links of the nodes on the generated index browse path makes it easier to filter page types, which improves the user experience.
Fig. 6 is the structural representation of the web site resource management devices of another embodiment of the present invention.
As shown in Figure 6, the web site resource management devices according to the embodiment of the present invention, comprising: the first acquisition module 110, the second acquisition module 120, the three acquisition modules 130, generation module 140, identification module 150 and labeling module 160.
Particularly, the first acquisition module 110 is for obtaining the navigation tree structure of web website; The second acquisition module 120 is for obtaining the crumbs browse path structure of web website; The 3rd acquisition module 130 is for obtaining the url hierarchical relationship of web website; And generation module 140 is for generating the index browse path of web website according to navigation tree first class gauge structure, crumbs browse path structure and url hierarchical relationship.The page type of the rear chain that identification module 150 points to for the node on index browse path is identified;
Labeling module 160 is for page type is marked, and the page type to structure rate lower than predetermined threshold is labeled as and need to removes.
More specifically, the first acquisition module 110 for: the link that starts to calculate in navigation block from the homepage of web website is pointed to; And along the link in navigation block, point to directed excavation with generation navigation tree structure from the homepage of web website.The second acquisition module is used for: from the daily record of web website, excavate resource page to calculate crumbs; And generate crumbs browse path structure according to crumbs.Identification module 150 carries out rear chain excavation for the node on index browse path; All pages of the page type that rear chain is pointed to are identified, to determine whether each page has predetermined structured type; And according to the shared number percent in all pages of page type of the page with predetermined structured type determine obtain after the structure rate of chain page type;
The structure rate of the page type that labeling module 160 goes out according to identification module 150 marks, structure rate is labeled as and needs cancellation lower than the page type of predetermined threshold, structure rate is labeled as and needs to retain higher than the page type of predetermined threshold, and then will need the page type of cancellation to filter out according to mark.
For example, the first acquisition module 110 is from the homepage of web website, the link of calculating in navigation block according to each navigation link page location pointed in navigation block is pointed to, then from web website homepage, according to the link in navigation block, point to and carry out orientation excavation, from these links, excavate the concrete page that navigation block is pointed to, and according to the page of excavating, from web page contents, Extracting Information is set up navigation tree structure.The second acquisition module 120 is by the search daily record of digging user, according to user access logs record, excavate web website underlying resource page and calculate crumbs, from user, browse record and extract crumbs simultaneously, according to this crumbs record, build crumbs browse path structure.The page that the rear chain of the node on the web site index browse path of 150 pairs of generations of identification module points to carries out page excavation, obtain the page type that this rear chain points to, according to this page type, all pages that belong to this page type are carried out to structured type identification, be subordinated in all pages of this page type and obtain the page that can be structured, the number percent of the page shared quantity in all pages of this page type that finally can be structured according to these is determined the structure rate of the page type that this rear chain points to.
In one embodiment of the present invention, the predetermined threshold is 80%.
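As a rough sketch of how the structure rate and the 80% threshold could interact, the snippet below computes a rate per page type and labels each type. The `(page_type, is_structured)` input shape, the function names and the label strings are hypothetical. The text only covers rates lower or higher than the threshold, so treating a rate exactly equal to the threshold as retained is an assumption.

```python
from collections import defaultdict

PREDETERMINED_THRESHOLD = 0.80  # the 80% value given in this embodiment


def structure_rates(pages):
    """Return the share of structured pages for each page type.

    pages is a hypothetical iterable of (page_type, is_structured) pairs,
    one pair per page reached through a rear chain.
    """
    total = defaultdict(int)
    structured = defaultdict(int)
    for page_type, is_structured in pages:
        total[page_type] += 1
        if is_structured:
            structured[page_type] += 1
    return {page_type: structured[page_type] / total[page_type]
            for page_type in total}


def label_page_types(rates, threshold=PREDETERMINED_THRESHOLD):
    """Label page types below the threshold "remove", all others "retain".

    Assumption: a rate exactly equal to the threshold is retained.
    """
    return {page_type: ("remove" if rate < threshold else "retain")
            for page_type, rate in rates.items()}


def filter_page_types(labels):
    """Keep only the page types that were labeled "retain"."""
    return [page_type for page_type, label in labels.items()
            if label == "retain"]
```

Keeping the labeling step separate from the filtering step mirrors the description, where the labeling module first marks page types and the marked types are filtered out afterwards.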
In one embodiment of the present invention, the page types include novel reading pages, novel cover pages, novel list pages, headline pages, news content pages, video title pages and video viewing pages.
According to the web site resource management device of the embodiment of the present invention, the labeling module marks the page types, and the page types marked as needing removal, as well as page types that cannot be processed, are then filtered out. This facilitates page structuring, makes the website structure clearer, and improves the user experience.
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (16)

1. A web site resource management method, characterized by comprising the following steps:
obtaining a navigation tree structure of the web website;
obtaining a crumbs browse path structure of the web website;
obtaining a url hierarchical relationship of the web website; and
generating an index browse path of the web website according to the navigation tree structure, the crumbs browse path structure and the url hierarchical relationship.
2. The method according to claim 1, characterized by further comprising the step of: identifying the structure rate of the page type pointed to by the rear chain of a node on the index browse path.
3. The method according to claim 2, characterized by further comprising the step of:
filtering the page types to filter out page types whose structure rate is lower than a predetermined threshold.
4. The method according to claim 2 or 3, characterized in that identifying the structure rate of the page type pointed to by the rear chain of a node on the index browse path comprises:
performing rear-chain mining on the nodes on the index browse path;
identifying all pages of the page type pointed to by the rear chain, to determine whether each page has a predetermined structured type; and
determining the structure rate of the page type pointed to by the rear chain according to the percentage of pages having the predetermined structured type among all pages of the page type.
5. The method according to claim 3 or 4, characterized in that the predetermined threshold is 80%.
6. The method according to claim 2 or 3, characterized in that the page types include novel reading pages, novel cover pages, novel list pages, headline pages, news content pages, video title pages and video viewing pages.
7. The method according to any one of claims 1-3, characterized in that the step of obtaining the navigation tree structure of the web website comprises:
calculating, starting from the homepage of the web website, the link targets in a navigation block; and
performing directed mining from the homepage of the web website along the link targets in the navigation block to generate the navigation tree structure.
8. The method according to any one of claims 1-3, characterized in that the step of obtaining the crumbs browse path structure of the web website comprises:
mining resource pages of the web website from user search logs to calculate crumbs; and
generating the crumbs browse path structure according to the crumbs.
9. A web site resource management device, characterized by comprising:
a first acquisition module, configured to obtain a navigation tree structure of the web website;
a second acquisition module, configured to obtain a crumbs browse path structure of the web website;
a third acquisition module, configured to obtain a url hierarchical relationship of the web website; and
a generation module, configured to generate an index browse path of the web website according to the navigation tree structure, the crumbs browse path structure and the url hierarchical relationship.
10. The device according to claim 9, characterized by further comprising:
an identification module, configured to identify the page type pointed to by the rear chain of a node on the index browse path.
11. The device according to claim 10, characterized by further comprising:
a labeling module, configured to mark the page types, wherein page types whose structure rate is lower than a predetermined threshold are marked as needing removal.
12. The device according to claim 10 or 11, characterized in that the identification module is configured to:
perform rear-chain mining on the nodes on the index browse path;
identify all pages of the page type pointed to by the rear chain, to determine whether each page has a predetermined structured type; and
determine the structure rate of the page type pointed to by the rear chain according to the percentage of pages having the predetermined structured type among all pages of the page type.
13. The device according to claim 11 or 12, characterized in that the predetermined threshold is 80%.
14. The device according to claim 11 or 12, characterized in that the page types include novel reading pages, novel cover pages, novel list pages, headline pages, news content pages, video title pages and video viewing pages.
15. The device according to any one of claims 9-10, characterized in that the first acquisition module is configured to:
calculate, starting from the homepage of the web website, the link targets in a navigation block; and
perform directed mining from the homepage of the web website along the link targets in the navigation block to generate the navigation tree structure.
16. The device according to any one of claims 9-10, characterized in that the second acquisition module is configured to:
mine resource pages from the logs of the web website to calculate crumbs; and
generate the crumbs browse path structure according to the crumbs.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210227112.1A CN103514232A (en) 2012-06-29 2012-06-29 Web site resource management method and device

Publications (1)

Publication Number Publication Date
CN103514232A true CN103514232A (en) 2014-01-15

Family

ID=49896963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210227112.1A Pending CN103514232A (en) 2012-06-29 2012-06-29 Web site resource management method and device

Country Status (1)

Country Link
CN (1) CN103514232A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083244A1 (en) * 2007-09-25 2009-03-26 Nec (China) Co., Ltd. Method and system for subject relevant web page filtering based on navigation paths information
CN101630330A (en) * 2009-08-14 2010-01-20 苏州锐创通信有限责任公司 Method for webpage classification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
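Zhang Yi (张怿): "Design Analysis and Strategy Research of Fashion Websites" (时尚网站设计分析与策略研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
Nie Yinggao (聂应高) et al.: "Optimized Design of Library Websites" (图书馆网站的优化设计), 《情报探索》 *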

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302878A (en) * 2015-10-09 2016-02-03 北京奇虎科技有限公司 Method and apparatus for recording webpage links in cross index page
CN105302878B (en) * 2015-10-09 2021-02-02 北京奇虎科技有限公司 Method and device for recording webpage links in cross index page
CN106933915A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The generation method and device of web page navigation
CN109063051A (en) * 2018-07-19 2018-12-21 佛山科学技术学院 A kind of storage method of industry big data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140115