CN103514221B - Web site resource management method and device - Google Patents
- Publication number: CN103514221B (application number CN201210222539.2A)
- Authority: CN (China)
- Prior art keywords: web site, page, index, resource, index page
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G: PHYSICS
- G06: COMPUTING; CALCULATING OR COUNTING
- G06F: ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90: Details of database functions independent of the retrieved data types
- G06F16/95: Retrieval from the web
- G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention proposes a web site resource management method and device. The method includes the following steps: checking the number of web sites; if there is one web site, further checking whether the web site has an index page; if it has an index page, optimizing the index page to generate a first index page; if it has no index page, generating a second index page according to the structure of the web site; and if there are two or more web sites, building a cross-site index page based on semantics. With the web site resource management method according to embodiments of the present invention, the web sites are examined and an index page is built by a different method for each of three cases, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the organization relationships of site resources, improve the degree of aggregation of site resources, and improve the display of site pages.
Description
Technical field
The present invention relates to the field of analyzing and mining the organization relationships of web site resources, and in particular to a web site resource management method and device.
Background art
Nowadays, technology for turning web content into apps is increasingly common. Converting a web site into an app requires the organization relationships of the site's resources, so the relationships among the resources within the web site must be analyzed and mined to obtain structured resource organization data.
At present, the resource organization relationships of web sites are mined mainly by manual inspection, and no mature prior art exists. This has the following disadvantages:
(1) Mining of a site's resource organization relationships does not apply different methods to different categories of sites, so the mining is incomplete and the degree of aggregation is low;
(2) There is no fixed mining method, so the resulting resource organization relationships are unclear and rather disordered and cannot easily be structured.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, a first object of the present invention is to propose a web site resource management method.
A second object of the present invention is to propose a web site resource management device.
To achieve these objects, a web site resource management method according to embodiments of the first aspect of the present invention includes the following steps: checking the number of web sites; if there is one web site, further checking whether the web site has an index page; if it has an index page, optimizing the index page to generate a first index page; if it has no index page, generating a second index page according to the structure of the web site; and if there are two or more web sites, building a cross-site index page based on semantics.
With the web site resource management method according to embodiments of the present invention, the web sites are examined and an index page is built by a different method for each of three cases, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the organization relationships of site resources, improve the degree of aggregation of site resources, and improve the display of site pages.
To achieve the above objects, a web site resource management device according to embodiments of the second aspect of the present invention includes: a first checking module, configured to check the number of web sites; a second checking module, configured to check, when there is one web site, whether the web site has an index page; an optimizing module, configured to optimize the index page, when the web site has one, to generate a first index page; a generation module, configured to generate, when the web site has no index page, a second index page according to the structure of the web site; and a building module, configured to build a cross-site index page based on semantics when there are two or more web sites.
With the web site resource management device according to embodiments of the present invention, the web sites are examined and an index page is built by a different method for each of three cases, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the organization relationships of site resources, improve the degree of aggregation of site resources, and improve the display of site pages.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a web site resource management method according to an embodiment of the present invention;
Fig. 2 is a flow chart of a web site resource management method according to an embodiment of the present invention;
Fig. 3 is a flow chart of a web site resource management method according to an embodiment of the present invention;
Fig. 4 is a flow chart of a web site resource management method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention; and
Fig. 8 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting the claims.
These and other aspects of the embodiments of the present invention will become clear with reference to the following description and drawings. In this description and the drawings, some particular implementations of the embodiments are specifically disclosed to indicate some of the ways in which the principles of the embodiments may be carried out, but it should be understood that the scope of the embodiments is not limited thereby. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
The web site resource management method according to embodiments of the present invention is described below with reference to the drawings.
A web site resource management method includes the following steps: checking the number of web sites; if there is one web site, further checking whether the web site has an index page; if it has an index page, optimizing the index page to generate a first index page; if it has no index page, generating a second index page according to the structure of the web site; and if there are two or more web sites, building a cross-site index page based on semantics.
Fig. 1 is a flow chart of the web site resource management method of an embodiment of the present invention.
As shown in Fig. 1, the web site resource management method according to an embodiment of the present invention includes the following steps:
Step S101: check the number of web sites.
Specifically, the number of web sites for which structured resource organization relationships are to be obtained is detected.
Step S102: if there is one web site, further check whether the web site has an index page.
Specifically, if the check finds one web site for which structured resource organization relationships are to be obtained, mining of that web site's pages begins, to check whether there is an index page containing the site's index information.
Step S103: if there is an index page, optimize the index page to generate a first index page.
Specifically, if the web site has an index page, the page information in the index page is obtained and optimized to obtain the information needed to build the first index page, and the first index page of the web site is generated.
Step S104: if there is no index page, generate a second index page according to the structure of the web site.
Specifically, if the web site has no index page, the resource pages of the web site are mined, the structure of the web site is obtained from the information of the resource pages, and the second index page of the web site is generated from that structural information.
Step S105: if there are two or more web sites, build a cross-site index page based on semantics.
Specifically, if the check finds two or more web sites for which structured resource organization relationships are to be obtained, these web sites are linked according to the semantic relevance of their resource pages, resource index organization information is gathered across the sites, and an index page is generated from the gathered information.
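The three-way branching of steps S101 to S105 can be sketched as a small dispatcher. This is a minimal illustration, not the patented implementation: the `Site` class, its fields, and the three strategy callables are hypothetical names introduced here, and each strategy is passed in so that the branching logic stays separate from how an index page is actually built.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Site:
    """Hypothetical minimal model of one web site."""
    url: str
    index_page: Optional[str] = None  # HTML of an existing index page, if any
    resource_pages: dict[str, str] = field(default_factory=dict)  # url -> HTML


def manage_site_resources(
    sites: list[Site],
    optimize_index: Callable[[Site], str],          # S103: yields the first index page
    build_from_structure: Callable[[Site], str],    # S104: yields the second index page
    build_cross_site: Callable[[list[Site]], str],  # S105: yields the cross-site index page
) -> str:
    """Choose one of the three index-building strategies (steps S101 to S105)."""
    if not sites:  # S101 found nothing to process
        raise ValueError("at least one web site is required")
    if len(sites) == 1:  # S102: a single site; does it have an index page?
        site = sites[0]
        if site.index_page is not None:
            return optimize_index(site)
        return build_from_structure(site)
    return build_cross_site(sites)  # two or more sites
```

Injecting the strategies keeps the dispatcher testable on its own; a single `Site` without `index_page` takes the S104 branch.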
With the web site resource management method according to embodiments of the present invention, the web sites are examined and an index page is built by a different method for each of three cases, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the organization relationships of site resources, improve the degree of aggregation of site resources, and improve the display of site pages.
Fig. 2 is a flow chart of the web site resource management method of another embodiment of the present invention.
As shown in Fig. 2, the web site resource management method according to an embodiment of the present invention includes the following steps.
Step S201: check the number of web sites.
Specifically, the number of web sites for which structured resource organization relationships are to be obtained is detected.
Step S202: if there is one web site, further check whether the web site has an index page.
Specifically, if the check finds one web site for which structured resource organization relationships are to be obtained, mining of that web site's pages begins, to check whether there is an index page containing the site's index information.
Step S203: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, all the information on the index page is obtained and analyzed, and the non-index information in it is deleted.
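A minimal sketch of step S203, assuming the index page is HTML and that its index entries are ordinary `<a href>` links. The description says only that non-index information includes advertisements and animation; the tag list and class/id markers used below to recognize them are guesses made for illustration, as are the names `IndexEntryParser` and `strip_non_index_info`. Only Python's standard `html.parser` module is used.

```python
import re
from html.parser import HTMLParser

# Assumed heuristics (not from the patent): tags treated as animation/scripted
# content, and class/id tokens treated as marking advertisements.
ANIMATION_TAGS = {"script", "object", "canvas", "marquee", "video", "iframe"}
AD_MARKERS = {"ad", "ads", "advert", "advertisement", "banner", "sponsor"}


class IndexEntryParser(HTMLParser):
    """Collect (href, text) links while skipping ad and animation blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: list[tuple[str, str]] = []
        self._skip_tag = None   # tag that opened the block being skipped
        self._skip_depth = 0    # nesting depth of that tag inside the block
        self._href = None       # href of the <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth += 1
            return
        a = dict(attrs)
        label = f"{a.get('class') or ''} {a.get('id') or ''}".lower()
        markers = set(re.split(r"[^a-z0-9]+", label))
        if tag in ANIMATION_TAGS or markers & AD_MARKERS:
            self._skip_tag, self._skip_depth = tag, 1  # start skipping this block
        elif tag == "a" and a.get("href"):
            self._href, self._text = a["href"], []

    def handle_endtag(self, tag):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth -= 1
                if self._skip_depth == 0:
                    self._skip_tag = None
            return
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.entries.append((self._href, text))
            self._href = None

    def handle_data(self, data):
        if self._skip_tag is None and self._href is not None:
            self._text.append(data)


def strip_non_index_info(html: str) -> list[tuple[str, str]]:
    """Step S203 sketch: keep only the link entries of an index page."""
    parser = IndexEntryParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Skipping by tag depth only works for elements with closing tags, which is why void elements such as `<embed>` are left out of the assumed tag list.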
Step S204: delete the index entries in the index page that cannot connect to resource pages in the web site.
Specifically, the index entries remaining in the index page information are checked to see whether each can connect to the resource page in the web site that it points to, and any index entry that cannot connect to the resource page it points to is deleted.
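Step S204 might be sketched as follows, assuming that "can connect" means the entry resolves to a URL on the same host that answers an HTTP HEAD request with a status below 400. That interpretation, the function names, and the 5-second timeout are assumptions; the reachability probe is injectable so the filter can be tested without network access.

```python
from typing import Callable
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


def http_reachable(url: str, timeout: float = 5.0) -> bool:
    """Default probe: True if a HEAD request answers with a status below 400."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):  # HTTPError/URLError are OSError subclasses
        return False


def drop_dead_entries(
    entries: list[tuple[str, str]],
    base_url: str,
    is_reachable: Callable[[str], bool] = http_reachable,
) -> list[tuple[str, str]]:
    """Step S204 sketch: keep entries that resolve to a reachable page on this site.

    Relative hrefs are resolved against base_url; the returned entries carry
    the absolute URL.
    """
    host = urlparse(base_url).netloc
    kept = []
    for href, text in entries:
        url = urljoin(base_url, href)
        if urlparse(url).netloc != host:
            continue  # the entry points outside this web site
        if is_reachable(url):
            kept.append((url, text))
    return kept
```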
Step S205: extract the valid index entries in the index page to generate the first index page.
Specifically, the remaining valid index information in the index page is extracted and merged onto one page to generate the first index page.
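Merging the surviving entries onto a single page (step S205) could look like the minimal sketch below. The plain HTML list layout and the name `render_index_page` are illustrative assumptions; the description does not specify the format of the first index page. Titles and link texts are escaped with the standard `html.escape`.

```python
from html import escape


def render_index_page(title: str, entries: list[tuple[str, str]]) -> str:
    """Merge (url, text) index entries onto one HTML page (step S205 sketch)."""
    items = "\n".join(
        f'    <li><a href="{escape(url)}">{escape(text)}</a></li>'
        for url, text in entries
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{items}\n  </ul>\n</body></html>\n"
    )
```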
Step S206: if there is no index page, generate a second index page according to the structure of the web site.
Specifically, if the web site has no index page, the resource pages of the web site are mined, the structure of the web site is obtained from the information of the resource pages, and the second index page of the web site is generated from that structural information.
Step S207: if there are two or more web sites, build a cross-site index page based on semantics.
Specifically, if the check finds two or more web sites for which structured resource organization relationships are to be obtained, these web sites are linked according to the semantic relevance of their resource pages, resource index organization information is gathered across the sites, and an index page is generated from the gathered information.
In one embodiment of the present invention, the non-index information includes advertisements and animation.
With the web site resource management method according to this embodiment, the non-index information and invalid index entries of the index page are deleted and the index page is generated from the valid index entries, so that the resource organization relationships of the web site can be obtained effectively, with clear relationships and a high degree of aggregation.
Fig. 3 is a flow chart of the web site resource management method of another embodiment of the present invention.
As shown in Fig. 3, the web site resource management method according to an embodiment of the present invention includes the following steps.
Step S301: check the number of web sites.
Specifically, the number of web sites for which structured resource organization relationships are to be obtained is detected.
Step S302: if there is one web site, further check whether the web site has an index page.
Specifically, if the check finds one web site for which structured resource organization relationships are to be obtained, mining of that web site's pages begins, to check whether there is an index page containing the site's index information.
Step S303: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, all the information on the index page is obtained and analyzed, and the non-index information in it is deleted.
Step S304: delete the index entries in the index page that cannot connect to resource pages in the web site.
Specifically, the index entries remaining in the index page information are checked to see whether each can connect to the resource page in the web site that it points to, and any index entry that cannot connect to the resource page it points to is deleted.
Step S305: extract the valid index entries in the index page to generate the first index page.
Specifically, the remaining valid index information in the index page is extracted and merged onto one page to generate the first index page.
Step S306: if there is no index page, determine whether the resource pages in the web site have titles.
Specifically, if the web site has no index page, mining of the web site's resource pages starts from the homepage to obtain the information of each resource page, and it is determined whether each resource page has title information.
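Step S306 starts mining resource pages from the homepage. One plausible reading is a breadth-first crawl restricted to the site's own host, sketched below; the `fetch` callable (returning page HTML or `None`), the `max_pages` cap, and the function names are assumptions introduced here, not details from the description.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl_site(
    homepage: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 500,
) -> dict[str, str]:
    """Breadth-first crawl of one site starting at its homepage.

    Returns {url: html} for every same-host page that fetch() could load.
    """
    host = urlparse(homepage).netloc
    pages: dict[str, str] = {}
    queue = deque([homepage])
    seen = {homepage}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        pages[url] = html
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link, _ = urldefrag(urljoin(url, href))  # drop "#fragment" parts
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```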
Step S307: if so, extract the title of the resource page as an index entry.
Specifically, if a resource page has title information, the title of that resource page is extracted as its index entry.
Step S308: if not, generate summary information of the resource page as the index entry.
Specifically, if a resource page has no title information, summary information is generated from the main information contained in the resource page, and this summary information is used as the index entry.
Step S309: generate the second index page from the index entries.
Specifically, the index entries obtained for all resource pages are merged onto one page to generate the second index page.
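Steps S307 to S309 (the title if present, otherwise a summary, then one entry per page) can be sketched like this. The description does not say how the summary is produced, so this sketch substitutes a simple truncation of the page's visible text to 40 characters; a real system might use a proper summarizer. All names are hypothetical.

```python
from html.parser import HTMLParser


class _PageTextParser(HTMLParser):
    """Collect the <title> text and the visible body text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: list[str] = []
        self.text_parts: list[str] = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip:
            self.text_parts.append(data)


def index_entry_for(url: str, html: str, summary_len: int = 40) -> tuple[str, str]:
    """S307/S308: use the page title if present, else a truncated-text summary."""
    p = _PageTextParser()
    p.feed(html)
    p.close()
    title = " ".join("".join(p.title_parts).split())
    if title:
        return url, title
    text = " ".join(" ".join(p.text_parts).split())
    if len(text) <= summary_len:
        return url, text
    return url, text[:summary_len].rstrip() + "..."


def second_index_entries(pages: dict[str, str]) -> list[tuple[str, str]]:
    """S309: one index entry per resource page, in URL order."""
    return [index_entry_for(url, html) for url, html in sorted(pages.items())]
```

The resulting entries could then be laid out on one page in the same way as the first index page.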
Step S310: if there are two or more web sites, build a cross-site index page based on semantics.
Specifically, if the check finds two or more web sites for which structured resource organization relationships are to be obtained, these web sites are linked according to the semantic relevance of their resource pages, resource index organization information is gathered across the sites, and an index page is generated from the gathered information.
In one embodiment of the present invention, the non-index information includes advertisements and animation.
With the web site resource management method according to this embodiment, index entries are generated from the title information or summary information of the resource pages, and the index page is then generated from these entries, which improves the degree of aggregation of the resource organization relationships and the clarity of the relationships.
Fig. 4 is a flow chart of the web site resource management method of another embodiment of the present invention.
As shown in Fig. 4, the web site resource management method according to an embodiment of the present invention includes the following steps.
Step S401: check the number of web sites.
Specifically, the number of web sites for which structured resource organization relationships are to be obtained is detected.
Step S402: if there is one web site, further check whether the web site has an index page.
Specifically, if the check finds one web site for which structured resource organization relationships are to be obtained, mining of that web site's pages begins, to check whether there is an index page containing the site's index information.
Step S403: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, all the information on the index page is obtained and analyzed, and the non-index information in it is deleted.
Step S404: delete the index entries in the index page that cannot connect to resource pages in the web site.
Specifically, the index entries remaining in the index page information are checked to see whether each can connect to the resource page in the web site that it points to, and any index entry that cannot connect to the resource page it points to is deleted.
Step S405: extract the valid index entries in the index page to generate the first index page.
Specifically, the remaining valid index information in the index page is extracted and merged onto one page to generate the first index page.
Step S406: if there is no index page, determine whether the resource pages in the web site have titles.
Specifically, if the web site has no index page, mining of the web site's resource pages starts from the homepage to obtain the information of each resource page, and it is determined whether each resource page has title information.
Step S407: if so, extract the title of the resource page as an index entry.
Specifically, if a resource page has title information, the title of that resource page is extracted as its index entry.
Step S408: if not, generate summary information of the resource page as the index entry.
Specifically, if a resource page has no title information, summary information is generated from the main information contained in the resource page, and this summary information is used as the index entry.
Step S409: generate the second index page from the index entries.
Specifically, the index entries obtained for all resource pages are merged onto one page to generate the second index page.
Step S410: if there are two or more web sites, predefine multiple templates corresponding to different semantics.
Specifically, if the check finds two or more web sites for which structured resource organization relationships are to be obtained, templates corresponding to the relevant semantics are preset according to semantic relevance.
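Step S410 presets one template per semantic category. A hypothetical registry is sketched below. The description names novels, news, videos and products as semantic categories and, in its worked example, lists the sub-columns of a book template; this sketch folds novels into the book template, and the keyword sets and the columns of the non-book templates are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticTemplate:
    """A predefined template for one semantic category (step S410 sketch)."""
    semantic: str              # category name, e.g. "book"
    keywords: frozenset[str]   # words assumed to signal this category
    columns: tuple[str, ...]   # sub-columns the template is divided into


# Hypothetical registry: only the book template's columns follow the
# description's example; everything else here is an illustrative guess.
TEMPLATES = {
    t.semantic: t
    for t in (
        SemanticTemplate("book", frozenset({"book", "author", "isbn", "novel"}),
                         ("basic info", "popular reviews", "price comparison",
                          "e-resources", "other")),
        SemanticTemplate("news", frozenset({"news", "report", "headline"}),
                         ("summary", "related reports", "other")),
        SemanticTemplate("video", frozenset({"video", "episode", "trailer"}),
                         ("basic info", "play sources", "reviews", "other")),
        SemanticTemplate("product", frozenset({"price", "shop", "product"}),
                         ("basic info", "price comparison", "reviews", "other")),
    )
}
```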
Step S411: classify the resource pages in the two or more web sites according to semantic relevance and organize them into a first web site.
Specifically, the information contained in the resource pages of each web site is obtained, the resource pages of each web site are classified according to the semantic relevance of their information, and the semantically related information from the resource pages is organized into the first web site.
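The description does not define how semantic relevance is measured in step S411. As a stand-in, the sketch below treats two pages as related when the Jaccard similarity of their word sets reaches a threshold, and greedily groups pages from all sites; each resulting group plays the role of the "first web site". The tokenizer, the 0.3 threshold and all names are assumptions, and real semantic analysis would be considerably richer.

```python
import re
from dataclasses import dataclass


@dataclass
class ResourcePage:
    site: str
    url: str
    text: str


def keywords(text: str) -> set[str]:
    """Crude keyword set: lowercase alphanumeric words of 3+ characters."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_semantics(
    pages: list[ResourcePage], threshold: float = 0.3
) -> list[list[ResourcePage]]:
    """Greedy grouping: add each page to the first group whose seed page is
    similar enough, otherwise start a new group (step S411 stand-in)."""
    groups: list[tuple[set[str], list[ResourcePage]]] = []
    for page in pages:
        kw = keywords(page.text)
        for seed_kw, members in groups:
            if jaccard(kw, seed_kw) >= threshold:
                members.append(page)
                break
        else:
            groups.append((kw, [page]))
    return [members for _, members in groups]
```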
Step S412: find a first template corresponding to the first web site according to the semantic relevance of the first web site.
Specifically, the predefined semantic templates are searched according to the semantic relevance of the resource pages organized in the first web site, to obtain the first template corresponding to the semantics of the first web site.
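Step S412 looks up the predefined template matching the grouped pages. A minimal sketch, assuming each template is described by a keyword set and the best match is the one sharing the most keywords with the group; ties go to the template listed first, and `None` is returned when nothing overlaps. The keyword sets and function name are hypothetical.

```python
from typing import Optional

# Hypothetical template keyword sets (assumptions, not from the description).
TEMPLATE_KEYWORDS: dict[str, set[str]] = {
    "book": {"book", "author", "isbn", "novel", "chapter"},
    "news": {"news", "report", "headline"},
    "video": {"video", "episode", "trailer"},
    "product": {"price", "shop", "product", "buy"},
}


def find_template(
    group_keywords: set[str],
    templates: dict[str, set[str]] = TEMPLATE_KEYWORDS,
) -> Optional[str]:
    """Step S412 sketch: return the template sharing the most keywords."""
    best, best_score = None, 0
    for name, kws in templates.items():
        score = len(group_keywords & kws)
        if score > best_score:  # strict ">" keeps the earlier template on ties
            best, best_score = name, score
    return best
```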
Step S413: fill the resource pages in the first web site, classified by semantic relevance, into the different sub-columns of the first template.
Specifically, according to the attributes of keywords, the information of each resource page in the first web site is filled into the first template by semantic relevance, with the corresponding resource page information added to each column.
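For step S413, the description fills information into sub-columns according to keyword attributes. The sketch below assumes page information has already been extracted into `(site, attribute, value)` records and routes each record to a sub-column of the book template through a lookup table, sending unknown attributes to "other". The attribute names, the table and the function name are assumptions.

```python
# Hypothetical routing table from an extracted attribute to a sub-column of the
# book template; the attribute names are assumptions, not from the description.
BOOK_COLUMN_FOR = {
    "author": "basic info", "isbn": "basic info", "publisher": "basic info",
    "review": "popular reviews", "rating": "popular reviews",
    "price": "price comparison",
    "ebook": "e-resources", "download": "e-resources",
}
BOOK_COLUMNS = ("basic info", "popular reviews", "price comparison", "e-resources", "other")


def fill_template(
    records: list[tuple[str, str, str]],
    column_for: dict[str, str] = BOOK_COLUMN_FOR,
    columns: tuple[str, ...] = BOOK_COLUMNS,
) -> dict[str, list[tuple[str, str]]]:
    """Step S413 sketch: route each (site, attribute, value) record into a sub-column.

    Attributes without a mapping go to "other", so `columns` must include "other".
    """
    filled: dict[str, list[tuple[str, str]]] = {col: [] for col in columns}
    for site, attribute, value in records:
        filled[column_for.get(attribute, "other")].append((site, value))
    return filled
```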
Step S414: build a cross-site index page from the information in the different templates.
Specifically, based on the relevant information filled into the different semantic templates, the keywords and semantics of each template are integrated as index entries to build the cross-site index page.
The implementation of steps S410 to S414 is illustrated below.
For example, a semantic template for book information is defined that includes sub-columns such as basic book information, popular reviews, merchant price comparison, e-resources and other modules. Suppose the resource pages of each site contain the book Big Talk Design Patterns; the information about Big Talk Design Patterns in each resource page is then grouped into one class in the first web site. The book information template is then found from the keywords of the information about this book, and the relevant information in the first web site is filled into each sub-column of the template according to the template's sub-column modules. The key information in the template is then extracted as index entries. By building multiple such pages and extracting their index entries, the index entries can be integrated to build the cross-site index page.
In one embodiment of the present invention, the different semantics include novel titles, news titles, video names, product names and the like.
In one embodiment of the present invention, the non-index information includes advertisements and animation.
With the web site resource management method according to this embodiment, the information of multiple web sites is classified, organized and filled into templates to generate index entries, which improves the degree of aggregation of the resource organization relationships and the clarity of the relationships.
The web site resource management device according to embodiments of the present invention is described below with reference to the drawings.
A web site resource management device includes: a first checking module, configured to check the number of web sites; a second checking module, configured to check, when there is one web site, whether the web site has an index page; an optimizing module, configured to optimize the index page, when the web site has one, to generate a first index page; a generation module, configured to generate, when the web site has no index page, a second index page according to the structure of the web site; and a building module, configured to build a cross-site index page based on semantics when there are two or more web sites.
Fig. 5 is a schematic structural diagram of the web site resource management device of an embodiment of the present invention.
As shown in Fig. 5, the web site resource management device according to an embodiment of the present invention includes: a first checking module 110, a second checking module 120, an optimizing module 130, a generation module 140 and a building module 150.
Specifically, the first checking module 110 is configured to check the number of web sites; the second checking module 120 is configured to check, when there is one web site, whether the web site has an index page; the optimizing module 130 is configured to optimize the index page, when the web site has one, to generate a first index page; the generation module 140 is configured to generate, when the web site has no index page, a second index page according to the structure of the web site; and the building module 150 is configured to build a cross-site index page based on semantics when there are two or more web sites.
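The module structure of Fig. 5 maps naturally onto a class whose modules are injected components. The sketch below is illustrative only: the class name, field names and callable signatures are assumptions, and the numbers in the comments refer to the reference numerals of Fig. 5.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ResourceManagementDevice:
    """Sketch of the Fig. 5 device with each module supplied as a callable."""
    first_check: Callable[[Sequence[Any]], int]  # module 110: count web sites
    second_check: Callable[[Any], bool]          # module 120: has an index page?
    optimize: Callable[[Any], str]               # module 130: first index page
    generate: Callable[[Any], str]               # module 140: second index page
    build: Callable[[Sequence[Any]], str]        # module 150: cross-site index page

    def run(self, sites: Sequence[Any]) -> str:
        count = self.first_check(sites)
        if count == 1:
            site = sites[0]
            if self.second_check(site):
                return self.optimize(site)
            return self.generate(site)
        if count >= 2:
            return self.build(sites)
        raise ValueError("no web site to process")
```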
More specifically, the first checking module 110 is configured to detect the number of web sites for which structured resource organization relationships are to be obtained. The second checking module 120 is configured, when the first checking module 110 finds one such web site, to begin mining the pages of that web site and check whether there is an index page containing its index information. The optimizing module 130 is configured, when the second checking module 120 finds that the web site has an index page, to obtain the page information in the index page, optimize it to obtain the information needed to build the first index page, and generate the first index page of the web site. The generation module 140 is configured, when the second checking module 120 finds that the web site has no index page, to mine the resource pages of the web site, obtain the structure of the web site from the information of the resource pages, and generate the second index page of the web site from that structural information. The building module 150 is configured, when the first checking module 110 finds two or more such web sites, to link these web sites according to the semantic relevance of their resource pages, gather resource index organization information across the sites, and generate an index page from the gathered information.
With the web site resource management device according to embodiments of the present invention, the two checking modules examine the web sites, and for each of three cases, distinguished by the number of web sites and by whether an index page exists, an index page is built by a different method in one of three different modules. This makes it possible to fully mine the organization relationships of site resources, improve the degree of aggregation of site resources, and improve the display of site pages.
Fig. 6 is a schematic structural diagram of the web site resource management device of another embodiment of the present invention.
As shown in Fig. 6, the web site resource management device according to an embodiment of the present invention includes: a first checking module 110, a second checking module 120, an optimizing module 130, a generation module 140 and a building module 150, wherein the optimizing module 130 includes a deleting unit 131 and a first extracting unit 132.
Specifically, the first checking module 110 is configured to check the number of web sites; the second checking module 120 is configured to check, when there is one web site, whether the web site has an index page; the optimizing module 130 is configured to optimize the index page, when the web site has one, to generate a first index page; the generation module 140 is configured to generate, when the web site has no index page, a second index page according to the structure of the web site; and the building module 150 is configured to build a cross-site index page based on semantics when there are two or more web sites. The deleting unit 131 is configured to delete the non-index information in the index page and the index entries in the index page that cannot connect to resource pages in the web site; the first extracting unit 132 is configured to extract the valid index entries in the index page to generate the first index page.
More specifically, first checks that module 110 needs to obtain the web site of structurized resource membership credentials for detection
number of such web sites. The second checking module 120 is used, when the first checking module 110 finds exactly one web site whose structured resource organization relationships are to be obtained, to start mining the pages of that web site and check whether it has an index page containing the site's index information. The optimizing module 130 is used, when the second checking module 120 finds that the web site has an index page, to obtain the page information in that index page, optimize it, obtain the information needed to build the first index page, and generate the first index page of the web site. The generation module 140 is used, when the second checking module 120 finds that the web site has no index page, to mine the resource pages of the web site, derive the structure of the web site from the information in those resource pages, and generate the second index page of the web site from that structural information. The establishing module 150 is used, when the first checking module 110 finds two or more web sites whose structured resource organization relationships are to be obtained, to link those web sites according to the semantic relevance of their resource pages, obtain cross-site resource index organization information, and generate an index page from that information. When the optimizing module optimizes the page information and generates the first index page of the web site, the deleting unit 131 obtains and analyzes all of the information on the index page and deletes the non-index information in it. It also checks each index entry remaining in the index page information to see whether it can connect to the resource page it points to in the web site, and deletes every index entry that cannot. The first extracting unit 132 then extracts the remaining valid index information from the index page and merges it onto a single page, generating the first index page.
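The deletion and extraction steps performed by units 131 and 132 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation. The `PageItem` model, the `kind` labels, and the `site_resource_urls` set used as the "can connect" test are all hypothetical. A real system would parse the HTML and request each link instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class PageItem:
    """One element of an existing index page (hypothetical model)."""
    kind: str                   # e.g. "link", "advertisement", "animation"
    text: str
    href: Optional[str] = None


# Kinds treated as non-index information; the patent names ads and animations.
NON_INDEX_KINDS = {"advertisement", "animation"}


def optimize_index_page(items: List[PageItem],
                        site_resource_urls: Set[str]) -> List[Tuple[str, str]]:
    """Return the valid index entries that make up the first index page."""
    # Deleting unit 131, part 1: drop non-index information.
    entries = [it for it in items if it.kind not in NON_INDEX_KINDS]
    # Deleting unit 131, part 2: drop entries that do not connect to a
    # resource page of this site (modelled here as set membership).
    valid = [it for it in entries
             if it.kind == "link" and it.href in site_resource_urls]
    # First extracting unit 132: merge what remains onto one page,
    # represented here as a list of (text, href) pairs.
    return [(it.text, it.href) for it in valid]
```

Suppose an index page holds one live link, one advertisement, one animation, and one link to a page missing from the site. The function reduces it to the single live link.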
In one embodiment of the invention, the non-index information includes advertisements and animations.
In the web site resource management device according to the embodiments of the present invention, the deleting unit removes the non-index information and the invalid index entries from the index page, and the extracting unit generates the index page from the valid index entries. This reliably yields the resource organization relationships of the web site, and the resulting relationships are clear and highly aggregated.
Fig. 7 is a schematic structural diagram of a web site resource management device according to another embodiment of the present invention.
As shown in Fig. 7, the web site resource management device according to this embodiment includes a first checking module 110, a second checking module 120, an optimizing module 130, a generation module 140, and an establishing module 150. The optimizing module 130 includes a deleting unit 131 and a first extracting unit 132. The generation module 140 includes a judging unit 141, a second extracting unit 142, and a generating unit 143.
Specifically, the first checking module 110 checks the number of web sites. When there is one web site, the second checking module 120 checks whether it has an index page. When the web site has an index page, the optimizing module 130 optimizes it to generate a first index page. When the web site has no index page, the generation module 140 generates a second index page from the structure of the web site. When there are two or more web sites, the establishing module 150 builds a semantics-based cross-site index page. The deleting unit 131 deletes the non-index information in the index page, together with any index entries in the index page that cannot connect to resource pages in the web site. The first extracting unit 132 extracts the valid index entries in the index page to generate the first index page. The judging unit 141 judges whether a resource page in the web site has a title. When a resource page has a title, the second extracting unit 142 extracts that title as an index entry. When a resource page has no title, the generating unit 143 generates summary information for the page as its index entry. The generating unit 143 then generates the second index page from the index entries.
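The three-way branching performed by modules 110 to 150 can be summarized as a small dispatcher. The sketch assumes four hypothetical callables that stand in for the corresponding modules: `find_index_page`, `optimize`, `generate_from_structure`, and `build_cross_site`. The patent does not define these interfaces.

```python
from typing import Callable, List, Optional


def build_index(sites: List[str],
                find_index_page: Callable[[str], Optional[str]],
                optimize: Callable[[str], str],
                generate_from_structure: Callable[[str], str],
                build_cross_site: Callable[[List[str]], str]) -> str:
    """Choose among the three cases handled by modules 110-150 (sketch)."""
    if not sites:
        raise ValueError("at least one web site is required")
    # First checking module 110: count the web sites.
    if len(sites) == 1:
        site = sites[0]
        # Second checking module 120: does the site have an index page?
        index_page = find_index_page(site)
        if index_page is not None:
            return optimize(index_page)           # optimizing module 130
        return generate_from_structure(site)      # generation module 140
    # Two or more sites: establishing module 150.
    return build_cross_site(sites)
```

Passing each module's behaviour in as a function keeps the decision logic separate from how each branch is implemented.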
More specifically, the first checking module 110 detects the number of web sites whose structured resource organization relationships are to be obtained. The second checking module 120 is used, when the first checking module 110 finds exactly one web site whose structured resource organization relationships are to be obtained, to start mining the pages of that web site and check whether it has an index page containing the site's index information. The optimizing module 130 is used, when the second checking module 120 finds that the web site has an index page, to obtain the page information in that index page, optimize it, obtain the information needed to build the first index page, and generate the first index page of the web site. The generation module 140 is used, when the second checking module 120 finds that the web site has no index page, to mine the resource pages of the web site, derive the structure of the web site from the information in those resource pages, and generate the second index page of the web site from that structural information. The establishing module 150 is used, when the first checking module 110 finds two or more web sites whose structured resource organization relationships are to be obtained, to link those web sites according to the semantic relevance of their resource pages, obtain cross-site resource index organization information, and generate an index page from that information. When the optimizing module optimizes the page information and generates the first index page of the web site, the deleting unit 131 obtains and analyzes all of the information on the index page and deletes the non-index information in it. It also checks each index entry remaining in the index page information to see whether it can connect to the resource page it points to in the web site, and deletes every index entry that cannot. The first extracting unit 132 then extracts the remaining valid index information from the index page and merges it onto a single page, generating the first index page. When the web site has no index page, the generation module 140 generates the second index page from the structure of the web site as follows. The judging unit 141 starts mining the resource pages of the web site from the homepage, obtains their information, and judges whether each resource page has a title. If a resource page has a title, the second extracting unit 142 extracts that title as the page's index entry. If it has no title, the generating unit 143 generates summary information from the main content of the resource page and uses that summary as the index entry. The index entries of all resource pages are merged onto a single page to generate the second index page.
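The title-or-summary rule used by units 141 to 143 can be illustrated with a breadth-first crawl from the homepage. This sketch rests on assumptions. The site is an in-memory `dict` of hypothetical `ResourcePage` objects. The `summarize` helper simply truncates the body, because the patent does not specify how the summary information is generated.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourcePage:
    """Hypothetical in-memory model of one crawled resource page."""
    title: Optional[str]
    body: str
    links: List[str] = field(default_factory=list)


def summarize(body: str, max_words: int = 8) -> str:
    # Stand-in for generating unit 143: keep only the first few words.
    words = body.split()
    text = " ".join(words[:max_words])
    return text + ("..." if len(words) > max_words else "")


def second_index_page(site: Dict[str, ResourcePage], home: str) -> List[str]:
    """Crawl the site breadth-first from the homepage and collect index entries."""
    entries: List[str] = []
    seen = {home}
    queue = deque([home])
    while queue:
        page = site[queue.popleft()]
        # Judging unit 141: does the page have a non-empty title?
        if page.title and page.title.strip():
            entries.append(page.title.strip())     # second extracting unit 142
        else:
            entries.append(summarize(page.body))   # generating unit 143
        for link in page.links:
            if link in site and link not in seen:  # stay inside this site
                seen.add(link)
                queue.append(link)
    # The merged entries form the second index page.
    return entries
```

Crawling breadth-first from the homepage lists the index entries in roughly the order of the site's hierarchy, with top-level pages first.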
In one embodiment of the invention, the non-index information includes advertisements and animations.
In the web site resource management device according to the embodiments of the present invention, the generation module generates index entries from the title information or summary information of the resource pages and then builds the index page from those entries. This increases the degree of aggregation of the resource organization relationships and makes the relationships clearer.
Fig. 7 is a schematic structural diagram of a web site resource management device according to another embodiment of the present invention.
As shown in Fig. 7, the web site resource management device according to this embodiment includes a first checking module 110, a second checking module 120, an optimizing module 130, a generation module 140, and an establishing module 150. The optimizing module 130 includes a deleting unit 131 and a first extracting unit 132. The generation module 140 includes a judging unit 141, a second extracting unit 142, and a generating unit 143. The establishing module 150 includes a definition unit 151, a classification unit 152, a retrieval unit 153, a filling unit 154, and an establishing unit 155.
Specifically, the first checking module 110 checks the number of web sites. When there is one web site, the second checking module 120 checks whether it has an index page. When the web site has an index page, the optimizing module 130 optimizes it to generate a first index page. When the web site has no index page, the generation module 140 generates a second index page from the structure of the web site. When there are two or more web sites, the establishing module 150 builds a semantics-based cross-site index page. Within the optimizing module 130, the deleting unit 131 deletes the non-index information in the index page, together with any index entries in the index page that cannot connect to resource pages in the web site. The first extracting unit 132 extracts the valid index entries in the index page to generate the first index page. Within the generation module 140, the judging unit 141 judges whether a resource page in the web site has a title. When a resource page has a title, the second extracting unit 142 extracts that title as an index entry. When a resource page has no title, the generating unit 143 generates summary information for the page as its index entry and then generates the second index page from the index entries. Within the establishing module 150, the definition unit 151 predefines multiple templates corresponding to different semantics. The classification unit 152 classifies the resource pages in the two or more web sites according to semantic relevance and organizes them into a first web site. The retrieval unit 153 finds a corresponding template according to the semantic relevance of the first web site. The filling unit 154 fills the resource pages of the first web site, classified by semantic relevance, into the different columns of the corresponding template. The establishing unit 155 builds the cross-site index page from the information in the different templates.
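The patent does not say how semantic relevance is measured. As an illustrative substitute only, the sketch below groups resource pages from several sites by keyword overlap (Jaccard similarity) in a single greedy pass. Each resulting group plays the role of the "first web site". The `(site, url, keywords)` input format and the default `threshold` of 0.3 are assumptions.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword-overlap similarity, a crude proxy for semantic relevance."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify_pages(pages: List[Tuple[str, str, Set[str]]],
                   threshold: float = 0.3) -> List[Dict]:
    """Group resource pages from several sites by keyword overlap.

    Each returned group is a virtual "first web site" that holds related
    pages drawn from different real sites.
    """
    groups: List[Dict] = []
    for site, url, keywords in pages:
        best, best_score = None, 0.0
        for group in groups:
            score = jaccard(keywords, group["keywords"])
            if score > best_score:
                best, best_score = group, score
        if best is not None and best_score >= threshold:
            best["pages"].append((site, url))
            best["keywords"] |= keywords   # widen the group's vocabulary
        else:
            groups.append({"keywords": set(keywords), "pages": [(site, url)]})
    return groups
```

The greedy pass depends on input order and serves only to make the classification step concrete. A production system would more likely use a dedicated text-similarity model.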
More specifically, the first checking module 110 detects the number of web sites whose structured resource organization relationships are to be obtained. The second checking module 120 is used, when the first checking module 110 finds exactly one web site whose structured resource organization relationships are to be obtained, to start mining the pages of that web site and check whether it has an index page containing the site's index information. The optimizing module 130 is used, when the second checking module 120 finds that the web site has an index page, to obtain the page information in that index page, optimize it, obtain the information needed to build the first index page, and generate the first index page of the web site. The generation module 140 is used, when the second checking module 120 finds that the web site has no index page, to mine the resource pages of the web site, derive the structure of the web site from the information in those resource pages, and generate the second index page of the web site from that structural information. The establishing module 150 is used, when the first checking module 110 finds two or more web sites whose structured resource organization relationships are to be obtained, to link those web sites according to the semantic relevance of their resource pages, obtain cross-site resource index organization information, and generate an index page from that information. When the optimizing module optimizes the page information and generates the first index page of the web site, the deleting unit 131 obtains and analyzes all of the information on the index page and deletes the non-index information in it. It also checks each index entry remaining in the index page information to see whether it can connect to the resource page it points to in the web site, and deletes every index entry that cannot. The first extracting unit 132 then extracts the remaining valid index information from the index page and merges it onto a single page, generating the first index page. When the web site has no index page, the generation module 140 generates the second index page from the structure of the web site as follows. The judging unit 141 starts mining the resource pages of the web site from the homepage, obtains their information, and judges whether each resource page has a title. If a resource page has a title, the second extracting unit 142 extracts that title as the page's index entry. If it has no title, the generating unit 143 generates summary information from the main content of the resource page and uses that summary as the index entry. The index entries of all resource pages are merged onto a single page to generate the second index page. The establishing module 150 builds the semantics-based cross-site index page in the following steps. First, the definition unit 151 predefines multiple templates corresponding to different semantics. The classification unit 152 then obtains the information contained in the resource pages of each web site and classifies those pages by the semantic relevance of their information. It organizes the semantically related information from the resource pages into a first web site. Next, the retrieval unit 153 searches the predefined semantic templates according to the semantic relevance of the resource pages organized in the first web site, and obtains a first template corresponding to the semantics of the first web site. The filling unit 154 then fills the information of each resource page in the first web site into the first template according to semantic relevance and keyword attributes, adding the matching resource page information to each column. Finally, the establishing unit 155 takes the information filled into the different semantic templates and integrates each template's keywords and semantics as index entries. From these entries it establishes the cross-site index page.
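The retrieval and filling steps (units 153 and 154) can be sketched with a dictionary of predefined templates. The `TEMPLATES` contents, the per-column attribute sets, and the "most shared keywords wins" rule are hypothetical choices made for illustration. The patent only states that a template is found by semantic relevance and then filled column by column according to keyword attributes.

```python
from typing import Dict, List, Optional, Set

# Hypothetical predefined templates (definition unit 151). Each template has
# trigger keywords and, for each column, the attribute keywords it accepts.
TEMPLATES: Dict[str, Dict] = {
    "book": {
        "keywords": {"book", "author", "isbn"},
        "columns": {
            "basic_info": {"author", "isbn", "publisher"},
            "popular_reviews": {"review", "rating"},
            "price_comparison": {"price", "merchant"},
            "e_resources": {"ebook", "download"},
        },
    },
    "video": {
        "keywords": {"video", "episode", "director"},
        "columns": {"basic_info": {"director", "cast"}, "sources": {"stream"}},
    },
}


def find_template(group_keywords: Set[str]) -> Optional[str]:
    """Retrieval unit 153: pick the template sharing the most keywords."""
    best = max(TEMPLATES,
               key=lambda name: len(TEMPLATES[name]["keywords"] & group_keywords))
    # No shared keyword at all means no template matches.
    return best if TEMPLATES[best]["keywords"] & group_keywords else None


def fill_template(name: str, items: List[Dict]) -> Dict[str, List[Dict]]:
    """Filling unit 154: put each item into the first column that accepts
    the item's attribute keyword; unmatched items are skipped."""
    columns = TEMPLATES[name]["columns"]
    filled: Dict[str, List[Dict]] = {col: [] for col in columns}
    for item in items:
        for col, accepted in columns.items():
            if item["attribute"] in accepted:
                filled[col].append(item)
                break  # each item goes into at most one column
    return filled
```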
An example of how the establishing module 150 is implemented is given below.
Suppose a semantic template for book information is defined. It contains sub-columns such as basic book information, popular reviews, and merchant price comparison, along with e-resources and other modules. Suppose further that the resource pages of several sites all concern the book Big Talk Design Patterns. The information about Big Talk Design Patterns in each resource page is then grouped as one class in the first web site. A keyword lookup on the information about this book finds the book-information template. The related information in the first web site is then filled into each sub-column according to the template's sub-column modules, and the key information in the template is extracted as index entries. Repeating this for multiple pages yields multiple index entries, which can be integrated to establish the cross-site index page.
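The book example can be carried through to the final step (unit 155) with a short sketch. It turns one filled book template into index entries and renders them as a cross-site index page. The column names, the `url` field, the example.com-style hosts, and the HTML list format are all assumptions made for illustration.

```python
import html
from typing import Dict, List


def index_entries(title: str, filled: Dict[str, List[Dict]]) -> List[Dict]:
    """Establishing unit 155 (sketch): one index entry per non-empty column,
    each pointing at the source pages that were filled into that column."""
    entries = []
    for column, items in filled.items():
        if items:
            entries.append({
                "label": f"{title}: {column.replace('_', ' ')}",
                "links": [item["url"] for item in items],
            })
    return entries


def cross_site_index_page(entries: List[Dict]) -> str:
    """Render the integrated index entries as a minimal HTML list."""
    rows = []
    for entry in entries:
        links = " ".join(
            f'<a href="{html.escape(url, quote=True)}">{i + 1}</a>'
            for i, url in enumerate(entry["links"]))
        rows.append(f"<li>{html.escape(entry['label'])} {links}</li>")
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical pages about the book, collected from two different sites.
    filled = {
        "basic_info": [{"url": "https://books.example/dp"}],
        "popular_reviews": [{"url": "https://reviews.example/dp/1"}],
        "price_comparison": [],
        "e_resources": [],
    }
    print(cross_site_index_page(index_entries("Big Talk Design Patterns", filled)))
```

Each list item links to every source page behind one column, so a single cross-site entry can point into several sites at once.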
In one embodiment of the invention, the different semantics include novel titles, news titles, video names, commodity names, and the like.
In one embodiment of the invention, the non-index information includes advertisements and animations.
In the web site resource management device according to the embodiments of the present invention, the information from multiple web sites is classified, organized, and filled into templates to generate index entries. This increases the degree of aggregation of the resource organization relationships and makes the relationships clearer.
In the description of this specification, references to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the claims and their equivalents.
Claims (12)
1. A web site resource management method, characterized by comprising the following steps:
Checking the number of web sites;
If the number of web sites is one, further checking whether the web site has an index page;
If it has an index page, optimizing the index page to generate a first index page;
If it has no index page, generating a second index page according to the structure of the web site; and
If the number of web sites is two or more, establishing a semantics-based cross-site index page.
2. The web site resource management method according to claim 1, characterized in that the step of optimizing the index page to generate the first index page comprises:
Deleting the non-index information in the index page;
Deleting the index entries in the index page that cannot connect to resource pages in the web site; and
Extracting the valid index entries in the index page to generate the first index page.
3. The web site resource management method according to claim 1 or 2, characterized in that the step of generating the second index page according to the structure of the web site comprises:
Judging whether a resource page in the web site has a title;
If so, extracting the title of the resource page as an index entry;
If not, generating summary information of the resource page as an index entry; and
Generating the second index page according to the index entries.
4. The web site resource management method according to claim 1 or 2, characterized in that the step of establishing the semantics-based cross-site index page comprises:
Predefining multiple templates corresponding to different semantics;
Classifying the resource pages in the two or more web sites according to semantic relevance and organizing them into a first web site;
Finding a first template corresponding to the first web site according to the semantic relevance of the first web site;
Filling the resource pages in the first web site, classified according to semantic relevance, into the different sub-columns of the first template; and
Establishing the cross-site index page according to the information in the different templates.
5. The web site resource management method according to claim 4, characterized in that the different semantics include novel titles, news titles, video names, and commodity names.
6. The web site resource management method according to claim 2, characterized in that the non-index information includes advertisements and animations.
7. A web site resource management device, characterized by comprising:
A first checking module, used to check the number of web sites;
A second checking module, used to check, when the number of web sites is one, whether the web site has an index page;
An optimizing module, used to optimize the index page to generate a first index page when the web site has an index page;
A generation module, used to generate a second index page according to the structure of the web site when the web site has no index page; and
An establishing module, used to establish a semantics-based cross-site index page when the number of web sites is two or more.
8. The web site resource management device according to claim 7, characterized in that the optimizing module comprises:
A deleting unit, used to delete the non-index information in the index page and the index entries in the index page that cannot connect to resource pages in the web site; and
A first extracting unit, used to extract the valid index entries in the index page to generate the first index page.
9. The web site resource management device according to claim 7 or 8, characterized in that the generation module comprises:
A judging unit, used to judge whether a resource page in the web site has a title;
A second extracting unit, used to extract the title of the resource page as an index entry when the resource page in the web site has a title; and
A generating unit, used to generate summary information of the resource page as an index entry when the resource page in the web site has no title, and to generate the second index page according to the index entries.
10. The web site resource management device according to claim 7 or 8, characterized in that the establishing module comprises:
A definition unit, used to predefine multiple templates corresponding to different semantics;
A classification unit, used to classify the resource pages in the two or more web sites according to semantic relevance and organize them into a first web site;
A retrieval unit, used to find a corresponding template according to the semantic relevance of the first web site;
A filling unit, used to fill the resource pages in the first web site, classified according to semantic relevance, into the different columns of the corresponding template; and
An establishing unit, used to establish the cross-site index page according to the information in the different templates.
11. The web site resource management device according to claim 10, characterized in that the different semantics include novel titles, news titles, and video names.
12. The web site resource management device according to claim 8, characterized in that the non-index information includes advertisements and animations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210222539.2A CN103514221B | 2012-06-28 | 2012-06-28 | A kind of web site resource management method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103514221A | 2014-01-15 |
CN103514221B | 2016-12-28 |
Family ID: 49896954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210222539.2A Active CN103514221B | 2012-06-28 | 2012-06-28 | A kind of web site resource management method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103514221B |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1732459A (en) * | 2002-11-01 | 2006-02-08 | Lg电子株式会社 | Web content transcoding system and method for small display device |
CN101097578A (en) * | 2007-06-07 | 2008-01-02 | 北京金山软件有限公司 | Network resource searching method and system |
CN101887422A (en) * | 2009-05-13 | 2010-11-17 | 北京博越世纪科技有限公司 | Technique for keeping synchronous update of data of web site and wap site |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPQ680300A0 (en) * | 2000-04-10 | 2000-05-11 | Alexsi Pty Ltd | A method |
US20070143283A1 (en) * | 2005-12-09 | 2007-06-21 | Stephan Spencer | Method of optimizing search engine rankings through a proxy website |
US20080275877A1 (en) * | 2007-05-04 | 2008-11-06 | International Business Machines Corporation | Method and system for variable keyword processing based on content dates on a web page |
Also Published As
Publication number | Publication date |
---|---|
CN103514221A | 2014-01-15 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |