CN104035940A - Webpage link storage method and server - Google Patents

Publication number
CN104035940A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310073553.5A
Other languages
Chinese (zh)
Other versions
CN104035940B (en)
Inventor
蔡兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201310073553.5A
Publication of CN104035940A
Application granted
Publication of CN104035940B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations

Abstract

The invention provides a webpage link storage method and a server. The method comprises: obtaining a webpage to be analyzed according to an index tag; dividing the webpage to be analyzed into blocks to form at least one webpage link block; judging whether the webpage link block conforms to a preset correlation standard, which is used to judge whether the webpage link block is correlated with the index tag corresponding to the webpage to be analyzed; and, if the webpage link block conforms to the preset correlation standard, obtaining the webpage link of the webpage link block and saving it to the storage space corresponding to the index tag. With this method, webpage links of the same category are stored and then indexed, which improves the storage efficiency and index efficiency of related webpage links.

Description

Webpage link storage method and server
Technical field
The invention belongs to the field of Internet technology, and in particular relates to a webpage link storage method and a server.
Background
As the Internet becomes increasingly widespread, users place ever higher demands on its functions.
In the prior art, to improve index efficiency, a background server generally builds an index for a webpage and then stores it, so that when a user visits, the relevant webpage can be sent directly to the front end and displayed to the user. When the background server builds an index for webpages, it generally does so after a user has visited a webpage link: the visited links are collected and indexed. For example, all news webpages browsed by users are indexed in real time so that the webpage links can be stored quickly.
Existing Internet webpages link to one another. A webpage often contains multiple links to other webpages, and a user can open another webpage through a link in the current one. For example, a news webpage generally has a related-news or recommended-news block (that is, webpage links) to the right of or below the main text, and a novel introduction page generally provides links to similar or popular novels on both sides.
However, owing to users' browsing habits and similar reasons, such webpage links are often ignored and the corresponding webpages are never displayed. If no user visits a link, the prior-art server cannot build an index for, and store, the corresponding webpage in time.
In summary, the prior art has the following technical problem: the server cannot identify and index, in a timely and effective manner, webpage links that have not been clicked. Index efficiency is therefore low, a large number of webpage links are not identified and indexed in time, and the server runs inefficiently.
Summary of the invention
The object of the embodiments of the present invention is to provide a webpage link storage method and a server, intended to solve the technical problem in the prior art that the server cannot identify and index, in a timely and effective manner, webpage links that have not been clicked, which leads to low index efficiency, a large number of webpage links not being indexed in time, and low server running efficiency.
To solve the above technical problem, the embodiments of the present invention provide the following technical solution:
A webpage link storage method, comprising the following steps:
obtaining a webpage to be analyzed according to an index tag;
dividing the webpage to be analyzed into blocks to form at least one webpage link block;
judging whether the webpage link block meets a preset correlation standard, wherein the correlation standard is used to judge whether the webpage link block is correlated with the index tag corresponding to the webpage to be analyzed;
if the webpage link block meets the preset correlation standard, obtaining the webpage link of the webpage link block and saving the obtained webpage link to the storage space corresponding to the index tag.
To solve the above technical problem, the embodiments of the present invention also provide the following technical solution:
A server, comprising:
a webpage acquisition module, configured to obtain a webpage to be analyzed according to an index tag;
a blocking module, configured to divide the webpage to be analyzed into blocks to form at least one webpage link block;
a judging module, configured to judge whether the webpage link block meets a preset correlation standard, wherein the correlation standard is used to judge whether the webpage link block is correlated with the index tag corresponding to the webpage to be analyzed;
a webpage link acquisition module, configured to obtain the webpage link of the webpage link block when the judging module judges that the webpage link block meets the preset correlation standard; and
a link storage module, configured to save the webpage link obtained by the webpage link acquisition module to the storage space corresponding to the index tag.
The embodiments of the present invention make full use of the fact that webpages link to one another. The webpage to be analyzed is divided into blocks and each resulting webpage link block is examined; once a block similar to the webpage to be analyzed is recognized, its links are indexed and stored as webpages of the same category as the webpage to be analyzed. The embodiments can therefore quickly store and accumulate webpage links of the same category and build an index for them, which improves the index efficiency of related webpage links. Because the webpages corresponding to related links are more likely to be displayed once indexed, wasted resources are avoided and the running efficiency of the server is improved.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the webpage link storage method according to the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of the webpage link storage method according to the second embodiment of the present invention;
Fig. 3 is a schematic flowchart of the webpage link storage method according to the third embodiment of the present invention;
Fig. 4 is a schematic flowchart of the webpage link storage method according to the fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of the blocking of a webpage to be analyzed according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the server according to the first embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the server according to the second embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the server according to the third embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the server according to the fourth embodiment of the present invention.
Detailed description
The following description of the embodiments refers to the accompanying drawings and illustrates specific embodiments in which the present invention may be practiced.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the webpage link storage method according to the first embodiment of the present invention.
In step S101, a webpage to be analyzed is obtained according to an index tag.
The index tag referred to in the embodiments of the present invention is, for example, a tag such as science and technology, novel, or entertainment. A given webpage to be analyzed may belong to the science and technology category, the novel category, and so on. Each webpage to be analyzed corresponds to one or more index tags, and webpages with the same index tag are stored in the same storage space, so that related webpages can be quickly indexed and recommended for display.
In step S102, the webpage to be analyzed is divided into blocks to form at least one webpage link block.
The embodiments of the present invention preferably use a webpage blocking algorithm to divide the webpage to be analyzed into multiple webpage link blocks. Since webpage blocking algorithms are known technology, they are not described in detail here.
In step S103, it is judged whether the webpage link block meets the preset correlation standard. If so, step S104 is performed; otherwise, the method returns to step S101.
The correlation standard is used to judge whether the webpage link block is correlated with the index tag corresponding to the webpage to be analyzed. For example, if the index tag of the webpage to be analyzed is science and technology, it is judged whether the content of the webpage link block relates to science and technology; if so, the webpage link block is judged to meet the preset correlation standard. For a more specific description of the correlation standard, see the second, third and fourth embodiments of the storage method below.
In step S104, the webpage link of the webpage link block that meets the correlation standard is obtained.
The embodiments of the present invention preferably parse the Hypertext Markup Language (HTML) source code of the webpage link block that meets the correlation standard to obtain its webpage links, for example the web addresses (Uniform Resource Locator, URL) contained in the block.
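As an illustration of step S104, the following is a minimal sketch of extracting URLs from the HTML source of one block, using Python's standard `html.parser`. The class and function names (`LinkExtractor`, `extract_links`) and the sample markup are assumptions made for this example; the patent does not specify how the HTML is parsed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative addresses against the address of the analyzed page.
            self.links.append(urljoin(self.base_url, href))


def extract_links(block_html, base_url):
    """Return the absolute URLs found in one webpage link block."""
    parser = LinkExtractor(base_url)
    parser.feed(block_html)
    parser.close()
    return parser.links


# Hypothetical markup for a "related articles" block.
block_html = (
    '<div class="related">'
    '<a href="/posts/78957.html">A</a>'
    '<a href="http://www.alibuybuy.com/posts/78920.html">B</a>'
    "</div>"
)
links = extract_links(block_html, "http://www.alibuybuy.com/posts/79084.html")
```

Resolving relative addresses with `urljoin` is a design choice of this sketch, made so that every stored link is a complete URL.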
In step S105, it is judged whether the webpage link already exists in the corresponding storage space. If so, the method returns to step S101; otherwise, step S106 is performed.
In step S106, the obtained webpage link is saved to the storage space corresponding to the index tag. For example, a science and technology webpage link is saved to the storage space used for storing science and technology webpages.
The embodiments of the present invention divide the webpage to be analyzed into blocks to form at least one webpage link block, then judge whether each webpage link block belongs to the same category as the webpage to be analyzed and, if so, store the webpage links of that block in the storage space of that category. The present invention thus makes full use of the fact that webpages link to one another: the webpage is divided into blocks, each resulting webpage link block is examined, and once a block similar to the webpage to be analyzed is recognized, its links are stored as webpages of the same category as the webpage to be analyzed. This meets the need to quickly accumulate data resources of the same kind, improves the display efficiency of related webpage link blocks, and in turn improves the running efficiency of the server.
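Steps S103 to S106 of the first embodiment can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionary `storage`, the function `store_related_links`, the block dictionaries and the placeholder standard `same_topic` are all hypothetical, and the sketch simply moves on to the next block where the flowchart returns to step S101.

```python
from collections import defaultdict

# Hypothetical in-memory "storage space": index tag -> set of stored links.
storage = defaultdict(set)


def store_related_links(index_tag, blocks, meets_standard):
    """Apply steps S103-S106 to the blocks of one analyzed page.

    blocks: list of dicts, each with a "links" list (result of S102/S104).
    meets_standard: callable(block, index_tag) -> bool, the correlation standard.
    Returns the links that were newly saved.
    """
    saved = []
    for block in blocks:
        if not meets_standard(block, index_tag):   # S103: block is unrelated
            continue
        for link in block["links"]:                # S104: links of the block
            if link in storage[index_tag]:         # S105: already stored
                continue
            storage[index_tag].add(link)           # S106: save under the tag
            saved.append(link)
    return saved


def same_topic(block, index_tag):
    """Placeholder correlation standard: compare a precomputed topic label."""
    return block["topic"] == index_tag


# Example blocks as they might look after blocking and link extraction.
blocks = [
    {"topic": "technology",
     "links": ["http://www.alibuybuy.com/posts/78957.html",
               "http://www.alibuybuy.com/posts/78920.html"]},
    {"topic": "advertising", "links": ["http://ads.example.com/banner"]},
]
first_run = store_related_links("technology", blocks, same_topic)
second_run = store_related_links("technology", blocks, same_topic)  # duplicates only
```

Passing the correlation standard in as a callable reflects the structure of the description, in which the later embodiments differ only in how that judgment is made.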
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the webpage link storage method according to the second embodiment of the present invention.
In step S201, a webpage to be analyzed is obtained according to an index tag.
In step S202, the webpage to be analyzed is divided into blocks to form at least one webpage link block.
Steps S201 and S202 in this embodiment correspond to steps S101 and S102 in the first embodiment, respectively, and are not described again here.
In step S203, the first identifier of the webpage to be analyzed is obtained.
The first identifier is preferably the web address of the webpage; for example, the first identifier is http://www.alibuybuy.com/posts/78920.html.
In step S204, the second identifier of the webpage link block is obtained.
The second identifier of the webpage link block corresponds to the first identifier. For example, if the first identifier is a web address, the second identifier is also a web address, such as http://www.alibuybuy.com/posts/78958.html.
In step S205, the second identifier is compared with the first identifier. If their similarity falls within the preset similarity range, the webpage link block is judged to meet the preset correlation standard and step S206 is performed; otherwise, the method returns to step S201.
In a specific implementation, the similarity range is, for example, 80.0% to 99.9%. When the first identifier and the second identifier are compared, the main link identifiers may be compared first, followed by the sub-link identifiers. For example, suppose the first identifier is http://www.alibuybuy.com/posts/78920.html and the second identifier is http://www.alibuybuy.com/posts/78958.html. The main link identifiers are compared first: both are http://www.alibuybuy.com/posts/, so the two identifiers can be judged to be basically consistent. The sub-link identifiers are then compared: that of the first identifier is 78920.html and that of the second is 78958.html, which differ by only two digits. Based on this analysis, the similarity of the first and second identifiers can be judged to be close to 98%, which is within the similarity range. The webpages corresponding to the two identifiers are therefore judged to be related; that is, the webpage link block corresponding to the second identifier is correlated with the webpage to be analyzed corresponding to the first identifier.
In step S206, the second identifier is stored in the same storage space as the webpage to be analyzed.
That is, if the webpage to be analyzed is stored in the storage space labelled with the science and technology index tag, the second identifier is likewise stored in the storage space of the science and technology index tag.
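The identifier comparison of step S205 might be sketched as below. The description does not state how the similarity percentage is computed, so this example substitutes the character-level ratio from Python's `difflib.SequenceMatcher` and requires the main link identifiers to be identical; both choices, and the names `split_identifier`, `similarity` and `is_related`, are assumptions of the sketch.

```python
from difflib import SequenceMatcher

# Example range quoted in the description: 80.0% to 99.9%.
SIMILARITY_RANGE = (0.80, 0.999)


def split_identifier(url):
    """Split a URL into (main link identifier, sub-link identifier)."""
    head, _, tail = url.rpartition("/")
    return head + "/", tail


def similarity(first_url, second_url):
    """Character-level similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, first_url, second_url).ratio()


def is_related(first_url, second_url, sim_range=SIMILARITY_RANGE):
    """Compare main link identifiers first, then the overall similarity."""
    main_first, _ = split_identifier(first_url)
    main_second, _ = split_identifier(second_url)
    if main_first != main_second:
        return False
    low, high = sim_range
    return low <= similarity(first_url, second_url) <= high


first = "http://www.alibuybuy.com/posts/78920.html"
second = "http://www.alibuybuy.com/posts/78958.html"
related = is_related(first, second)
```

With this particular measure the example pair scores about 0.95 rather than the figure of nearly 98% quoted above, which still lies inside the example range. The upper bound of 99.9% also has a practical effect here: an identifier identical to that of the analyzed page scores 1.0 and is rejected.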
Referring to Fig. 3, Fig. 3 is a schematic flowchart of the webpage link storage method according to the third embodiment of the present invention.
In step S301, a webpage to be analyzed is obtained according to an index tag.
In step S302, the webpage to be analyzed is divided into blocks to form at least one webpage link block.
Steps S301 and S302 in this embodiment correspond to steps S101 and S102 in the first embodiment, respectively, and are not described again here.
In step S303, the content of the webpage link block is obtained.
Specifically, the text in the webpage link block is obtained and summarized to derive the general content of the block, for example that the block concerns articles on space science and technology.
In step S304, it is judged whether the content of the webpage link block is consistent with the index tag of the webpage to be analyzed. If so, the webpage link block is judged to meet the preset correlation standard and step S305 is performed; otherwise, the method returns to step S301.
For example, if the content of the webpage link block is an introduction to space science and technology and the index tag of the webpage to be analyzed is science and technology, the two can be judged to be consistent.
In step S305, the webpage link of the webpage link block that meets the correlation standard is obtained.
In step S306, the obtained webpage link is saved to the storage space corresponding to the index tag.
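The content check of steps S303 and S304 could look like the following sketch. The description does not say how the text of a block is summarized, so the keyword table `TAG_KEYWORDS` and the functions `summarize_topic` and `content_matches` are purely hypothetical stand-ins; a production system would more plausibly use a trained text classifier.

```python
import re

# Hypothetical keyword lists per index tag, for illustration only.
TAG_KEYWORDS = {
    "technology": {"phone", "chip", "software", "space", "satellite", "robot"},
    "novel": {"chapter", "novel", "author", "romance", "fantasy"},
    "entertainment": {"movie", "celebrity", "concert", "tv"},
}


def summarize_topic(block_text):
    """S303: reduce the text of a block to the best-matching index tag."""
    words = re.findall(r"[a-z]+", block_text.lower())
    scores = {
        tag: sum(word in keywords for word in words)
        for tag, keywords in TAG_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def content_matches(block_text, index_tag):
    """S304: is the block's content consistent with the page's index tag?"""
    return summarize_topic(block_text) == index_tag


tech_block = "New satellite launched into space; chip shortage eases"
novel_block = "Read the latest chapter of this fantasy novel"
```

Here `content_matches(tech_block, "technology")` holds while `content_matches(novel_block, "technology")` does not, so only the links of the first block would go on to steps S305 and S306.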
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the webpage link storage method according to the fourth embodiment of the present invention.
In step S401, a webpage to be analyzed is obtained according to an index tag.
In step S402, the webpage to be analyzed is divided into blocks to form at least one webpage link block.
Steps S401 and S402 in this embodiment correspond to steps S101 and S102 in the first embodiment, respectively, and are not described again here.
In step S403, the attribute features of the webpage link block are obtained.
Preferably, the attribute features of the webpage link block comprise the shape of the block, the position of the block within the webpage to be analyzed, the ratio of the area of the block to the area of the webpage to be analyzed, and the link density of the block. The link density of a webpage link block is the ratio of the number of characters in the block that carry a link to the number of all characters in the block.
In step S404, it is judged whether the attribute features of the webpage link block match preset association features. If so, the webpage link block is judged to meet the preset correlation standard and step S405 is performed; otherwise, the method returns to step S401.
For example, if the webpage link block is rectangular, it can be judged to meet the correlation standard. If the block is located in the middle of the right-hand side of the webpage to be analyzed, a position that generally contains URLs of introduction pages for other novels that are similar to the current novel or popular, it can be judged to meet the correlation standard. If the ratio of the area of the block to the area of the webpage to be analyzed is greater than 10%, it can be judged to meet the correlation standard. If the link density of the block is greater than 40%, it can be judged to meet the correlation standard, and so on.
In step S405, the webpage link of the webpage link block that meets the correlation standard is obtained.
In step S406, the obtained webpage link is saved to the storage space corresponding to the index tag.
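The two numeric attribute features of the fourth embodiment, link density and area ratio, can be sketched as follows. The thresholds of 40% and 10% come from the example above; the class `BlockFeatures`, the function `meets_attribute_standard`, and the decision to accept a block when either threshold is exceeded are assumptions, since the description does not state how the individual features are combined. Shape and position are left out of the sketch.

```python
from dataclasses import dataclass


@dataclass
class BlockFeatures:
    linked_chars: int   # characters in the block that carry a link
    total_chars: int    # all characters in the block
    block_area: float   # rendered area of the block (any consistent unit)
    page_area: float    # rendered area of the whole analyzed page

    @property
    def link_density(self):
        """Ratio of linked characters to all characters in the block."""
        return self.linked_chars / self.total_chars if self.total_chars else 0.0

    @property
    def area_ratio(self):
        """Ratio of the block's area to the page's area."""
        return self.block_area / self.page_area if self.page_area else 0.0


def meets_attribute_standard(features, min_density=0.40, min_area_ratio=0.10):
    """S404 (assumed combination): either feature above its threshold suffices."""
    return (features.link_density > min_density
            or features.area_ratio > min_area_ratio)


link_list = BlockFeatures(linked_chars=90, total_chars=100,
                          block_area=5_000, page_area=100_000)
body_text = BlockFeatures(linked_chars=10, total_chars=100,
                          block_area=5_000, page_area=100_000)
```

In this example `link_list` has a link density of 0.9 and qualifies, whereas `body_text` falls below both thresholds and is skipped.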
The first to fourth embodiments above describe the flow of the webpage link storage method in detail. The working process of the embodiments of the present invention is described below with a concrete example:
First, a science and technology webpage to be analyzed is selected from the storage space; for example, its address is http://www.alibuybuy.com/posts/79084.html. The webpage is then divided into blocks, forming multiple webpage link blocks as shown in Fig. 5. The blocks are then examined to identify those that qualify. In Fig. 5, for example, the content of blocks M1, M2 and M3 is electronic technology, which is consistent with the content of the webpage to be analyzed, "mobile technology products", so M1, M2 and M3 are identified as qualifying blocks. The webpage links of M1, M2 and M3 are then extracted and added to the resource library. For example, the webpage links of M1, M2 and M3 are, respectively:
http://www.alibuybuy.com/posts/78957.html
http://www.alibuybuy.com/posts/78920.html
http://www.alibuybuy.com/posts/78941.html
After extracting the above webpage links, the server builds the index "science and technology" for them and stores them in the science and technology storage space. When users subsequently visit science and technology websites, webpage links can be extracted directly from this storage space and recommended.
Referring to Fig. 6, Fig. 6 is a schematic structural diagram of the server according to the first embodiment of the present invention. The server comprises a webpage acquisition module 61, a blocking module 62, a judging module 63, a webpage link acquisition module 64 and a link storage module 65.
The webpage acquisition module 61 is configured to obtain a webpage to be analyzed according to an index tag. The blocking module 62 is configured to divide the webpage to be analyzed into blocks to form at least one webpage link block. The judging module 63 is configured to judge whether the webpage link block meets a preset correlation standard, wherein the correlation standard is used to judge whether the webpage link block is correlated with the index tag corresponding to the webpage to be analyzed.
When the judging module 63 judges that the webpage link block meets the preset correlation standard, the webpage link acquisition module 64 obtains the webpage link of the block. The judging module 63 further judges whether the webpage link already exists in the storage space. If so, the link storage module 65 does not save the link to the storage space; otherwise, the link storage module 65 saves the obtained link to the storage space corresponding to the index tag.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of the server according to the second embodiment of the present invention. The server comprises a webpage acquisition module 71, a blocking module 72, a judging module 73, a webpage link acquisition module 74 and a link storage module 75.
The difference from the server of the first embodiment is that the judging module 73 of the second embodiment comprises an identifier acquisition module 731 and an identifier comparison module 732. The identifier acquisition module 731 is configured to obtain the first identifier of the webpage to be analyzed and the second identifier of the webpage link block. The identifier comparison module 732 is configured to compare the second identifier with the first identifier and judge whether their similarity falls within the preset similarity range; if so, the webpage link block is judged to meet the preset correlation standard. The link storage module 75 saves the second identifier to the storage space corresponding to the index tag.
Referring to Fig. 8, Fig. 8 is a schematic structural diagram of the server according to the third embodiment of the present invention. The server comprises a webpage acquisition module 81, a blocking module 82, a judging module 83, a webpage link acquisition module 84 and a link storage module 85.
The difference from the server of the first embodiment is that the judging module 83 of the third embodiment comprises a content acquisition module 831 and a content comparison module 832. The content acquisition module 831 is configured to obtain the content of the webpage link block. The content comparison module 832 is configured to judge whether the content of the webpage link block is consistent with the index tag of the webpage to be analyzed; if so, the webpage link block is judged to meet the preset correlation standard.
Referring to Fig. 9, Fig. 9 is a schematic structural diagram of the server according to the fourth embodiment of the present invention. The server comprises a webpage acquisition module 91, a blocking module 92, a judging module 93, a webpage link acquisition module 94 and a link storage module 95.
The difference from the server of the first embodiment is that the judging module 93 of the fourth embodiment comprises an attribute feature acquisition module 931 and an attribute feature comparison module 932. The attribute feature acquisition module 931 is configured to obtain the attribute features of the webpage link block. The attribute feature comparison module 932 is configured to judge whether the attribute features of the webpage link block match preset association features; if so, the webpage link block is judged to meet the preset correlation standard.
The attribute features of the webpage link block preferably comprise the shape of the block, the position of the block within the webpage to be analyzed, the ratio of the area of the block to the area of the webpage to be analyzed, and the link density of the block. The link density of a webpage link block is the ratio of the number of characters in the block that carry a link to the number of all characters in the block.
For the working principle of each module in the server, see the description of the embodiments of the webpage link storage method above; it is not repeated here.
The embodiments of the present invention divide the webpage to be analyzed into blocks to form at least one webpage link block, then judge whether each webpage link block is correlated with the webpage to be analyzed and, if so, store the webpage links of that block in the storage space to which the webpage to be analyzed belongs. The present invention thus makes full use of the fact that webpages link to one another: the webpage to be analyzed is divided into blocks, each resulting webpage link block is examined, and once a block similar to the webpage to be analyzed is recognized, its links are stored as webpages of the same category as the webpage to be analyzed. Webpage links of the same category can therefore be quickly stored, accumulated and indexed, which improves the index efficiency of related webpage links and in turn the running efficiency of the server.
In summary, although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the present invention. A person of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the present invention, so the scope of protection of the present invention is defined by the claims.

Claims (12)

1. a storage means for web page interlinkage, is characterized in that, described method comprises:
According to index sign, obtain webpage to be analyzed;
Described webpage to be analyzed is carried out to piecemeal processing, form the web page interlinkage piece of at least one;
Judging whether described web page interlinkage piece meets default associated standard, there is correlativity for judging the index sign whether described web page interlinkage piece is corresponding with described webpage to be analyzed in wherein said associated standard;
If described web page interlinkage piece meets default associated standard, the web page interlinkage of obtaining described web page interlinkage piece, and the web page interlinkage of obtaining is saved to the storage space that manipulative indexing identifies.
2. the storage means of web page interlinkage according to claim 1, is characterized in that, the described step that judges whether described web page interlinkage piece meets default associated standard comprises:
Obtain the first banner of described webpage to be analyzed;
Obtain the second banner of described web page interlinkage piece;
Described the second banner and described the first banner are contrasted, judge that described the second banner and described the first banner whether within the scope of default similarity, if so, judge that described web page interlinkage piece meets default associated standard.
3. the storage means of web page interlinkage according to claim 1, is characterized in that, the described step that judges whether described web page interlinkage piece meets default associated standard specifically comprises:
Obtain the content of described web page interlinkage piece;
Whether the content that judges described web page interlinkage piece is consistent with the index sign of described webpage to be analyzed, if so, judges that described web page interlinkage piece meets default associated standard.
4. the storage means of web page interlinkage according to claim 1, is characterized in that, the described step that judges whether described web page interlinkage piece meets default associated standard specifically comprises:
Obtain the attributive character of described web page interlinkage piece;
Whether the attributive character that judges described web page interlinkage piece meets default linked character, if so, judges that described web page interlinkage piece meets default associated standard.
5. The webpage link storage method according to claim 4, characterized in that the attribute features of the webpage link block comprise shape information of the webpage link block, position information of the webpage link block within the webpage to be analyzed, the area ratio of the webpage link block to the webpage to be analyzed, and the link density of the webpage link block;
Wherein the link density of the webpage link block is the ratio of the characters that carry a link to all characters in the webpage link block.
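The link density defined above can be computed directly from a block's markup. The sketch below assumes the block is available as an HTML fragment, treats text inside any `<a>` element as "characters that carry a link", and ignores whitespace when counting; these choices, and the names `link_density` and `_LinkDensityCounter`, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _LinkDensityCounter(HTMLParser):
    """Counts non-whitespace characters inside and outside <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.anchor_depth = 0
        self.linked_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        count = len("".join(data.split()))  # drop all whitespace before counting
        self.total_chars += count
        if self.anchor_depth > 0:
            self.linked_chars += count


def link_density(block_html: str) -> float:
    """Ratio of linked characters to all characters in the block (0.0 if empty)."""
    counter = _LinkDensityCounter()
    counter.feed(block_html)
    counter.close()
    if counter.total_chars == 0:
        return 0.0
    return counter.linked_chars / counter.total_chars
```

For example, a block whose text is half anchor text and half plain text has a link density of 0.5; navigation bars and link lists would score close to 1.0.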
6. The webpage link storage method according to claim 1, characterized in that, before the step of saving the obtained webpage link to the storage space corresponding to the index identifier, the method further comprises:
Determining whether the webpage link already exists in the storage space; if it does, not saving the webpage link to the storage space; otherwise, saving the obtained webpage link to the storage space corresponding to the index identifier.
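The existence check in claim 6 amounts to de-duplicating links per index identifier. A minimal sketch, assuming the storage space is modeled as an in-memory dictionary of sets (the function name `save_link_once` and this storage layout are hypothetical):

```python
from typing import Dict, Set


def save_link_once(storage: Dict[str, Set[str]], index_id: str, url: str) -> bool:
    """Save url under index_id unless it is already stored; report whether it was saved."""
    bucket = storage.setdefault(index_id, set())
    if url in bucket:
        return False  # already present, so the save is skipped
    bucket.add(url)
    return True
```

A set makes the membership test constant time on average; a persistent store would typically enforce the same rule with a uniqueness constraint on the pair of index identifier and link.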
7. A server, characterized in that the server comprises:
A webpage acquisition module, configured to obtain a webpage to be analyzed according to an index identifier;
A blocking module, configured to divide the webpage to be analyzed into blocks, forming at least one webpage link block;
A judging module, configured to determine whether the webpage link block satisfies a preset association criterion, wherein the association criterion is used to determine whether the webpage link block is relevant to the index identifier corresponding to the webpage to be analyzed;
A webpage link acquisition module, configured to obtain the webpage links in the webpage link block when the judging module determines that the webpage link block satisfies the preset association criterion; and
A link storage module, configured to save the webpage links obtained by the webpage link acquisition module to the storage space corresponding to the index identifier.
8. The server according to claim 7, characterized in that the judging module comprises:
An identifier acquisition module, configured to obtain a first identifier of the webpage to be analyzed and a second identifier of the webpage link block; and
An identifier comparison module, configured to compare the second identifier with the first identifier and determine whether the two fall within a preset similarity range; if so, to determine that the webpage link block satisfies the preset association criterion.
9. The server according to claim 7, characterized in that the judging module comprises:
A content acquisition module, configured to obtain the content of the webpage link block; and
A content comparison module, configured to determine whether the content of the webpage link block is consistent with the index identifier of the webpage to be analyzed; if so, to determine that the webpage link block satisfies the preset association criterion.
10. The server according to claim 7, characterized in that the judging module comprises:
An attribute feature acquisition module, configured to obtain the attribute features of the webpage link block; and
An attribute feature comparison module, configured to determine whether the attribute features of the webpage link block match preset association features; if so, to determine that the webpage link block satisfies the preset association criterion.
11. The server according to claim 10, characterized in that the attribute features of the webpage link block comprise shape information of the webpage link block, position information of the webpage link block within the webpage to be analyzed, the area ratio of the webpage link block to the webpage to be analyzed, and the link density of the webpage link block;
Wherein the link density of the webpage link block is the ratio of the characters that carry a link to all characters in the webpage link block.
12. The server according to claim 7, characterized in that the judging module is further configured to determine whether the webpage link already exists in the storage space;
If the webpage link already exists in the storage space, the link storage module does not save the webpage link to the storage space; otherwise, the link storage module saves the obtained webpage link to the storage space corresponding to the index identifier.
CN201310073553.5A 2013-03-07 2013-03-07 Webpage link storage method and server Active CN104035940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310073553.5A CN104035940B (en) 2013-03-07 2013-03-07 The storage method and server of web page interlinkage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310073553.5A CN104035940B (en) 2013-03-07 2013-03-07 The storage method and server of web page interlinkage

Publications (2)

Publication Number Publication Date
CN104035940A true CN104035940A (en) 2014-09-10
CN104035940B CN104035940B (en) 2018-07-06

Family

ID=51466711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310073553.5A Active CN104035940B (en) 2013-03-07 2013-03-07 Webpage link storage method and server

Country Status (1)

Country Link
CN (1) CN104035940B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542047A (en) * 2020-04-21 2021-10-22 北京沃东天骏信息技术有限公司 Abnormal request detection method and device, electronic equipment and computer readable medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070168640A1 (en) * 2006-01-09 2007-07-19 International Business Machines Corporation Method for searching a data page for inserting a data record
US7693875B2 (en) * 2006-01-09 2010-04-06 International Business Machines Corporation Method for searching a data page for inserting a data record
CN101079057A (en) * 2007-03-14 2007-11-28 腾讯科技(深圳)有限公司 System and method for keeping multiple link object of web page
CN101650715A (en) * 2008-08-12 2010-02-17 厦门市美亚柏科信息股份有限公司 Method and device for screening links on web pages
CN101916285A (en) * 2010-08-20 2010-12-15 北京新岸线网络技术有限公司 Method and device for analyzing internet web page contents
CN101976271A (en) * 2010-11-19 2011-02-16 上海合合信息科技发展有限公司 Method for automatically extracting website and opening web page
CN102646129A (en) * 2012-03-09 2012-08-22 武汉大学 Topic-relative distributed web crawler system

Also Published As

Publication number Publication date
CN104035940B (en) 2018-07-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant