CN104035940B - The storage method and server of web page interlinkage - Google Patents

Publication number
CN104035940B
Authority
CN
China
Prior art keywords
web page
page interlinkage
block
webpage
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310073553.5A
Other languages
Chinese (zh)
Other versions
CN104035940A (en)
Inventor
蔡兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201310073553.5A
Publication of CN104035940A
Application granted
Publication of CN104035940B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558: Details of hyperlinks; Management of linked annotations

Abstract

The present invention provides a storage method and server for web page links. The method includes: obtaining a web page to be analyzed according to an index identifier; partitioning the web page to be analyzed into blocks to form at least one web page link block; judging whether the web page link block meets a preset association standard, where the association standard is used to judge whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed; and, if the web page link block meets the preset association standard, obtaining the web page links of the web page link block and saving them to the storage space of the corresponding index identifier. The invention can quickly identify web page link blocks of the same category, index them, and store them, improving the storage and indexing efficiency of related web page links.

Description

Storage method and server for web page links
Technical Field
The invention belongs to the field of Internet technology, and in particular relates to a storage method and server for web page links.
Background
As the Internet becomes ever more widespread, users demand more and more of its functionality.
In the prior art, to improve indexing efficiency, a back-end server usually indexes web pages before storing them, so that relevant pages can be sent directly to the front end when a user visits and displayed to the user by the front end. When the back-end server indexes web pages, it usually collects and indexes a web page link only after a user has accessed it; for example, all news pages browsed by users are indexed in real time so that the web page links can be stored quickly.
Web pages on the Internet link to one another: a web page often contains a number of links to other pages, and a user can open another page through a link in the current one. For example, a news page generally has a related-news or recommended-news block (that is, web page links) to the right of or below the body text, and a novel's introduction page generally offers links to similar or popular novels on both sides.
However, because of users' access habits and other reasons, these web page links are often ignored, so the corresponding pages are never displayed. As long as no user accesses such a link, a prior-art server cannot index and store the corresponding page in time.
In summary, the prior art has the following technical problem: for web page links in a page that are never clicked, the server cannot identify and index them in a timely and effective manner. Indexing efficiency is therefore low, and because a large number of web page links are not identified and indexed in time, the server runs inefficiently.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide a storage method and server for web page links, aiming to solve the technical problem in the prior art that, for web page links in a page that are never clicked, the server cannot identify and index them in a timely and effective manner, so that indexing efficiency is low and, because a large number of web page links are not identified and indexed in time, the server runs inefficiently.
To solve the above technical problem, the embodiments of the present invention provide the following technical solution:
A storage method for web page links, the method comprising the following steps:
obtaining a web page to be analyzed according to an index identifier;
partitioning the web page to be analyzed into blocks to form at least one web page link block;
judging whether the web page link block meets a preset association standard, wherein the association standard is used to judge whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed; and
if the web page link block meets the preset association standard, obtaining the web page links of the web page link block and saving the obtained web page links to the storage space of the corresponding index identifier.
To solve the above technical problem, the embodiments of the present invention also provide the following technical solution:
A server, the server comprising:
a web page acquisition module, for obtaining a web page to be analyzed according to an index identifier;
a partitioning module, for partitioning the web page to be analyzed into blocks to form at least one web page link block;
a judgment module, for judging whether the web page link block meets a preset association standard, wherein the association standard is used to judge whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed;
a web page link acquisition module, for obtaining the web page links of the web page link block when the judgment module judges that the web page link block meets the preset association standard; and
a link storage module, for saving the web page links obtained by the web page link acquisition module to the storage space of the corresponding index identifier.
The embodiments of the present invention make full use of the fact that web pages link to one another: the web page to be analyzed is partitioned into blocks, and each resulting web page link block is examined. Once a web page link block similar to the web page to be analyzed is recognized, the block is treated as pages of the same category as the web page to be analyzed, indexed, and stored. The embodiments can therefore quickly accumulate and store web page links of the same category and index them, improving the indexing efficiency of related web page links. Moreover, because the pages behind related links are more likely to be displayed once they have been indexed, wasted resources are avoided and the server runs more efficiently.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the web page link storage method of the first embodiment of the invention;
Fig. 2 is a flow diagram of the web page link storage method of the second embodiment of the invention;
Fig. 3 is a flow diagram of the web page link storage method of the third embodiment of the invention;
Fig. 4 is a flow diagram of the web page link storage method of the fourth embodiment of the invention;
Fig. 5 is a schematic diagram of the partitioning of a web page to be analyzed, provided by an embodiment of the invention;
Fig. 6 is a structural diagram of the server of the first embodiment of the invention;
Fig. 7 is a structural diagram of the server of the second embodiment of the invention;
Fig. 8 is a structural diagram of the server of the third embodiment of the invention;
Fig. 9 is a structural diagram of the server of the fourth embodiment of the invention.
Detailed Description
The following embodiments are described with reference to the accompanying drawings to illustrate specific embodiments in which the invention may be practiced.
Referring to Fig. 1, Fig. 1 is a flow diagram of the web page link storage method of the first embodiment of the invention.
In step S101, a web page to be analyzed is obtained according to an index identifier.
The index identifiers referred to in the embodiments are, for example, identifiers such as science and technology, novels, or entertainment; a given web page to be analyzed may belong to the science-and-technology category, the novel category, and so on. Each web page to be analyzed corresponds to one or more index identifiers, and web pages with the same index identifier are stored in the same storage space, so that related pages can be quickly indexed and recommended for display.
In step S102, the web page to be analyzed is partitioned into blocks to form at least one web page link block.
The embodiment preferably uses a web page partitioning algorithm to partition the web page to be analyzed, forming multiple web page link blocks on it. Because web page partitioning algorithms are known techniques, they are not described in detail here.
In step S103, it is judged whether the web page link block meets the preset association standard; if so, step S104 is performed, otherwise the flow returns to step S101.
The association standard is used to judge whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed. For example, if the index identifier of the web page to be analyzed is science and technology, it is judged whether the content of the web page link block is related to science and technology; if so, the web page link block is judged to meet the preset association standard. For a more specific description of the association standard, see the descriptions of the second, third, and fourth embodiments of the storage method; it is not detailed here.
In step S104, the web page links of the web page link block that meets the association standard are obtained.
The embodiment preferably parses the Hypertext Markup Language (HTML) source code of the web page link block that meets the association standard to obtain the block's web page links, for example the web page address (Uniform Resource Locator, URL) of each link in the block.
In step S105, it is judged whether the web page link already exists in the corresponding storage space; if so, the flow returns to step S101, otherwise step S106 is performed.
In step S106, the obtained web page link is saved to the storage space of the corresponding index identifier. For example, a science-and-technology web page link is saved to the storage space used for science-and-technology web pages.
The embodiment partitions the web page to be analyzed into at least one web page link block and then judges whether each web page link block belongs to the same category as the web page to be analyzed; if so, the block's web page links are stored to the storage space belonging to that category. The invention thus makes full use of the fact that web pages link to one another: the page is partitioned, each resulting web page link block is examined, and once a block similar to the web page to be analyzed is recognized, it is stored as pages of the same category as the web page to be analyzed. This meets the need to accumulate resources of the same kind quickly, improves the display efficiency of related web page link blocks, and in turn improves the operating efficiency of the server.
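The flow of steps S103 to S106 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the partitioning step (S102) is assumed to have already produced each block as an HTML string, the association test is passed in as a caller-supplied predicate, and the names `extract_links`, `store_related_links`, and the dictionary-based `storage` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Set


class _HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag in a fragment of HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(block_html: str) -> List[str]:
    """S104: obtain the web page links (URLs) contained in one link block."""
    collector = _HrefCollector()
    collector.feed(block_html)
    collector.close()
    return collector.hrefs


def store_related_links(
    index_id: str,
    blocks: Iterable[str],
    is_related: Callable[[str, str], bool],
    storage: Dict[str, Set[str]],
) -> List[str]:
    """Run S103 to S106 over the blocks of one page; return newly saved links."""
    saved: List[str] = []
    bucket = storage.setdefault(index_id, set())  # storage space for this index identifier
    for block in blocks:
        if not is_related(block, index_id):  # S103: association standard (assumed predicate)
            continue
        for url in extract_links(block):  # S104
            if url in bucket:  # S105: link already stored, skip it
                continue
            bucket.add(url)  # S106: save under the same index identifier
            saved.append(url)
    return saved
```

The patent describes returning to step S101 when a block fails the test or a link is already stored; the sketch expresses the same idea as a loop over the blocks of a single page, and fetching the next page is left to the caller.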
Referring to Fig. 2, Fig. 2 is a flow diagram of the web page link storage method provided by the second embodiment of the invention.
In step S201, a web page to be analyzed is obtained according to an index identifier.
In step S202, the web page to be analyzed is partitioned into blocks to form at least one web page link block.
Steps S201 and S202 of this embodiment correspond to steps S101 and S102 of the first embodiment and are not described again here.
In step S203, the first web page identifier of the web page to be analyzed is obtained.
The first web page identifier is preferably the web page address; for example, the first web page identifier is: http://www.alibuybuy.com/posts/78920.html
In step S204, the second web page identifier of the web page link block is obtained.
The second web page identifier of the web page link block is of the same kind as the first web page identifier: if the first web page identifier is a web page address, the second web page identifier is also a web page address; for example, the second web page identifier is http://www.alibuybuy.com/posts/78958.html
In step S205, the second web page identifier is compared with the first web page identifier; if their similarity is within the preset similarity range, the web page link block is judged to meet the preset association standard and step S206 is performed; otherwise the flow returns to step S201.
In a specific implementation, the similarity range is, for example, 80.0% to 99.9%. When the first and second web page identifiers are compared, the main link identifier can be compared first and the sub-link identifier afterwards. For example, suppose the first web page identifier is http://www.alibuybuy.com/posts/78920.html and the second web page identifier is http://www.alibuybuy.com/posts/78958.html. The main link identifiers are compared first; since both are http://www.alibuybuy.com/posts/, the two identifiers can be judged to be basically consistent. The sub-link identifiers are then compared: the first is 78920.html and the second is 78958.html, which differ by only two digits. Based on this analysis, the similarity of the first and second web page identifiers can be judged to be close to 98%, which is within the similarity range. The pages corresponding to the two identifiers are therefore judged to be related; that is, the web page link block corresponding to the second web page identifier is correlated with the web page to be analyzed corresponding to the first web page identifier.
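The comparison in step S205 can be illustrated with a short Python sketch. The patent does not state how the similarity percentage is computed, so this sketch substitutes the standard-library `difflib.SequenceMatcher` ratio as a stand-in metric; it will not reproduce the "close to 98%" figure above exactly. The function names, the rule that differing main link identifiers yield a similarity of 0, and the default bounds taken from the example range are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Tuple


def split_identifier(url: str) -> Tuple[str, str]:
    """Split a URL into a main link part (up to the last '/') and a sub-link part."""
    head, sep, tail = url.rpartition("/")
    return head + sep, tail


def identifier_similarity(first: str, second: str) -> float:
    """Return a similarity in [0, 1] for two web page identifiers.

    Assumption: identifiers whose main link parts differ are treated as
    unrelated (similarity 0.0); otherwise a character-level ratio is used.
    """
    main_first, _ = split_identifier(first)
    main_second, _ = split_identifier(second)
    if main_first != main_second:
        return 0.0
    return SequenceMatcher(None, first, second).ratio()


def meets_similarity_standard(
    first: str, second: str, low: float = 0.80, high: float = 0.999
) -> bool:
    """Check whether the similarity falls inside the preset range."""
    return low <= identifier_similarity(first, second) <= high
```

With an upper bound below 100%, an identifier identical to that of the page being analyzed falls outside the range, so a block that merely links back to the same page would not count as a related link under this sketch.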
In step S206, the second web page identifier is stored to the same storage space as the web page to be analyzed.
That is, if the web page to be analyzed is stored in the storage space labeled with the science-and-technology index identifier, the second web page identifier is likewise stored in the storage space of the science-and-technology index identifier.
Referring to Fig. 3, Fig. 3 is a flow diagram of the web page link storage method provided by the third embodiment of the invention.
In step S301, a web page to be analyzed is obtained according to an index identifier.
In step S302, the web page to be analyzed is partitioned into blocks to form at least one web page link block.
Steps S301 and S302 of this embodiment correspond to steps S101 and S102 of the first embodiment and are not described again here.
In step S303, the content of the web page link block is obtained.
Specifically, the text in the web page link block is obtained and summarized to give the general content of the block; for example, the web page link block mainly covers space science and technology.
In step S304, it is judged whether the content of the web page link block is consistent with the index identifier of the web page to be analyzed; if so, the web page link block is judged to meet the preset association standard and step S305 is performed; otherwise the flow returns to step S301.
For example, if the content of the web page link block is an introduction to space science and technology and the index identifier of the web page to be analyzed is science and technology, the two can be judged to be consistent.
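One simple way to picture steps S303 and S304 is a keyword-based classifier, sketched below in Python. The patent does not say how the block text is summarized, so the keyword table `CATEGORY_KEYWORDS`, the category names, and both function names are hypothetical; a real system would likely use a proper text classifier.

```python
import re
from collections import Counter
from typing import Dict, Optional, Set

# Hypothetical keyword vocabulary per index identifier (illustration only).
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "technology": {"phone", "chip", "space", "satellite", "software"},
    "novel": {"chapter", "author", "story", "romance"},
}


def summarize_category(block_text: str) -> Optional[str]:
    """S303: reduce the block's text to a single general category, or None."""
    words = re.findall(r"[a-z]+", block_text.lower())
    hits: Counter = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits[category] = sum(1 for word in words if word in keywords)
    category, count = hits.most_common(1)[0]
    return category if count > 0 else None


def content_matches_index(block_text: str, index_id: str) -> bool:
    """S304: is the block's general content consistent with the index identifier?"""
    return summarize_category(block_text) == index_id
```

For instance, a block whose text mentions satellites, space, and chips would be summarized as "technology" and would match a page indexed under that identifier, while a block about novel chapters would not.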
In step S305, the web page links of the web page link block that meets the association standard are obtained.
In step S306, the obtained web page links are saved to the storage space of the corresponding index identifier.
Referring to Fig. 4, Fig. 4 is a flow diagram of the web page link storage method of the fourth embodiment of the invention.
In step S401, a web page to be analyzed is obtained according to an index identifier.
In step S402, the web page to be analyzed is partitioned into blocks to form at least one web page link block.
Steps S401 and S402 of this embodiment correspond to steps S101 and S102 of the first embodiment and are not described again here.
In step S403, the attribute features of the web page link block are obtained.
Preferably, the attribute features of the web page link block include the shape of the block, the position of the block within the web page to be analyzed, the area ratio of the block to the web page to be analyzed, and the link density of the block, where the link density is the ratio of the characters in the block that carry a link to all characters in the block.
In step S404, it is judged whether the attribute features of the web page link block match preset association features; if so, the web page link block is judged to meet the preset association standard and step S405 is performed; otherwise the flow returns to step S401.
For example, if the shape of the web page link block is rectangular, the block can be judged to meet the association standard. If the block is located in the middle of the right side of the web page to be analyzed, where a block generally contains the home page URLs of other novels that are similar to the current one or popular, the block can be judged to meet the association standard. If the area ratio of the block to the web page to be analyzed is greater than 10%, the block can be judged to meet the association standard. If the link density of the block is greater than 40%, the block can be judged to meet the association standard, and so on.
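The link density defined above, together with the example thresholds of 10% area ratio and 40% link density, can be sketched in Python as follows. This is a minimal illustration under assumptions: whitespace is excluded from the character counts, the block and page areas are supplied by the caller (for example from a layout engine), shape and position are not modeled, and treating either threshold as sufficient on its own follows the examples above but is otherwise a guess. All names are hypothetical.

```python
from html.parser import HTMLParser


class _LinkDensityCounter(HTMLParser):
    """Counts non-whitespace text characters inside and outside <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._anchor_depth = 0
        self.linked_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        count = len("".join(data.split()))  # ignore whitespace (assumption)
        self.total_chars += count
        if self._anchor_depth > 0:
            self.linked_chars += count


def link_density(block_html: str) -> float:
    """Ratio of characters that carry a link to all characters in the block."""
    counter = _LinkDensityCounter()
    counter.feed(block_html)
    counter.close()
    if counter.total_chars == 0:
        return 0.0
    return counter.linked_chars / counter.total_chars


def meets_attribute_criteria(
    block_html: str,
    block_area: float,
    page_area: float,
    min_area_ratio: float = 0.10,
    min_link_density: float = 0.40,
) -> bool:
    """S404 sketch: either example threshold alone is treated as sufficient."""
    area_ratio = block_area / page_area if page_area > 0 else 0.0
    return area_ratio > min_area_ratio or link_density(block_html) > min_link_density
```

A sidebar made up almost entirely of anchor text passes on link density even when it is small, while a block of plain body text passes only if it occupies more than the assumed share of the page area.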
In step S405, the web page links of the web page link block that meets the association standard are obtained.
In step S406, the obtained web page links are saved to the storage space of the corresponding index identifier.
The first to fourth embodiments above describe the flow of the web page link storage method in detail. The working process of the embodiments is illustrated below with a specific example:
First, a science-and-technology web page to be analyzed is selected from the storage space; for example, its web page address is: http://www.alibuybuy.com/posts/79084.html. The obtained web page to be analyzed is then partitioned, forming multiple web page link blocks as shown in Fig. 5. The web page link blocks are then examined to identify those that qualify. In Fig. 5, for example, the content of web page link blocks M1, M2, and M3 is electronic technology, which is consistent with the content of the web page to be analyzed, "mobile technology products"; the qualifying web page link blocks are therefore identified as M1, M2, and M3. The web page links of M1, M2, and M3 are then extracted and added to the resource library; for example, the web page links of M1, M2, and M3 are respectively:
http://www.alibuybuy.com/posts/78957.html
http://www.alibuybuy.com/posts/78920.html
http://www.alibuybuy.com/posts/78941.html
After extracting the above web page links, the server indexes them under "science and technology" and stores them in the storage space of the science-and-technology category; when a user later visits a science-and-technology website, web page links can be extracted directly from that storage space and recommended.
Referring to Fig. 6, Fig. 6 is a structural diagram of the server of the first embodiment of the invention. The server includes a web page acquisition module 61, a partitioning module 62, a judgment module 63, a web page link acquisition module 64, and a link storage module 65.
The web page acquisition module 61 is used to obtain a web page to be analyzed according to an index identifier. The partitioning module 62 is used to partition the web page to be analyzed into blocks to form at least one web page link block. The judgment module 63 is used to judge whether the web page link block meets a preset association standard, where the association standard is used to judge whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed.
When the judgment module 63 judges that the web page link block meets the preset association standard, the web page link acquisition module 64 obtains the web page links of the web page link block. The judgment module 63 is further used to judge whether a web page link already exists in the storage space; if so, the link storage module 65 does not save that web page link to the storage space; otherwise the link storage module 65 saves the obtained web page link to the storage space of the corresponding index identifier.
Referring to Fig. 7, Fig. 7 is a structural diagram of the server of the second embodiment of the invention. The server specifically includes a web page acquisition module 71, a partitioning module 72, a judgment module 73, a web page link acquisition module 74, and a link storage module 75.
The difference from the server of the first embodiment is that the judgment module 73 of the second embodiment includes an identifier acquisition module 731 and an identifier comparison module 732. The identifier acquisition module 731 is used to obtain the first web page identifier of the web page to be analyzed and the second web page identifier of the web page link block. The identifier comparison module 732 is used to compare the second web page identifier with the first web page identifier and judge whether their similarity is within a preset similarity range; if so, the web page link block is judged to meet the preset association standard. The link storage module 75 saves the second web page identifier to the storage space of the corresponding index identifier.
Referring to Fig. 8, Fig. 8 is a structural diagram of the server of the third embodiment of the invention. The server specifically includes a web page acquisition module 81, a partitioning module 82, a judgment module 83, a web page link acquisition module 84, and a link storage module 85.
The difference from the server of the first embodiment is that the judgment module 83 of the third embodiment includes a content acquisition module 831 and a content comparison module 832. The content acquisition module 831 is used to obtain the content of the web page link block. The content comparison module 832 is used to judge whether the content of the web page link block is consistent with the index identifier of the web page to be analyzed; if so, the web page link block is judged to meet the preset association standard.
Referring to Fig. 9, Fig. 9 is a structural diagram of the server of the fourth embodiment of the invention. The server specifically includes a web page acquisition module 91, a partitioning module 92, a judgment module 93, a web page link acquisition module 94, and a link storage module 95.
The difference from the server of the first embodiment is that the judgment module 93 of the fourth embodiment includes an attribute feature acquisition module 931 and an attribute feature comparison module 932. The attribute feature acquisition module 931 is used to obtain the attribute features of the web page link block. The attribute feature comparison module 932 is used to judge whether the attribute features of the web page link block match preset association features; if so, the web page link block is judged to meet the preset association standard.
The attribute features of the web page link block preferably include the shape of the block, the position of the block within the web page to be analyzed, the area ratio of the block to the web page to be analyzed, and the link density of the block, where the link density is the ratio of the characters in the block that carry a link to all characters in the block.
For the working principle of each module in the server, see the description of the embodiments of the web page link storage method above; it is not repeated here.
The embodiments of the invention partition the web page to be analyzed into at least one web page link block and then judge whether each web page link block is correlated with the web page to be analyzed; if it is, the block's web page links are stored to the storage space to which the web page to be analyzed belongs. The invention thus makes full use of the fact that web pages link to one another: the web page to be analyzed is partitioned, each resulting web page link block is examined, and once a block similar to the web page to be analyzed is recognized, it is stored as pages of the same category as the web page to be analyzed. Web page links of the same category can therefore be accumulated, stored, and indexed quickly, improving the indexing efficiency of related web page links and in turn the operating efficiency of the server.
In conclusion, although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the invention. Those of ordinary skill in the art may make various changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the claims.

Claims (10)

1. a kind of storage method of web page interlinkage, which is characterized in that the method includes:
Webpage to be analyzed is obtained according to index mark;
Piecemeal processing is carried out to the webpage to be analyzed, forms at least one web page interlinkage block;
Judge whether the web page interlinkage block meets preset association standard, wherein the association standard is used to judge the webpage There are correlations for chained block index mark whether corresponding with the webpage to be analyzed;The method of judgement includes:Obtain the net The content of page chained block;Judge whether the content of the web page interlinkage block is consistent with the index mark of the webpage to be analyzed, if It is then to judge that the web page interlinkage block meets preset association standard;
If the web page interlinkage block meets preset association standard, the web page interlinkage of the web page interlinkage block is obtained, and will be obtained The web page interlinkage taken preserves the memory space identified to manipulative indexing.
2. the storage method of web page interlinkage according to claim 1, which is characterized in that described to judge the web page interlinkage block The step of whether meeting preset association standard includes:
Obtain the first banner of the webpage to be analyzed;
Obtain the second banner of the web page interlinkage block;
Second banner and first banner are compared, judge second banner and described Whether one banner is in preset similarity dimensions, if so, judging that the web page interlinkage block meets preset association mark It is accurate.
3. the storage method of web page interlinkage according to claim 1, which is characterized in that described to judge the web page interlinkage block The step of whether meeting preset association standard specifically includes:
Obtain the attributive character of the web page interlinkage block;
Judge whether the attributive character of the web page interlinkage block meets preset linked character, if so, judging the webpage chain It connects block and meets preset association standard.
4. the storage method of web page interlinkage according to claim 3, which is characterized in that the attribute of the web page interlinkage block is special Sign include the shape information of the web page interlinkage block, the web page interlinkage block the webpage to be analyzed location information, described The area ratio of web page interlinkage block and the webpage to be analyzed and web page interlinkage block link density;
The link density of wherein described web page interlinkage block is the character that there is link in the web page interlinkage block and all characters Ratio.
5. The storage method of web page interlinkage according to claim 1, characterized in that, before the step of saving the obtained web page interlinkage to the memory space corresponding to the index mark, the method further comprises:
Judging whether the web page interlinkage already exists in the memory space; if it exists, stopping saving the web page interlinkage to the memory space; otherwise, saving the obtained web page interlinkage to the memory space corresponding to the index mark.
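The existence check in claim 5 amounts to de-duplicating links before they are stored. The sketch below assumes an in-memory store in which each index mark maps to its own list of links; the class `LinkStore` and its methods are hypothetical and only illustrate the check, not the patented storage layer.

```python
from collections import defaultdict


class LinkStore:
    """In-memory stand-in for the memory spaces keyed by index mark."""

    def __init__(self) -> None:
        self._spaces = defaultdict(list)

    def save(self, index_mark: str, link: str) -> bool:
        """Save the link unless it already exists in the memory space.

        Returns True if the link was stored, False if saving was skipped.
        """
        space = self._spaces[index_mark]
        if link in space:
            return False  # already present: stop saving
        space.append(link)
        return True

    def links(self, index_mark: str) -> list:
        """Return a copy of the links stored for the given index mark."""
        return list(self._spaces.get(index_mark, []))
```

Saving the same link twice under one index mark leaves a single stored copy, and the second call reports that nothing was written.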
6. A server, characterized in that the server comprises:
A webpage acquisition module, for obtaining a webpage to be analyzed according to an index mark;
A piecemeal module, for carrying out piecemeal processing on the webpage to be analyzed to form at least one web page interlinkage block;
A judgment module, for judging whether the web page interlinkage block meets a preset association standard, wherein the association standard is used for judging whether the web page interlinkage block is correlated with the index mark corresponding to the webpage to be analyzed; the judgment module includes a content obtaining module and a content comparison module; the content obtaining module is for obtaining the content of the web page interlinkage block; the content comparison module is for judging whether the content of the web page interlinkage block is consistent with the index mark of the webpage to be analyzed, and if so, judging that the web page interlinkage block meets the preset association standard;
A web page interlinkage acquisition module, for obtaining the web page interlinkage of the web page interlinkage block when the judgment module judges that the web page interlinkage block meets the preset association standard; and
A link memory module, for saving the web page interlinkage obtained by the web page interlinkage acquisition module to the memory space corresponding to the index mark.
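The cooperation of the judgment, acquisition, and link memory modules in claim 6 can be sketched as a short pipeline. The sketch assumes the page has already been fetched and split into blocks, treats the index mark as a keyword, and reads "consistent with the index mark" as a case-insensitive substring match; all of these are illustrative assumptions, and `LinkBlock`, `content_matches_index`, and `store_related_links` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LinkBlock:
    """A web page interlinkage block: its text content and the links it holds."""

    content: str
    links: list = field(default_factory=list)


def content_matches_index(block: LinkBlock, index_mark: str) -> bool:
    """Content comparison: assumed here to be a case-insensitive keyword match."""
    return index_mark.lower() in block.content.lower()


def store_related_links(blocks, index_mark: str, storage: dict) -> list:
    """Save links from blocks that meet the association standard.

    `storage` maps each index mark to its memory space (a list of links).
    """
    space = storage.setdefault(index_mark, [])
    for block in blocks:
        if content_matches_index(block, index_mark):
            space.extend(block.links)
    return space
```

Only links from blocks whose content relates to the index mark reach the memory space; links from unrelated blocks, such as advertising, are left out.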
7. The server according to claim 6, characterized in that the judgment module includes:
An identifier acquisition module, for obtaining the first banner of the webpage to be analyzed and the second banner of the web page interlinkage block; and
An identifier comparison module, for comparing the second banner with the first banner, and judging whether the second banner and the first banner are within a preset similarity range; if so, judging that the web page interlinkage block meets the preset association standard.
8. The server according to claim 6, characterized in that the judgment module includes:
An attribute feature acquisition module, for obtaining the attribute features of the web page interlinkage block; and
An attribute feature comparison module, for judging whether the attribute features of the web page interlinkage block meet a preset association feature; if so, judging that the web page interlinkage block meets the preset association standard.
9. The server according to claim 8, characterized in that the attribute features of the web page interlinkage block include the shape information of the web page interlinkage block, the position information of the web page interlinkage block in the webpage to be analyzed, the area ratio of the web page interlinkage block to the webpage to be analyzed, and the link density of the web page interlinkage block;
Wherein the link density of the web page interlinkage block is the ratio of the characters that carry a link in the web page interlinkage block to all characters in the block.
10. The server according to claim 6, characterized in that the judgment module is further used for judging whether the web page interlinkage already exists in the memory space;
If the web page interlinkage already exists in the memory space, the link memory module stops saving the web page interlinkage to the memory space; otherwise, the link memory module saves the obtained web page interlinkage to the memory space corresponding to the index mark.
CN201310073553.5A 2013-03-07 2013-03-07 The storage method and server of web page interlinkage Active CN104035940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310073553.5A CN104035940B (en) 2013-03-07 2013-03-07 The storage method and server of web page interlinkage

Publications (2)

Publication Number Publication Date
CN104035940A CN104035940A (en) 2014-09-10
CN104035940B true CN104035940B (en) 2018-07-06

Family

ID=51466711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310073553.5A Active CN104035940B (en) 2013-03-07 2013-03-07 The storage method and server of web page interlinkage

Country Status (1)

Country Link
CN (1) CN104035940B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542047B (en) * 2020-04-21 2023-04-07 北京沃东天骏信息技术有限公司 Abnormal request detection method and device, electronic equipment and computer readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079057A (en) * 2007-03-14 2007-11-28 腾讯科技(深圳)有限公司 System and method for keeping multiple link object of web page
CN101650715A (en) * 2008-08-12 2010-02-17 厦门市美亚柏科信息股份有限公司 Method and device for screening links on web pages
US7693875B2 (en) * 2006-01-09 2010-04-06 International Business Machines Corporation Method for searching a data page for inserting a data record
CN101916285A (en) * 2010-08-20 2010-12-15 北京新岸线网络技术有限公司 Method and device for analyzing internet web page contents
CN101976271A (en) * 2010-11-19 2011-02-16 上海合合信息科技发展有限公司 Method for automatically extracting website and opening web page
CN102646129A (en) * 2012-03-09 2012-08-22 武汉大学 Topic-relative distributed web crawler system

Also Published As

Publication number Publication date
CN104035940A (en) 2014-09-10

Similar Documents

Publication Publication Date Title
CN109325165B (en) Network public opinion analysis method, device and storage medium
CN106250513B (en) Event modeling-based event personalized classification method and system
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
WO2019218514A1 (en) Method for extracting webpage target information, device, and storage medium
CN101216825B (en) Indexing key words extraction/ prediction method
CN109359244A (en) A kind of recommendation method for personalized information and device
CN103294781B (en) A kind of method and apparatus for processing page data
CN102270206A (en) Method and device for capturing valid web page contents
TWI695277B (en) Automatic website data collection method
CN106815307A (en) Public Culture knowledge mapping platform and its use method
CN110827112B (en) Deep learning commodity recommendation method and device, computer equipment and storage medium
CN110457579B (en) Webpage denoising method and system based on cooperative work of template and classifier
CN110020312B (en) Method and device for extracting webpage text
JP5013065B2 (en) Rustic monitoring system, ruling monitoring method and program
CN103942211B (en) A kind of recognition methods of text page and device
CN107291755A (en) A kind of terminal method for pushing and device
CN107608980A (en) Information-pushing method and system based on the analysis of DPI big datas
CN102314494A (en) Method and equipment for processing webpage contents
CN103729178A (en) Method and system for processing multiple tabs of browsers
CN106446123A (en) Webpage verification code element identification method
CN106095772A (en) The method and apparatus that a kind of http protocol information extracts
CN113569118A (en) Self-media pushing method and device, computer equipment and storage medium
CN101115024A (en) Method and system for displaying web page contents related information
CN103593360A (en) Internet information publishing time extraction method based on page analysis
CN104035940B (en) The storage method and server of web page interlinkage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant