CN104035940B - Storage method and server for web page links - Google Patents
- Publication number: CN104035940B
- Application number: CN201310073553.5A
- Authority: CN (China)
- Prior art keywords: web page, page link, block, webpage, analyzed
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
Abstract
The present invention provides a storage method and a server for web page links. The method includes: obtaining a web page to be analyzed according to an index identifier; partitioning the web page to be analyzed into blocks to form at least one web page link block; determining whether the web page link block satisfies a preset association criterion, the association criterion being used to determine whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed; and, if the web page link block satisfies the preset association criterion, obtaining the web page links of the web page link block and saving the obtained links to the storage space corresponding to the index identifier. The invention can quickly identify web page link blocks of the same category, index them, and store them, improving the efficiency of storing and indexing related web page links.
Description
Technical field
The invention belongs to the field of Internet technology, and in particular relates to a storage method and a server for web page links.
Background art
As the Internet becomes ever more widespread, users place ever higher demands on its functions.
In the prior art, to improve indexing efficiency, a back-end server usually builds an index for a web page before storing it, so that when a user visits, the relevant web pages can be sent directly to the front end and displayed to the user. The back-end server typically builds this index only after a user has accessed a given web page link: it collects the accessed link and indexes it, for example by indexing in real time every news page a user browses, so that web page links can be stored quickly.
Web pages on the Internet link to one another, and a single page often contains a number of links through which the user can open other pages. For example, a news page usually has a related-news or recommended-news block (i.e. web page links) to the right of or below the body text, and a novel's introduction page usually offers links to similar or popular novels on both sides.
However, because of users' browsing habits and similar factors, such links are often ignored and the pages they point to are never displayed. Once a link goes unvisited, a prior-art server cannot index and store the corresponding page in a timely manner.
In summary, the prior art has the following technical problem: the server cannot promptly and effectively identify and index links in a web page that are never clicked, so indexing efficiency is low, a large number of web page links go unidentified and unindexed, and the server runs inefficiently.
Summary of the invention
Embodiments of the present invention aim to provide a storage method and a server for web page links, to solve the technical problem in the prior art that a server cannot promptly and effectively identify and index links in a web page that are never clicked, which leads to low indexing efficiency, leaves a large number of web page links unidentified and unindexed, and lowers the server's operating efficiency.
To solve the above technical problem, the embodiments of the present invention provide the following technical solution:
A storage method for web page links, the method comprising the following steps:
Obtaining a web page to be analyzed according to an index identifier;
Partitioning the web page to be analyzed into blocks to form at least one web page link block;
Determining whether the web page link block satisfies a preset association criterion, where the association criterion is used to determine whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed;
If the web page link block satisfies the preset association criterion, obtaining the web page links of the web page link block and saving the obtained links to the storage space corresponding to the index identifier.
To solve the above technical problem, the embodiments of the present invention further provide the following technical solution:
A server, the server comprising:
A webpage acquisition module, configured to obtain a web page to be analyzed according to an index identifier;
A block partitioning module, configured to partition the web page to be analyzed into blocks to form at least one web page link block;
A judgment module, configured to determine whether the web page link block satisfies a preset association criterion, where the association criterion is used to determine whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed;
A web page link acquisition module, configured to obtain the web page links of the web page link block when the judgment module determines that the block satisfies the preset association criterion; and
A link storage module, configured to save the web page links obtained by the web page link acquisition module to the storage space corresponding to the index identifier.
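The method steps and server modules summarized above can be sketched as one small Python class. This is a minimal sketch, not the patented implementation: the class name `LinkStorageServer`, the representation of a block as a plain HTML string, and the in-memory `dict` standing in for the storage spaces are all assumptions. Page acquisition is left to the caller, which passes in the page URL and HTML it selected for an index identifier; the remaining modules are injected as functions so that the different association criteria of the later embodiments can be swapped in.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LinkStorageServer:
    """Hypothetical wiring of the modules described above (names are illustrative)."""
    # Block partitioning module: page HTML -> list of block HTML fragments.
    partition: Callable[[str], list[str]]
    # Judgment module: (block, page URL, index identifier) -> meets association criterion?
    is_associated: Callable[[str, str, str], bool]
    # Web page link acquisition module: block HTML -> links found in the block.
    extract_links: Callable[[str], list[str]]
    # Backing store of the link storage module: index identifier -> saved links.
    storage: dict[str, list[str]] = field(default_factory=dict)

    def process(self, index_id: str, page_url: str, html: str) -> None:
        """Run one web page to be analyzed (already fetched for index_id) through the pipeline."""
        for block in self.partition(html):
            if not self.is_associated(block, page_url, index_id):
                continue  # block fails the association criterion
            space = self.storage.setdefault(index_id, [])
            for link in self.extract_links(block):
                if link not in space:  # do not store the same link twice
                    space.append(link)


# Usage with toy stand-ins for each module.
server = LinkStorageServer(
    partition=lambda html: html.split("<hr>"),
    is_associated=lambda block, url, idx: "tech" in block,
    extract_links=lambda block: re.findall(r'href="([^"]+)"', block),
)
page = ('<a href="http://example.com/b.html">tech news</a><hr>'
        '<a href="http://example.com/c.html">ads</a>')
server.process("technology", "http://example.com/a.html", page)
print(server.storage)  # {'technology': ['http://example.com/b.html']}
```

Injecting the judgment step as a function mirrors the way the second to fourth server embodiments differ only in the internals of the judgment module.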
The embodiments of the present invention make full use of the fact that web pages link to one another: the web page to be analyzed is partitioned into blocks, and each resulting web page link block is examined. As soon as a block similar to the web page to be analyzed is recognized, it is indexed and stored as a page of the same category as the web page to be analyzed. The embodiments can therefore quickly accumulate and store links of the same category and build indexes for them, which improves the indexing efficiency of related web page links. In addition, because the pages behind related links are more likely to be displayed once they are indexed, resource waste is avoided and the server runs more efficiently.
Brief description of the drawings
Fig. 1 is a flow diagram of the web page link storage method according to the first embodiment of the invention;
Fig. 2 is a flow diagram of the web page link storage method according to the second embodiment of the invention;
Fig. 3 is a flow diagram of the web page link storage method according to the third embodiment of the invention;
Fig. 4 is a flow diagram of the web page link storage method according to the fourth embodiment of the invention;
Fig. 5 is a schematic diagram of the segmentation of a web page to be analyzed according to an embodiment of the invention;
Fig. 6 is a structural diagram of the server according to the first embodiment of the invention;
Fig. 7 is a structural diagram of the server according to the second embodiment of the invention;
Fig. 8 is a structural diagram of the server according to the third embodiment of the invention;
Fig. 9 is a structural diagram of the server according to the fourth embodiment of the invention.
Detailed description of the embodiments
The following embodiments are described with reference to the accompanying drawings to illustrate specific embodiments in which the present invention may be practiced.
Referring to Fig. 1, Fig. 1 is a flow diagram of the web page link storage method according to the first embodiment of the invention.
In step S101, a web page to be analyzed is obtained according to an index identifier.
Index identifiers in the embodiments of the present invention are category labels such as technology, fiction, and entertainment; a given web page to be analyzed may belong to the technology category, the fiction category, and so on. Each web page to be analyzed corresponds to one or more index identifiers, and web pages with the same index identifier are stored in the same storage space so that related pages can be quickly indexed and recommended for display.
In step S102, the web page to be analyzed is partitioned into blocks, forming at least one web page link block.
The embodiment preferably uses a page segmentation algorithm to partition the web page to be analyzed, forming multiple web page link blocks on the page. Since page segmentation algorithms are known technology, they are not described in detail here.
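The patent treats page segmentation as known technology and does not name an algorithm. Purely for illustration, the sketch below uses a much simpler stand-in than a visual segmentation algorithm: it assigns every link and text fragment to its innermost container element (`div`, `ul`, `aside`, and so on) with Python's standard `html.parser`, and keeps only the containers that hold at least one link. The container tag set, the `LinkBlock` type, and the `segment` function are hypothetical, and the parser assumes reasonably well-formed HTML.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical choice of elements that delimit a block.
CONTAINER_TAGS = {"div", "ul", "ol", "aside", "section", "nav", "table"}


@dataclass
class LinkBlock:
    tag: str
    links: list = field(default_factory=list)  # href values of <a> elements in the block
    text: str = ""                              # text directly in the block (not in nested containers)


class NaiveBlockSegmenter(HTMLParser):
    """Assigns links and text to the innermost open container element."""

    def __init__(self):
        super().__init__()
        self.blocks = [LinkBlock("root")]  # block 0 collects anything outside containers
        self.stack = [0]                   # indices of currently open container blocks

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINER_TAGS:
            self.blocks.append(LinkBlock(tag))
            self.stack.append(len(self.blocks) - 1)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.blocks[self.stack[-1]].links.append(href)

    def handle_endtag(self, tag):
        # Assumes well-formed HTML: a container end tag closes the innermost container.
        if tag in CONTAINER_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            block = self.blocks[self.stack[-1]]
            block.text = f"{block.text} {text}".strip()


def segment(html):
    """Return the link-bearing blocks of a page, in document order."""
    parser = NaiveBlockSegmenter()
    parser.feed(html)
    parser.close()
    return [block for block in parser.blocks if block.links]


html = (
    '<div id="article"><p>Review of a new phone.</p></div>'
    '<div class="related"><a href="/posts/78957.html">Phone A</a>'
    '<a href="/posts/78920.html">Phone B</a></div>'
)
for block in segment(html):
    print(block.tag, block.links, block.text)
# div ['/posts/78957.html', '/posts/78920.html'] Phone A Phone B
```

The article body is discarded because it contains no links; only the related-links container survives as a web page link block.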
In step s 103, judge whether the web page interlinkage block meets preset association standard, if so, carrying out step
Otherwise S104 continues step S101.
The association criterion is used to determine whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed. For example, if the index identifier of the web page to be analyzed is technology, it is determined whether the web page link block contains technology-related content; if so, the block is judged to satisfy the preset association criterion. The association criterion is described in more detail in the second, third, and fourth embodiments of the web page link storage method and is not elaborated here.
In step S104, the web page links of each web page link block that satisfies the association criterion are obtained.
The embodiment preferably parses the HyperText Markup Language (HTML) source code of each web page link block that satisfies the association criterion to obtain its web page links, for example the web page addresses (Uniform Resource Locators, URLs) in the block.
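As an illustration of obtaining URLs from a block's HTML source, the sketch below collects the `href` attribute of each `<a>` element with Python's standard `html.parser`, resolves relative addresses against the URL of the web page to be analyzed with `urllib.parse.urljoin`, and drops non-HTTP targets (such as `javascript:` links) and duplicates. The function name `extract_block_links` and these filtering rules are assumptions; the patent only says that the links are taken from the HTML source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def extract_block_links(block_html, page_url):
    """Return absolute http(s) URLs found in one web page link block, without duplicates."""
    collector = _HrefCollector()
    collector.feed(block_html)
    collector.close()
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page URL
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links


block = ('<a href="78957.html">A</a> <a href="/posts/78920.html">B</a> '
         '<a href="javascript:void(0)">more</a> <a href="78957.html">A again</a>')
print(extract_block_links(block, "http://www.alibuybuy.com/posts/79084.html"))
# ['http://www.alibuybuy.com/posts/78957.html', 'http://www.alibuybuy.com/posts/78920.html']
```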
In step S105, it is determined whether the web page link already exists in the corresponding storage space; if so, the process returns to step S101, otherwise step S106 is performed.
In step S106, the obtained web page link is saved to the storage space corresponding to the index identifier; for example, a technology link is saved to the storage space used for technology web pages.
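Steps S105 and S106 amount to an existence check followed by a save. A minimal in-memory sketch, assuming one ordered list per index identifier stands in for each storage space (a production system would presumably use a database or key-value store; the class and method names are hypothetical):

```python
from collections import defaultdict


class IndexedLinkStore:
    """Hypothetical in-memory stand-in for the storage spaces keyed by index identifier."""

    def __init__(self):
        self._spaces = defaultdict(list)  # index identifier -> links, in insertion order
        self._seen = defaultdict(set)     # index identifier -> same links, for fast lookup

    def save(self, index_id, link):
        """S105: skip a link already in the storage space; S106: otherwise save it.

        Returns True if the link was newly saved."""
        if link in self._seen[index_id]:
            return False
        self._seen[index_id].add(link)
        self._spaces[index_id].append(link)
        return True

    def links(self, index_id):
        """Links stored under an index identifier, e.g. for recommendation."""
        return list(self._spaces.get(index_id, []))


store = IndexedLinkStore()
print(store.save("technology", "http://www.alibuybuy.com/posts/78957.html"))  # True
print(store.save("technology", "http://www.alibuybuy.com/posts/78957.html"))  # False
print(store.links("technology"))  # ['http://www.alibuybuy.com/posts/78957.html']
```

Keeping a set alongside the list makes the existence check constant-time while preserving the order in which links were stored.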
By partitioning the web page to be analyzed, this embodiment forms at least one web page link block in it, then determines whether each block belongs to the same category as the web page to be analyzed, and if so stores its links in the storage space of that category. The invention thus makes full use of the fact that web pages link to one another: it partitions the page, examines each resulting block, and as soon as it recognizes a block similar to the web page to be analyzed, stores that block as a page of the same category. This quickly accumulates data resources of the same kind, improves the display efficiency of related web page link blocks, and in turn improves the server's operating efficiency.
Referring to Fig. 2, Fig. 2 is a flow diagram of the web page link storage method according to the second embodiment of the invention.
In step S201, a web page to be analyzed is obtained according to an index identifier.
In step S202, the web page to be analyzed is partitioned into blocks, forming at least one web page link block.
Steps S201 and S202 of this embodiment correspond to steps S101 and S102 of the first embodiment and are not described again here.
In step S203, a first web page identifier of the web page to be analyzed is obtained.
The first web page identifier is preferably the page's web address; for example, the first web page identifier is http://www.alibuybuy.com/posts/78920.html.
In step S204, a second web page identifier of the web page link block is obtained.
The second web page identifier of the web page link block is of the same kind as the first web page identifier: if the first identifier is a web address, the second identifier is also a web address, for example http://www.alibuybuy.com/posts/78958.html.
In step S205, the second web page identifier is compared with the first web page identifier. If their similarity falls within a preset similarity range, the web page link block is judged to satisfy the preset association criterion and step S206 is performed; otherwise the process returns to step S201.
In a specific implementation, the similarity range is, for example, 80.0% to 99.9%. When the first and second web page identifiers are compared, their main link identifiers are preferably compared first, followed by their sublink identifiers. For example, if the first web page identifier is http://www.alibuybuy.com/posts/78920.html and the second is http://www.alibuybuy.com/posts/78958.html, the main link identifiers are compared first; since both are http://www.alibuybuy.com/posts/, the two identifiers can be considered essentially the same. The sublink identifiers are then compared: 78920.html for the first and 78958.html for the second, which differ in only two digits. On this basis, the similarity of the first and second web page identifiers is judged to be close to 98%, within the similarity range, so the pages corresponding to the two identifiers are determined to be related; that is, the web page link block corresponding to the second identifier is correlated with the web page to be analyzed corresponding to the first identifier.
In step S206, the second web page identifier is stored in the same storage space as the web page to be analyzed.
That is, if the web page to be analyzed is stored in the storage space labeled with the technology index identifier, the second web page identifier is likewise stored in that storage space.
Referring to Fig. 3, Fig. 3 is a flow diagram of the web page link storage method according to the third embodiment of the invention.
In step S301, a web page to be analyzed is obtained according to an index identifier.
In step S302, the web page to be analyzed is partitioned into blocks, forming at least one web page link block.
Steps S301 and S302 of this embodiment correspond to steps S101 and S102 of the first embodiment and are not described again here.
In step S303, the content of the web page link block is obtained.
Specifically, the text in the web page link block is obtained and summarized to determine the general content of the block, for example that the block mainly introduces space science and technology.
In step S304, it is determined whether the content of the web page link block is consistent with the index identifier of the web page to be analyzed. If so, the block is judged to satisfy the preset association criterion and step S305 is performed; otherwise the process returns to step S301.
For example, if the content of the web page link block is an introduction to space science and technology and the index identifier of the web page to be analyzed is technology, the two are determined to be consistent.
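The patent does not specify how a block's text is summarized. A deliberately simple sketch, assuming a hand-made keyword list per index identifier (the lists, the `min_hits` threshold, and the function names are all hypothetical): count keyword hits in the block text, take the best-scoring category, and compare it with the index identifier of the web page to be analyzed. A real system might use a trained text classifier instead, and Chinese text would first need word segmentation.

```python
import re

# Hypothetical keyword lists, one per index identifier.
CATEGORY_KEYWORDS = {
    "technology": {"space", "satellite", "rocket", "smartphone", "chip", "software"},
    "fiction": {"novel", "chapter", "serialized", "author"},
}


def summarize_category(block_text, min_hits=2):
    """Return the category whose keywords occur most often, or None if too few hits."""
    words = re.findall(r"[a-z]+", block_text.lower())
    best, best_hits = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for word in words if word in keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best if best_hits >= min_hits else None


def content_matches(block_text, index_id):
    """Association criterion of the third embodiment: block content vs index identifier."""
    return summarize_category(block_text) == index_id


print(content_matches("New satellite launched as space station gets rocket upgrade", "technology"))  # True
print(content_matches("Chapter 12 of the serialized novel is out", "technology"))  # False
```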
In step S305, the web page links of each web page link block that satisfies the association criterion are obtained.
In step S306, the obtained web page links are saved to the storage space corresponding to the index identifier.
Referring to Fig. 4, Fig. 4 is a flow diagram of the web page link storage method according to the fourth embodiment of the invention.
In step S401, a web page to be analyzed is obtained according to an index identifier.
In step S402, the web page to be analyzed is partitioned into blocks, forming at least one web page link block.
Steps S401 and S402 of this embodiment correspond to steps S101 and S102 of the first embodiment and are not described again here.
In step S403, the attribute features of the web page link block are obtained.
Preferably, the attribute features of the web page link block include the shape of the block, the position of the block within the web page to be analyzed, the ratio of the block's area to that of the web page to be analyzed, and the link density of the block. The link density of a web page link block is the ratio of the number of characters that carry a link to the total number of characters in the block.
In step S404, it is determined whether the attribute features of the web page link block match preset association features. If so, the block is judged to satisfy the preset association criterion and step S405 is performed; otherwise the process returns to step S401.
For example, if the web page link block is rectangular, it can be judged to satisfy the association criterion. If the block sits in the middle of the right side of the web page to be analyzed (a region that typically lists links to the main pages of novels similar to the current one or of other popular novels), it can be judged to satisfy the association criterion. If the ratio of the block's area to that of the web page to be analyzed exceeds 10%, or if the block's link density exceeds 40%, it can likewise be judged to satisfy the association criterion, and so on.
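The attribute checks above can be sketched as follows, assuming the block's bounding box and character counts are already available from a rendering step (the `BlockFeatures` type, the boundaries of the right-middle zone, and the function names are hypothetical). The 10% area ratio and 40% link density thresholds come from the examples in the text. The text presents each feature as sufficient on its own, so the sketch combines them with a logical OR; requiring several features at once would be an equally plausible reading. The shape test is omitted because a bounding box is always rectangular.

```python
from dataclasses import dataclass


@dataclass
class BlockFeatures:
    x: float          # left edge of the block, in page pixels
    y: float          # top edge of the block, in page pixels
    width: float
    height: float
    link_chars: int   # characters of text inside <a> elements
    total_chars: int  # all text characters in the block


def link_density(block):
    """Ratio of linked characters to all characters in the block."""
    return block.link_chars / block.total_chars if block.total_chars else 0.0


def area_ratio(block, page_w, page_h):
    return (block.width * block.height) / (page_w * page_h)


def is_right_middle(block, page_w, page_h):
    """Block center lies in the right third of the page and in its middle half vertically."""
    cx = block.x + block.width / 2
    cy = block.y + block.height / 2
    return cx > page_w * 2 / 3 and page_h / 4 < cy < page_h * 3 / 4


def meets_association_features(block, page_w, page_h, min_area=0.10, min_density=0.40):
    return (area_ratio(block, page_w, page_h) > min_area
            or link_density(block) > min_density
            or is_right_middle(block, page_w, page_h))


sidebar = BlockFeatures(x=900, y=600, width=300, height=400, link_chars=180, total_chars=300)
footer = BlockFeatures(x=0, y=1900, width=1200, height=100, link_chars=20, total_chars=200)
print(link_density(sidebar), meets_association_features(sidebar, 1200, 2000))  # 0.6 True
print(meets_association_features(footer, 1200, 2000))  # False
```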
In step S405, the web page links of each web page link block that satisfies the association criterion are obtained.
In step S406, the obtained web page links are saved to the storage space corresponding to the index identifier.
The first to fourth embodiments above describe the flow of the web page link storage method in detail. The working process of the embodiments is illustrated below with a specific example:
First, a technology web page to be analyzed is selected from the storage space, for example the page at http://www.alibuybuy.com/posts/79084.html. The page is then partitioned into the multiple web page link blocks shown in Fig. 5, and the blocks are examined to identify those that qualify. In Fig. 5, the content of web page link blocks M1, M2, and M3 is electronics technology, consistent with the content of the web page to be analyzed ("mobile technology products"), so M1, M2, and M3 are identified as qualifying blocks. Their web page links are then extracted and added to the resource library; for example, the web page links of M1, M2, and M3 are, respectively:
http://www.alibuybuy.com/posts/78957.html;
http://www.alibuybuy.com/posts/78920.html;
http://www.alibuybuy.com/posts/78941.html.
After extracting these web page links, the server establishes the index "technology" and stores the extracted links in the technology storage space. When a user later visits a technology website, web page links can be taken directly from this storage space and recommended.
Referring to Fig. 6, Fig. 6 is a structural diagram of the server according to the first embodiment of the invention. The server includes a webpage acquisition module 61, a block partitioning module 62, a judgment module 63, a web page link acquisition module 64, and a link storage module 65.
The webpage acquisition module 61 obtains a web page to be analyzed according to an index identifier. The block partitioning module 62 partitions the web page to be analyzed into blocks, forming at least one web page link block. The judgment module 63 determines whether the web page link block satisfies a preset association criterion, the association criterion being used to determine whether the block is correlated with the index identifier corresponding to the web page to be analyzed.
When the judgment module 63 determines that the web page link block satisfies the preset association criterion, the web page link acquisition module 64 obtains the web page links of the block. The judgment module 63 further determines whether each web page link already exists in the storage space: if so, the link storage module 65 does not save that link to the storage space; otherwise, the link storage module 65 saves the obtained link to the storage space corresponding to the index identifier.
Referring to Fig. 7, Fig. 7 is a structural diagram of the server according to the second embodiment of the invention. The server specifically includes a webpage acquisition module 71, a block partitioning module 72, a judgment module 73, a web page link acquisition module 74, and a link storage module 75.
It differs from the server of the first embodiment in that the judgment module 73 of the second embodiment includes an identifier acquisition module 731 and an identifier comparison module 732. The identifier acquisition module 731 obtains the first web page identifier of the web page to be analyzed and the second web page identifier of the web page link block. The identifier comparison module 732 compares the second web page identifier with the first and determines whether their similarity falls within a preset similarity range; if so, the web page link block is judged to satisfy the preset association criterion. The link storage module 75 then saves the second web page identifier to the storage space corresponding to the index identifier.
Referring to Fig. 8, Fig. 8 is a structural diagram of the server according to the third embodiment of the invention. The server specifically includes a webpage acquisition module 81, a block partitioning module 82, a judgment module 83, a web page link acquisition module 84, and a link storage module 85.
It differs from the server of the first embodiment in that the judgment module 83 of the third embodiment includes a content acquisition module 831 and a content comparison module 832. The content acquisition module 831 obtains the content of the web page link block, and the content comparison module 832 determines whether that content is consistent with the index identifier of the web page to be analyzed; if so, the web page link block is judged to satisfy the preset association criterion.
Referring to Fig. 9, Fig. 9 is a structural diagram of the server according to the fourth embodiment of the invention. The server specifically includes a webpage acquisition module 91, a block partitioning module 92, a judgment module 93, a web page link acquisition module 94, and a link storage module 95.
It differs from the server of the first embodiment in that the judgment module 93 of the fourth embodiment includes an attribute feature acquisition module 931 and an attribute feature comparison module 932. The attribute feature acquisition module 931 obtains the attribute features of the web page link block, and the attribute feature comparison module 932 determines whether those features match preset association features; if so, the web page link block is judged to satisfy the preset association criterion.
The attribute features of the web page link block preferably include the shape of the block, the position of the block within the web page to be analyzed, the ratio of the block's area to that of the web page to be analyzed, and the link density of the block, where the link density is the ratio of the number of characters that carry a link to the total number of characters in the block.
For the working principle of each module of the server, refer to the description of the web page link storage method embodiments above; it is not repeated here.
By partitioning the web page to be analyzed, the embodiments of the present invention form at least one web page link block, then determine whether each block is correlated with the web page to be analyzed and, if it is, store its links in the storage space to which the web page to be analyzed belongs. The invention thus makes full use of the fact that web pages link to one another: it partitions the web page to be analyzed, examines each resulting block, and as soon as it recognizes a block similar to the web page to be analyzed, stores that block as a page of the same category. Links of the same category can therefore be quickly accumulated, stored, and indexed, which improves the indexing efficiency of related web page links and, in turn, the server's operating efficiency.
In conclusion although the present invention is disclosed above with preferred embodiment, above preferred embodiment is not to limit
The system present invention, those of ordinary skill in the art without departing from the spirit and scope of the present invention, can make various changes and profit
Decorations, therefore protection scope of the present invention is subject to the range that claim defines.
Claims (10)
1. A storage method for web page links, characterized in that the method comprises:
Obtaining a web page to be analyzed according to an index identifier;
Partitioning the web page to be analyzed into blocks to form at least one web page link block;
Determining whether the web page link block satisfies a preset association criterion, wherein the association criterion is used to determine whether the web page link block is correlated with the index identifier corresponding to the web page to be analyzed, and the determining comprises: obtaining the content of the web page link block; and determining whether the content of the web page link block is consistent with the index identifier of the web page to be analyzed, and if so, determining that the web page link block satisfies the preset association criterion;
If the web page link block satisfies the preset association criterion, obtaining the web page links of the web page link block and saving the obtained web page links to the storage space corresponding to the index identifier.
2. The storage method for web page links according to claim 1, characterized in that the step of determining whether the web page link block satisfies the preset association criterion comprises:
Obtaining a first web page identifier of the web page to be analyzed;
Obtaining a second web page identifier of the web page link block;
Comparing the second web page identifier with the first web page identifier and determining whether their similarity falls within a preset similarity range, and if so, determining that the web page link block satisfies the preset association criterion.
3. The storage method for web page links according to claim 1, characterized in that the step of determining whether the web page link block satisfies the preset association criterion specifically comprises:
Obtaining the attribute features of the web page link block;
Determining whether the attribute features of the web page link block match preset association features, and if so, determining that the web page link block satisfies the preset association criterion.
4. The storage method for web page links according to claim 3, characterized in that the attribute features of the web page link block include the shape of the web page link block, the position of the web page link block within the web page to be analyzed, the ratio of the area of the web page link block to that of the web page to be analyzed, and the link density of the web page link block;
wherein the link density of the web page link block is the ratio of the number of characters that carry a link in the web page link block to the total number of characters in the block.
5. The storage method for web page links according to claim 1, characterized in that, before the step of saving the obtained web page links to the storage space corresponding to the index identifier, the method further comprises:
Determining whether the web page link already exists in the storage space; if it does, not saving the web page link to the storage space; otherwise, saving the obtained web page link to the storage space corresponding to the index identifier.
6. A server, characterized in that the server includes:
a webpage acquisition module, configured to obtain a webpage to be analyzed according to an index identifier;
a partitioning module, configured to partition the webpage to be analyzed into blocks, forming at least one web page interlinkage block;
a judgment module, configured to judge whether the web page interlinkage block meets a preset association standard, wherein the association standard is used to judge whether the web page interlinkage block is correlated with the index identifier corresponding to the webpage to be analyzed; the judgment module
includes a content obtaining module and a content comparison module, the content obtaining module being configured to obtain the content of the web page interlinkage block;
and the content comparison module being configured to judge whether the content of the web page interlinkage block is consistent with the index identifier of the webpage to be analyzed and, if so, to determine that the web page interlinkage block meets the preset association standard;
a web page interlinkage acquisition module, configured to obtain the web page interlinkage of the web page interlinkage block when the judgment module determines that the web page interlinkage block meets the preset association standard; and
a link storage module, configured to save the web page interlinkage obtained by the web page interlinkage acquisition module to the storage space corresponding to the index identifier.
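Claim 6 chains the modules into a single pass: obtain the page, split it into blocks, keep the blocks whose content matches the index identifier, and save their links. The sketch below assumes page fetching and block partitioning have already produced `LinkBlock` objects (a hypothetical type). It approximates "content consistent with the index identifier" with a case-insensitive substring test, because the claim does not specify the comparison rule.

```python
from dataclasses import dataclass, field


@dataclass
class LinkBlock:
    """Hypothetical result of block partitioning: visible text plus hrefs."""
    text: str
    links: list[str] = field(default_factory=list)


def content_matches(block: LinkBlock, index_id: str) -> bool:
    # Assumption: the content comparison module's "consistent with the index
    # identifier" is approximated by a case-insensitive substring match.
    return index_id.casefold() in block.text.casefold()


def store_related_links(index_id: str, blocks: list[LinkBlock],
                        storage: dict[str, list[str]]) -> list[str]:
    """Save the links of every block that meets the association standard."""
    space = storage.setdefault(index_id, [])  # storage space for this index identifier
    for block in blocks:
        if content_matches(block, index_id):  # judgment module
            space.extend(block.links)  # link acquisition and link storage modules
    return space
```

With the blocks `LinkBlock("Football news", ["http://x/1", "http://x/2"])` and `LinkBlock("Login | Help", ["http://x/login"])`, calling `store_related_links("football", ...)` keeps only the two links from the first block.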
7. The server according to claim 6, characterized in that the judgment module includes:
an identifier acquisition module, configured to obtain a first banner of the webpage to be analyzed and a second banner of the web page interlinkage block; and
an identifier comparison module, configured to compare the second banner with the first banner and judge whether the second banner and the first banner are within a preset similarity range; if so, the web page interlinkage block is determined to meet the preset association standard.
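Claim 7 does not say how the preset similarity range is measured. One possible reading, sketched here with the standard library's `difflib.SequenceMatcher`, treats the range as a minimum similarity ratio between the two banners. The function name `banners_similar`, the whitespace and case normalization, and the default threshold of 0.8 are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def banners_similar(first_banner: str, second_banner: str, threshold: float = 0.8) -> bool:
    """Judge whether two banners fall within a (hypothetical) similarity range."""
    # Assumption: case and repeated whitespace should not affect the comparison.
    a = " ".join(first_banner.casefold().split())
    b = " ".join(second_banner.casefold().split())
    if not a or not b:
        return False  # an empty banner cannot establish an association
    # ratio() returns 2*M/T, where M is matched characters and T is total length.
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

Raising the threshold makes the identifier comparison module stricter; the patent leaves the range itself unspecified.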
8. The server according to claim 6, characterized in that the judgment module includes:
an attribute feature acquisition module, configured to obtain the attribute features of the web page interlinkage block; and
an attribute feature comparison module, configured to judge whether the attribute features of the web page interlinkage block meet preset association features;
if so, the web page interlinkage block is determined to meet the preset association standard.
9. The server according to claim 8, characterized in that the attribute features of the web page interlinkage block include: the shape information of the web page interlinkage block, the location information of the web page interlinkage block within the webpage to be analyzed, the area ratio of the web page interlinkage block to the webpage to be analyzed, and the link density of the web page interlinkage block;
wherein the link density of the web page interlinkage block is the ratio of the number of linked characters in the web page interlinkage block to the total number of characters in the block.
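Claims 8 and 9 compare a block's attribute features against preset association features. A minimal sketch, assuming each feature is reduced to a single number and each preset feature is an inclusive range: the aspect ratio stands in for shape information, and the block's top edge as a fraction of page height stands in for location. The class names and every range value are hypothetical placeholders, not values from the patent.

```python
from dataclasses import dataclass


@dataclass
class BlockFeatures:
    aspect_ratio: float  # width / height: stand-in for "shape information"
    top_fraction: float  # block top / page height: stand-in for "location information"
    area_ratio: float    # block area / page area
    link_density: float  # linked characters / all characters


@dataclass
class AssociationProfile:
    """Hypothetical preset association features as inclusive (low, high) ranges."""
    aspect_ratio: tuple[float, float] = (0.2, 5.0)
    top_fraction: tuple[float, float] = (0.1, 0.9)
    area_ratio: tuple[float, float] = (0.05, 0.6)
    link_density: tuple[float, float] = (0.5, 1.0)


def meets_association_features(f: BlockFeatures, p: AssociationProfile) -> bool:
    """True only if every attribute feature lies inside its preset range."""
    checks = (
        (f.aspect_ratio, p.aspect_ratio),
        (f.top_fraction, p.top_fraction),
        (f.area_ratio, p.area_ratio),
        (f.link_density, p.link_density),
    )
    return all(low <= value <= high for value, (low, high) in checks)
```

Requiring all four ranges to match is one design choice; a weighted score over the same features would be an equally plausible reading of the claim.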
10. The server according to claim 6, characterized in that the judgment module is further configured to judge whether the web page interlinkage already exists in the storage space;
if the web page interlinkage already exists in the storage space, the link storage module stops saving the web page interlinkage to the storage space; otherwise, the link storage module saves the acquired web page interlinkage to the storage space corresponding to the index identifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310073553.5A CN104035940B (en) | 2013-03-07 | 2013-03-07 | The storage method and server of web page interlinkage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104035940A CN104035940A (en) | 2014-09-10 |
CN104035940B true CN104035940B (en) | 2018-07-06 |
Family ID: 51466711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310073553.5A Active CN104035940B (en) | 2013-03-07 | 2013-03-07 | The storage method and server of web page interlinkage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104035940B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113542047B (en) * | 2020-04-21 | 2023-04-07 | 北京沃东天骏信息技术有限公司 | Abnormal request detection method and device, electronic equipment and computer readable medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079057A (en) * | 2007-03-14 | 2007-11-28 | 腾讯科技(深圳)有限公司 | System and method for keeping multiple link object of web page |
CN101650715A (en) * | 2008-08-12 | 2010-02-17 | 厦门市美亚柏科信息股份有限公司 | Method and device for screening links on web pages |
US7693875B2 (en) * | 2006-01-09 | 2010-04-06 | International Business Machines Corporation | Method for searching a data page for inserting a data record |
CN101916285A (en) * | 2010-08-20 | 2010-12-15 | 北京新岸线网络技术有限公司 | Method and device for analyzing internet web page contents |
CN101976271A (en) * | 2010-11-19 | 2011-02-16 | 上海合合信息科技发展有限公司 | Method for automatically extracting website and opening web page |
CN102646129A (en) * | 2012-03-09 | 2012-08-22 | 武汉大学 | Topic-relative distributed web crawler system |
Also Published As
Publication number | Publication date |
---|---|
CN104035940A (en) | 2014-09-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109325165B (en) | Network public opinion analysis method, device and storage medium | |
CN106250513B (en) | Event modeling-based event personalized classification method and system | |
CN109145216A (en) | Network public-opinion monitoring method, device and storage medium | |
WO2019218514A1 (en) | Method for extracting webpage target information, device, and storage medium | |
CN101216825B (en) | Indexing key words extraction/ prediction method | |
CN109359244A (en) | A kind of recommendation method for personalized information and device | |
CN103294781B (en) | A kind of method and apparatus for processing page data | |
CN102270206A (en) | Method and device for capturing valid web page contents | |
TWI695277B (en) | Automatic website data collection method | |
CN106815307A (en) | Public Culture knowledge mapping platform and its use method | |
CN110827112B (en) | Deep learning commodity recommendation method and device, computer equipment and storage medium | |
CN110457579B (en) | Webpage denoising method and system based on cooperative work of template and classifier | |
CN110020312B (en) | Method and device for extracting webpage text | |
JP5013065B2 (en) | Rustic monitoring system, ruling monitoring method and program | |
CN103942211B (en) | A kind of recognition methods of text page and device | |
CN107291755A (en) | A kind of terminal method for pushing and device | |
CN107608980A (en) | Information-pushing method and system based on the analysis of DPI big datas | |
CN102314494A (en) | Method and equipment for processing webpage contents | |
CN103729178A (en) | Method and system for processing multiple tabs of browsers | |
CN106446123A (en) | Webpage verification code element identification method | |
CN106095772A (en) | The method and apparatus that a kind of http protocol information extracts | |
CN113569118A (en) | Self-media pushing method and device, computer equipment and storage medium | |
CN101115024A (en) | Method and system for displaying web page contents related information | |
CN103593360A (en) | Internet information publishing time extraction method based on page analysis | |
CN104035940B (en) | The storage method and server of web page interlinkage |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |