CN101477576B - Method, equipment and system for providing network materials to search engine - Google Patents
- Publication number: CN101477576B
- Authority: CN (China)
- Prior art keywords: message, html, http message, user, network
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Classification: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of telecommunications and discloses a method for providing network materials to a search engine. The method comprises: receiving messages from the network side and, when a received message is an HTTP message, obtaining and storing the web page information carried in the message. With this method, materials can be obtained immediately after a user accesses the network, so that the search engine's network materials database is updated in a timely manner. The invention further discloses a network device and a system for providing network materials to a search engine.
Description
Technical field
The present invention relates to the communications field, and in particular to a method, device and system for providing network materials to a search engine.
Background technology
With the spread of the Internet, more and more websites are appearing, and how to find the desired content among so many websites has become a major concern for Internet users. Search engines offer a good answer to this problem. A user logs in to a search engine and enters the keyword to be searched; the search engine looks the keyword up in its own materials database and returns relevant web pages or website links to the user. The quality of a search engine depends to a large extent on how its materials database is built.
Early search engines built their materials databases by manual entry. The greatest benefit of this approach is that search results are fully controllable, and the ranking of results can be adjusted in exchange for advertising fees. However, manual entry is very inefficient: thousands of websites appear or disappear every day, and the content of existing websites is continually updated, so manual entry misses many newly emerging websites and web pages.
As technology developed, a new generation of search engines adopted the Web Spider (Spider) technique for building their materials databases. With this technique, the search engine's server visits the web pages of other websites and analyzes their content; when it finds a hyperlink in a page, it automatically follows that hyperlink to the linked page. Every page visited in this process is recorded in the materials database. The Web Spider technique automates the collection of search engine materials and makes them substantially more timely than manual entry. However, material collection by Web Spider is limited by access speed, and one update of the materials database generally takes 1-2 days to complete. Moreover, the Web Spider technique can hardly identify the category or identity of users, and therefore cannot provide targeted search services.
Summary of the invention
In view of this, embodiments of the present invention provide a method, device and system for providing network materials to a search engine, in order to solve the prior-art problem that the network materials database of a search engine is updated slowly.
A method for providing network materials to a search engine comprises:
receiving a message from the network side;
when the message is a Hypertext Transfer Protocol (HTTP) message, obtaining the web page information carried in the HTTP message;
storing the web page information carried in the HTTP message.
A network device for providing network materials to a search engine comprises:
a receiver module, used to receive a message from the network side;
a parsing module, used to obtain the web page information carried in the HTTP message when the message received by the receiver module is an HTTP message;
a memory module, used to store the web page information, obtained by the parsing module, that the HTTP message carries.
A system for providing network materials to a search engine comprises:
a network device, used to obtain the web page information carried in an HTTP message from the network side and to send the obtained web page information to a memory device;
a memory device, used to store the received web page information.
With the method and device provided by the embodiments of the present invention, materials can be obtained immediately after a user accesses the network, so that the search engine's network materials database is updated in a timely manner.
Description of drawings
Fig. 1 is a flowchart of a method for providing network materials to a search engine according to one embodiment of the present invention;
Fig. 2 shows a network device for providing network materials to a search engine according to another embodiment of the present invention;
Fig. 3 is a structural diagram of parsing module 210 in another embodiment of the present invention;
Fig. 4 shows a network device for providing network materials to a search engine according to another embodiment of the present invention;
Fig. 5 shows a network device for providing network materials to a search engine according to another embodiment of the present invention;
Fig. 6 shows a system for providing network materials to a search engine according to another embodiment of the present invention.
Embodiment
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings.
HTTP (Hypertext Transfer Protocol) is the protocol used to transfer hypertext from a website's server to the user. It not only ensures that hypertext documents are transferred correctly and quickly, but can also determine which part of a document is transferred and which part of the content is displayed first. HTML (Hypertext Markup Language) is the most widely used language on the network today and the main language in which web documents are written.
Generally, HTTP communication between a user and a website's server proceeds as follows:
The user sends an HTTP request to the website's server;
After receiving the request, the website's server sends the corresponding HTTP response to the user;
The HTTP response returned by the website's server carries not only the website's web page information but also information related to the user.
Therefore, in the embodiments of the present invention, the HTTP message from the network side is parsed to obtain the web page information it carries, so that materials can be obtained immediately after a user accesses the network and the search engine's network materials database is updated in a timely manner.
As shown in Fig. 1, a method for providing network materials to a search engine according to one embodiment of the present invention comprises:
100. Receive a message from the network side.
110. When the message is an HTTP message, obtain the web page information carried in the HTTP message.
On a network, HTTP communication usually takes place over a TCP/IP (Transmission Control Protocol/Internet Protocol) connection, and the default port is TCP 80. Therefore, in this embodiment, the message is parsed to obtain its source port number and protocol type; when the source port number is 80 and the protocol type is TCP, the message is determined to be an HTTP message. Of course, HTTP can also run over other protocols, and other ports can also be used; these cases are not described further in this embodiment.
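The port-and-protocol check described above can be sketched as follows. This is a minimal illustration, assuming the message arrives as a raw IPv4 packet in a bytes object; the function name `is_http_message` and the constants are hypothetical and do not come from the patent.

```python
import struct

TCP_PROTOCOL_NUMBER = 6   # IANA-assigned protocol number for TCP
HTTP_DEFAULT_PORT = 80    # default HTTP port named in the embodiment


def is_http_message(ip_packet: bytes) -> bool:
    """Return True when an IPv4 packet is TCP and its source port is 80."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return False  # too short for an IPv4 header, or not IPv4 at all
    header_len = (ip_packet[0] & 0x0F) * 4  # IHL field, in 32-bit words
    if ip_packet[9] != TCP_PROTOCOL_NUMBER:
        return False  # protocol field says this is not TCP
    if len(ip_packet) < header_len + 4:
        return False  # TCP port fields are missing
    # The first four bytes of the TCP header are source and destination port.
    source_port, _dest_port = struct.unpack(
        "!HH", ip_packet[header_len:header_len + 4]
    )
    return source_port == HTTP_DEFAULT_PORT
```

Because the check looks at the source port, it matches responses travelling from the web server toward the user, which are the messages that carry web page content.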
In this embodiment, the web page information carried in the message can be obtained by identifying the HTML head and HTML tail in the HTTP message.
HTML can be understood as text composed of a series of tags. Generally, these tags occur in pairs, called an opening tag and a closing tag. For example, the head of a web page lies between the <head> tag and the </head> tag. The body of the page lies between the <body> tag and the </body> tag; that is, everything displayed on the page is contained between these two tags. The whole HTML page begins with the <html> tag and ends with the </html> tag. Here <head>, <body> and <html> are opening tags, while </head>, </body> and </html> are closing tags. Therefore, in this embodiment, identifying the HTML head and HTML tail in the HTTP message means identifying the <html> tag and the </html> tag; the information in the message from the <html> tag to the </html> tag is the web page information.
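The head-and-tail identification described above can be illustrated with a short sketch. It assumes the HTTP payload has already been reassembled and decoded into a single string, which the patent does not discuss; the function name `extract_web_page_info` is hypothetical.

```python
import re
from typing import Optional

# Matches from the opening <html ...> tag through the last </html> tag.
# Case-insensitive, and DOTALL so that "." also spans line breaks.
_HTML_DOCUMENT = re.compile(
    r"<html\b[^>]*>.*</html\s*>", re.IGNORECASE | re.DOTALL
)


def extract_web_page_info(http_payload: str) -> Optional[str]:
    """Return the text from the HTML head tag to the HTML tail tag, inclusive."""
    match = _HTML_DOCUMENT.search(http_payload)
    return match.group(0) if match else None
```

A response whose body contains no `<html>` and `</html>` pair (an image, for instance) yields `None`, so nothing is stored for it.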
120. Store the web page information carried in the HTTP message.
In this embodiment, the stored web page information can be classified. For example, classifying by source IP address, that is, by the IP address of the web page, yields content similar to that obtained by the Web Spider technique; furthermore, the content of other pages linked from this page can be obtained by the Web Spider technique.
Of course, the web page information can also be classified according to actual conditions, for example by the frequency with which words occur in the web page information. The embodiments of the present invention place no restriction on the basis of classification.
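The two classification bases mentioned above, source IP address and word frequency, might be sketched like this. The patent does not specify data structures, so the tuple layout and both function names are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_by_source_ip(pages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group stored pages by the IP address of the server that sent them.

    `pages` is an iterable of (source_ip, html) tuples.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for source_ip, html in pages:
        groups[source_ip].append(html)
    return dict(groups)


def word_frequencies(html: str) -> Counter:
    """Count how often each word occurs in the visible text of a page."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, enough for a sketch
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))
```

Grouping by source IP collects all pages seen from one website, which is why the result resembles what a crawler would gather from that site.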
With the method for providing network materials to a search engine provided by this embodiment, materials can be obtained immediately after a user accesses a web page, so that the search engine's network materials database is updated in a timely manner.
Preferably, in another embodiment of the present invention, the user's identification information can also be obtained from the received HTTP message and stored. The user's identification information can be the user's IP address or the user's user name.
For example, an HTTP message carries user-related information such as the physical address, IP address (that is, the destination IP address in the HTTP message) and port number. After the HTTP message is received, obtaining its destination IP address yields the user's IP address. Of course, after the HTTP message is received, the user's physical address, IP address, port number and other information in the message can also be matched to obtain the user's user name. Here, the user name can be the account the user uses to log in to the network, the user's mobile phone number, or other information that can identify the user.
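As a rough illustration of the paragraph above, the destination address of a server-to-user packet can be read from the IPv4 header and then matched against a table kept on the device. The session table and both function names are hypothetical; the patent only states that a match is performed, not how.

```python
import ipaddress
from typing import Dict, Optional


def user_ip_from_response(ip_packet: bytes) -> str:
    """Read the destination address of an IPv4 packet sent toward the user."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    # Bytes 16..19 of the IPv4 header hold the destination address.
    return str(ipaddress.IPv4Address(ip_packet[16:20]))


def lookup_user_name(user_ip: str, session_table: Dict[str, str]) -> Optional[str]:
    """Map a user's IP address to a user name via an assumed session table."""
    return session_table.get(user_ip)
```

On an access device such as a BRAS or GGSN, a table of this kind would plausibly be filled when the user logs in, but that step is outside the sketch.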
Once the user's identification information has been obtained, the stored web page information can be classified by user, which reveals what a particular user has accessed. For example, it becomes possible to know which web content a user visits frequently and which content the user prefers. With this knowledge, more targeted information can be provided to the user, for example advertising related to the user's preferences.
Of course, once the user's identification information has been obtained, different search results can also be provided to different users, thereby realizing user-based search engine functions. For example, for a high-end business user who searches for a catering keyword, results can be ranked with business hotels first; for a low-end user, results can be ranked with inexpensive restaurants first. For underage users, unhealthy information can be filtered out.
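The user-based ranking described above could look roughly like the following. The patent gives no ranking algorithm, so the result fields, the category labels ("business", "budget", "minor") and the sort keys are all assumptions chosen to mirror the three examples in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    title: str
    price_level: int          # 1 = inexpensive, 3 = expensive (assumed scale)
    restricted: bool = False  # content unsuitable for underage users


def rank_for_user(results: List[Result], user_category: str) -> List[Result]:
    """Order (and, for minors, filter) search results by user category."""
    if user_category == "minor":
        results = [r for r in results if not r.restricted]
    if user_category == "business":
        # High-end business users: most expensive venues first.
        return sorted(results, key=lambda r: -r.price_level)
    # Other users: inexpensive venues first.
    return sorted(results, key=lambda r: r.price_level)
```

The same stored materials thus produce different result lists, which is the point of tying identification information to the web page information.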
The method for providing network materials to a search engine provided by this other embodiment not only updates the network materials database in a timely manner but also enables user-based search engine services.
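Another embodiment of the present invention provides a network device for providing network materials to a search engine which, as shown in Fig. 2, comprises:
Receiver module 200, used to receive a message from the network side;
Parsing module 210, used to obtain the web page information carried in the HTTP message when the message received by receiver module 200 is an HTTP message;
Memory module 220, used to store the web page information, obtained by parsing module 210, that the HTTP message carries.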
Preferably, in another embodiment of the present invention, as shown in Fig. 3, parsing module 210 comprises:
Recognition unit 211, used to identify the HTML head and HTML tail in the HTTP message;
Acquiring unit 212, used to obtain the web page information carried in the HTTP message; in this embodiment, the web page information comprises the HTML head, the HTML tail, and the information between the HTML head and the HTML tail.
Preferably, in another embodiment of the present invention, as shown in Fig. 4, a network device for providing network materials to a search engine comprises, in addition to receiver module 200, parsing module 210 and memory module 220:
Preferably, in another embodiment of the present invention, memory module 220 is also used to store the user's identification information obtained from the HTTP message received by receiver module 200. The user's identification information can be the user's IP address or the user's user name.
For example, an HTTP message carries user-related information such as the physical address, IP address (that is, the destination IP address in the HTTP message) and port number. After the network device receives the HTTP message, obtaining its destination IP address yields the user's IP address. Of course, after receiving the HTTP message, the network device can also match the user's physical address, IP address, port number and other information in the message (in this embodiment, the user's user name is stored in the network device) to obtain the user's user name. Here, the user name can be the account the user uses to log in to the network, the user's mobile phone number, or other information that can identify the user.
Preferably, in another embodiment of the present invention, as shown in Fig. 5, a network device for providing network materials to a search engine can further comprise:
In a real network, the network device for providing network materials to a search engine described in the embodiments of the present invention may be a network device in a data communication network, for example a router, an SR (Service Router) or a BRAS (Broadband Remote Access Server); it may also be a network device in a wireless communication network, for example a GGSN (Gateway GPRS Support Node); of course, it may also be another network device.
With the network device for providing network materials to a search engine provided by the embodiments of the present invention, materials can be obtained on a single device, so that the search engine's network materials database is updated in a timely manner.
Of course, the embodiments of the present invention can be implemented not only on one device but also on multiple devices. As shown in Fig. 6, another embodiment of the present invention provides a system for providing network materials to a search engine, comprising network device 300 and memory device 310, wherein:
Network device 300 is used to obtain the web page information carried in an HTTP message from the network side and to send the obtained web page information to memory device 310;
Memory device 310 is used to store the received web page information.
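The division of labor between network device 300 and memory device 310 can be sketched with two small classes. This is an in-process illustration only: the class and method names are hypothetical, and a real deployment would send the page over some transport that the patent does not specify.

```python
import re
from typing import List

_HTML_DOCUMENT = re.compile(
    r"<html\b[^>]*>.*</html\s*>", re.IGNORECASE | re.DOTALL
)


class MemoryDevice:
    """Stands in for memory device 310: stores received web page information."""

    def __init__(self) -> None:
        self.pages: List[str] = []

    def store(self, web_page_info: str) -> None:
        self.pages.append(web_page_info)


class NetworkDevice:
    """Stands in for network device 300: extracts pages and forwards them."""

    def __init__(self, memory_device: MemoryDevice) -> None:
        self.memory_device = memory_device

    def handle_http_payload(self, payload: str) -> bool:
        """Forward the page carried in an HTTP payload; return True if one was found."""
        match = _HTML_DOCUMENT.search(payload)
        if match is None:
            return False
        self.memory_device.store(match.group(0))
        return True
```

Keeping storage behind a separate object mirrors the multi-device arrangement: the extraction logic stays on the forwarding path while the materials accumulate elsewhere.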
From the above description of the embodiments, those of ordinary skill in the art can clearly understand that the embodiments of the present invention can be implemented by software plus a necessary general-purpose hardware platform, and of course can also be implemented in hardware. Based on this understanding, the technical solutions of the embodiments can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, magnetic disk or optical disc, and includes a number of instructions that cause a computer device, server or other network device to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A method for obtaining network materials for a search engine, characterized in that it comprises:
receiving a message from the network side;
when the message is a Hypertext Transfer Protocol (HTTP) message, obtaining, for the search engine, the web page information carried in the HTTP message;
storing, for the search engine, the web page information carried in the HTTP message;
classifying the stored web page information carried in the HTTP message according to the IP address of the web page, the frequency with which words occur in the web page information, or the user's identification information;
obtaining the content of other web pages linked from the web page by the Web Spider technique.
2. The method of claim 1, characterized in that, when the message from the network side is received, it is judged whether the message is an HTTP message: when the source port number of the message is 80 and the protocol type of the message is Transmission Control Protocol (TCP), the message is an HTTP message.
3. The method of claim 2, characterized in that obtaining the web page information carried in the HTTP message comprises:
identifying the Hypertext Markup Language (HTML) head and HTML tail in the HTTP message;
obtaining the web page information carried in the HTTP message, the web page information comprising: the HTML head, the HTML tail, and the information between the HTML head and the HTML tail in the HTTP message.
4. The method of claim 3, characterized in that the HTML head is the <html> tag and the HTML tail is the </html> tag.
5. The method of claim 4, characterized in that the method further comprises:
when the message is a Hypertext Transfer Protocol (HTTP) message, obtaining the user's identification information according to the HTTP message;
storing the user's identification information.
6. The method of claim 5, characterized in that the user's identification information comprises: the user's IP address or the user's user name;
wherein the user's user name comprises: the account the user uses to log in to the network, or the user's mobile phone number.
7. A network device for obtaining network materials for a search engine, characterized in that it comprises:
a receiver module, used to receive a message from the network side;
a parsing module, used to obtain, for the search engine, the web page information carried in the HTTP message when the message received by the receiver module is an HTTP message;
a memory module, used to store the web page information, obtained by the parsing module, that the HTTP message carries;
a sort module, used to classify the stored web page information carried in the HTTP message according to the IP address of the web page, the frequency with which words occur in the web page information, or the user's identification information;
a related pages acquisition module, used to obtain the content of other web pages linked from the web page by the Web Spider technique.
8. The network device of claim 7, characterized in that the network device further comprises:
a judge module, used to judge whether the message received by the receiver module is an HTTP message and, when the message received by the receiver module is an HTTP message, to trigger the parsing module.
9. The network device of claim 8, characterized in that the memory module is also used to store the user's identification information obtained from the received HTTP message.
10. The network device of any one of claims 7-9, characterized in that the parsing module comprises:
a recognition unit, used to identify the HTML head and HTML tail in the HTTP message;
an acquiring unit, used to obtain the web page information carried in the HTTP message, the web page information comprising: the HTML head, the HTML tail, and the information between the HTML head and the HTML tail in the HTTP message.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200910105235 CN101477576B (en) | 2009-01-20 | 2009-01-20 | Method, equipment and system for providing network materials to search engine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200910105235 CN101477576B (en) | 2009-01-20 | 2009-01-20 | Method, equipment and system for providing network materials to search engine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101477576A (en) | 2009-07-08 |
CN101477576B (en) | 2013-08-28 |
Family
ID=40838292
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200910105235 Expired - Fee Related CN101477576B (en) | 2009-01-20 | 2009-01-20 | Method, equipment and system for providing network materials to search engine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101477576B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8639773B2 (en) * | 2010-06-17 | 2014-01-28 | Microsoft Corporation | Discrepancy detection for web crawling |
CN103235785B (en) * | 2013-03-28 | 2016-02-24 | 四三九九网络股份有限公司 | A kind of method of batch extracting web page resources material |
CN106790105B (en) * | 2016-12-26 | 2020-08-21 | 携程旅游网络技术(上海)有限公司 | Crawler identification interception method and system based on business data |
CN117574010B (en) * | 2023-11-03 | 2024-07-19 | 中信建投证券股份有限公司 | Data acquisition method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN101477576A (en) | 2009-07-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20130828 Termination date: 20180120 |