CN102821088A - System and method for acquiring network data - Google Patents

Publication number
CN102821088A
Authority
CN
China
Prior art keywords
data
webpage
response
server
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101378742A
Other languages
Chinese (zh)
Other versions
CN102821088B (en)
Inventor
王彬
徐舟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd
Priority to CN201210137874.2A
Publication of CN102821088A
Application granted
Publication of CN102821088B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a system and a method for acquiring network data. The system comprises a packet capture (sniffer) module and an output module. The packet capture module obtains, from the data forwarded by a network forwarding device, the response data sent by a server to a client; the network forwarding device forwards data between the server and the client; and the output module outputs the response data. The system and method can improve the efficiency of acquiring dynamic webpage data.

Description

System and method for acquiring network data
Technical field
The present invention relates to the technical field of computer networks, and in particular to a system and method for acquiring network data.
Background
With the widespread use of computer networks, the amount of information in them grows daily. Some applications need to obtain the information in webpages as efficiently as possible. Information in a webpage falls into two types. The first is static data, which is contained in the Hypertext Markup Language (HTML) file and can be obtained simply by downloading the page source. The second is dynamic data, which is not visible in the page source; it is delivered to the browser, for example, by POST or by Asynchronous JavaScript and XML (AJAX). In the related art, a JavaScript interpretation engine is typically used to re-execute the script code that fetches the data, and the execution result is captured as the dynamic data. This approach requires building a browser (IE) core, chiefly a JavaScript interpretation engine, which involves a large amount of coding; locating the code related to the data requires manual intervention, which makes the processing inflexible; and re-executing the related script code is inefficient and involves repeated work. The prior art is therefore inefficient at acquiring dynamic webpage data.
Summary of the invention
In view of this, the present invention provides a system and method for acquiring network data that help improve the efficiency of acquiring dynamic webpage data.
To achieve the above object, the present invention provides the following technical solutions:
A system for acquiring network data, comprising: a packet capture module, configured to obtain, from the data forwarded by a network forwarding device, the response data sent by a server to a client, the network forwarding device being used to forward data between the server and the client; and an output module, configured to output the response data.
Optionally, the system further comprises a saving module, configured to save a selected part of the response data.
Optionally, the system further comprises a trigger module, an analysis module and an acquisition module, wherein: the trigger module is configured to trigger the page-turning button in a webpage so that the server sends the data of the next page of the webpage; the analysis module is configured to compare the address of the dynamic data sent by the server in response to the request information submitted by the webpage after page turning with the address of the dynamic data sent by the server in response to the request information submitted by the webpage before page turning, and to determine a plurality of addresses from the difference between the compared addresses, the plurality of addresses being the addresses of the dynamic data sent by the server in response to multiple page-turning operations; the acquisition module is configured to obtain the data at the plurality of addresses; and the saving module is further configured to save the data obtained by the acquisition module.
Optionally, the analysis module is further configured to determine the last page reached by the multiple page-turning operations.
A method for acquiring network data, applied to the system of the present invention, the method comprising: obtaining, from the data forwarded by a network forwarding device, the response data sent by a server to a client, the network forwarding device being used to forward data between the server and the client; and outputting the response data.
Optionally, the method further comprises: saving a selected part of the response data.
Optionally, after the response data sent by the server to the client is obtained from the data forwarded by the network forwarding device, the method further comprises: triggering the page-turning button in a webpage so that the server sends the data of the next page of the webpage; comparing the address of the dynamic data sent by the server in response to the request information submitted by the webpage after page turning with the address of the dynamic data sent by the server in response to the request information submitted by the webpage before page turning; determining a plurality of addresses from the difference between the compared addresses, the plurality of addresses being the addresses of the dynamic data sent by the server in response to multiple page-turning operations; and obtaining and saving the data at the plurality of addresses.
Optionally, the request information submitted by the webpage comprises request information submitted via AJAX or via POST.
Optionally, the response data comprises text data in JSON format.
According to the technical solution of the present invention, the response data sent by the server to the client is obtained from the data forwarded by the network forwarding device and then analyzed, which yields the data delivered by existing means such as function calls made via AJAX. There is no need to use a browser again to re-execute the relevant functions, such as JavaScript functions, nor to use any JavaScript interpretation engine to simulate mouse click events and repeatedly execute the JavaScript functions in the page. In addition, according to the technical solution of the present invention, the URLs of the dynamic data can be constructed after a small number of browser visits and the dynamic data then obtained directly, which helps improve the efficiency of acquiring dynamic data.
Description of drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is a schematic diagram of where the system for acquiring network data is deployed in a network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the components of the system for acquiring network data according to an embodiment of the present invention; and
Fig. 3 is a schematic diagram of the working process of the system for acquiring network data according to an embodiment of the present invention.
Embodiments
Example embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
Fig. 1 is a schematic diagram of where the system for acquiring network data is deployed in a network according to an embodiment of the present invention. As shown in Fig. 1, similarly to the prior art, client 11 is located in local network 12, and local network 12 is connected to server 13 over the network. Client 11 sends an access request, for example an HTTP request, to server 13 via local network 12, and server 13 returns a response, for example an HTTP response, to client 11 via local network 12. A network forwarding device 121, for example a gateway or router, is deployed in the local network to forward data between server 13 and client 11.
For the security of the local network, a network monitoring service usually runs on network forwarding device 121. Because clients usually access servers through such devices in the local network, the request and response data exchanged between client and server can be obtained from the local network by existing "packet capture" techniques, that is, the required data is taken directly from the data forwarded by the local network.
Therefore, as shown in Fig. 1, the system 10 for acquiring network data according to the embodiment of the present invention can be deployed in network forwarding device 121.
Fig. 2 is a schematic diagram of the components of the system for acquiring network data according to an embodiment of the present invention. As shown in Fig. 2, the system 10 for acquiring network data basically comprises a packet capture module 21 and an output module 22.
Packet capture module 21 is configured to obtain, from the data forwarded by the network forwarding device, the response data sent by the server to the client; output module 22 is configured to output the response data.
The data forwarded by the network forwarding device includes data of many types and functions. This embodiment chooses to capture the response data sent by the server to the client because the dynamic data is contained in that response data. Capturing the response data therefore yields the dynamic data, which can then be analyzed, without re-executing the script code that fetches the data; this helps improve the efficiency of acquiring dynamic data.
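As an illustration of separating server responses from the rest of the forwarded traffic, the following is a minimal Python sketch. It is not the patent's implementation: it assumes the capture layer has already reassembled each HTTP/1.x message into a complete byte string, and the function names `parse_http_response` and `response_bodies` are hypothetical. Chunked transfer encoding, compression and HTTPS are deliberately not handled.

```python
def parse_http_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status, headers, body).

    Returns None when the bytes do not begin with an HTTP status line,
    for example when the captured payload is a client request.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return None  # no header/body separator: not a complete message
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        return None  # a request line such as "GET /x HTTP/1.1" ends up here
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()
    return int(parts[1]), headers, body


def response_bodies(payloads):
    """Yield (headers, body) for every payload that is a server response."""
    for raw in payloads:
        parsed = parse_http_response(raw)
        if parsed is not None:
            _, headers, body = parsed
            yield headers, body
```

Header names are lowercased so that later stages can look up `content-type` without worrying about capitalization.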
A webpage can use a JavaScript function to make an AJAX submission, sending request information that asks the server for specified dynamic data. After receiving the request information, the server returns to the client the network address of the dynamic data, for example a uniform resource locator (URL), and the client can obtain the dynamic data that includes review information from that URL.
Dynamic data may be text, pictures, video or data in other formats, so the network data obtained by the packet capture module may include data in any of these formats. In practice, however, only one kind of dynamic data may be needed.
For example, in e-commerce, when a buyer reviews a product, the buyer enters the review in a form on the page and submits it to the server, and clients can see the reviews after opening the webpage. Review information is a kind of dynamic data; it reflects, from one angle, how well a product is selling and its prospects, and is often of interest. The packet capture module usually works by time period, capturing all packets within the period, and these contain various kinds of data: besides HTML files containing review text, there may also be files such as pictures. It is therefore preferable to screen the packets captured by the packet capture module. This work can be done manually by network analysts. The system 10 for acquiring network data according to the embodiment of the present invention can further comprise a saving module (not shown), configured to save the selected part of the response data.
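The text leaves screening to network analysts; a simple automated pre-filter could narrow the captured responses first. The sketch below is an assumption, not part of the patent: it takes responses as `(headers, body)` pairs whose header names are lowercase, and keeps only those whose `Content-Type` is in a hypothetical list of text formats, dropping pictures and other binary files.

```python
# Hypothetical list of media types worth keeping when looking for review text.
TEXT_TYPES = ("text/html", "text/plain", "application/json")


def select_text_responses(responses, wanted=TEXT_TYPES):
    """Keep responses whose media type is in `wanted`.

    `responses` is an iterable of (headers, body) pairs; `headers` is a dict
    with lowercase header names. Returns a list of (media_type, body) pairs.
    """
    selected = []
    for headers, body in responses:
        # Strip parameters such as "; charset=utf-8" before comparing.
        media_type = headers.get("content-type", "").split(";")[0].strip().lower()
        if media_type in wanted:
            selected.append((media_type, body))
    return selected
```

A response without a `Content-Type` header is dropped here; a real tool might instead set it aside for manual inspection.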
The rendered content corresponding to some dynamic data in a webpage may be split across multiple pages, with page-turning buttons presented on the page for the user. For example, once reviews reach a certain number they usually span several pages, and the page presents page numbers such as 1, 2, 3 together with a "next page" or similar page-turning button.
To make it easier to turn pages automatically and obtain the dynamic data of the pages that follow, in this embodiment the system 10 for acquiring network data can further comprise a trigger module, an analysis module and an acquisition module (not shown).
The trigger module is configured to trigger the page-turning button in a webpage so that the server sends the data of the next page of the webpage. The analysis module is configured to compare the address of the dynamic data sent by the server in response to the request information submitted by the webpage after page turning with the address of the dynamic data sent by the server in response to the request information submitted by the webpage before page turning, and to determine a plurality of addresses from the difference between the compared addresses; the plurality of addresses are the addresses of the dynamic data sent by the server in response to multiple page-turning operations. The acquisition module is configured to obtain the data at the plurality of addresses.
In this case, the saving module can also be used to save the data obtained by the acquisition module.
How the analysis module determines the addresses of the dynamic data is illustrated by the following examples.
For example: the URL of the dynamic data before the page turning is:
http://club.360buy.com/clubservice/productcomment-570142-0-0.html
The URL of the dynamic data after the page turning is:
http://club.360buy.com/clubservice/productcomment-570142-0-1.html
The analysis module compares the two URLs as text and finds that they differ only in the last digit of the "570142-0-0" segment. Changing this last digit in the URL therefore yields the URLs of the dynamic data for any number of subsequent pages.
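The text comparison described above can be sketched in Python as follows. This is one possible approach under stated assumptions, not the patent's algorithm: the two URLs are assumed to have the same shape and to differ in exactly one run of digits, and the page number is assumed to advance by a constant step. The function names are hypothetical.

```python
import re


def page_url_template(before: str, after: str):
    """Find the single numeric field that differs between two URLs.

    Returns (prefix, page, step, suffix), where `page` is the value in
    `after` and `step` is the increment observed between the two URLs.
    """
    # Splitting on a captured group keeps the digit runs as separate tokens.
    tokens_before = re.split(r"(\d+)", before)
    tokens_after = re.split(r"(\d+)", after)
    if len(tokens_before) != len(tokens_after):
        raise ValueError("URLs have different shapes")
    diffs = [i for i, (a, b) in enumerate(zip(tokens_before, tokens_after)) if a != b]
    if len(diffs) != 1 or not tokens_after[diffs[0]].isdigit():
        raise ValueError("expected exactly one differing numeric field")
    i = diffs[0]
    page = int(tokens_after[i])
    step = page - int(tokens_before[i])
    return "".join(tokens_after[:i]), page, step, "".join(tokens_after[i + 1:])


def later_pages(before: str, after: str, count: int):
    """Construct the URLs of the next `count` pages following `after`."""
    prefix, page, step, suffix = page_url_template(before, after)
    return [f"{prefix}{page + n * step}{suffix}" for n in range(1, count + 1)]
```

Applied to the two URLs in this example, `later_pages(before, after, 2)` produces the addresses ending in `-0-2.html` and `-0-3.html`.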
As another example, the URL of the dynamic data before page turning is:
http://www.suning.com/emall/SNMoreCommentView?productId=1123906&catalogId=10051&storeId=10052
The URL of the dynamic data after the page turning is:
http://www.suning.com/emall/SNMemberTestMulitePage?catalogId=10051&storeId=10052&productId=1123906&langId=-7&typeFlg=all&pageNumber=2&pageSize=10&sortType=%E5%85%A8%E9%83%A8%E8%AF%84%E4%BB%B7(51)%E5%A5%BD%E8%AF%84(44)%E4%B8%AD%E8%AF%84(5)%E5%B7%AE%E8%AF%84(2)
By comparison, the analysis module finds that shared fields such as storeId carry identical values in both URLs, and it identifies the keyword pageNumber in the URL obtained after page turning; this keyword can be looked up in a pre-stored keyword library. From the URL obtained after page turning and the keyword, the analysis module can then derive the URLs of the dynamic data for the pages that follow.
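A keyword-based variant of the analysis can be sketched with Python's standard `urllib.parse` module. The keyword set `PAGE_KEYWORDS` and the function name `next_page_urls` are hypothetical stand-ins for the pre-stored keyword library; the sketch also assumes the page number increases by one per page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical keyword library of query-parameter names that carry a page number.
PAGE_KEYWORDS = {"pagenumber", "page", "pageno"}


def next_page_urls(url: str, count: int, keywords=PAGE_KEYWORDS):
    """Build the URLs of the next `count` pages from a post-page-turn URL."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Locate the first parameter whose name is a known page keyword.
    index = next(
        (i for i, (k, v) in enumerate(params) if k.lower() in keywords and v.isdigit()),
        None,
    )
    if index is None:
        raise ValueError("no page keyword found in query string")
    key, value = params[index]
    urls = []
    for n in range(1, count + 1):
        new_params = list(params)
        new_params[index] = (key, str(int(value) + n))
        urls.append(urlunsplit(parts._replace(query=urlencode(new_params))))
    return urls
```

Note that `urlencode` re-encodes every parameter, so characters such as parentheses in other fields come out percent-encoded; a server normally treats both forms as equivalent, but that is worth verifying for a given site.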
As the above examples show, the analysis module can determine the URLs directly from the comparison result. These URLs hold the dynamic data of the subsequent pages, so the dynamic data can be obtained directly from them without repeatedly triggering the page-turning button in the webpage, which helps improve the efficiency of capturing dynamic data.
In addition, the analysis module can determine the last page reached by the multiple page-turning operations from the last-page marker in the webpage. Once the last page is reached, no new URL is constructed under the current URL rule to fetch the dynamic data of a further page.
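The stop-at-last-page behaviour can be sketched as a loop with the site-specific parts passed in as functions. Everything here is an assumption made for illustration: `fetch`, `url_for_page` and `is_last_page` are hypothetical callables supplied by the caller, and `max_pages` is a safety bound the patent does not mention.

```python
def collect_pages(fetch, url_for_page, is_last_page, first_page=1, max_pages=1000):
    """Fetch consecutive pages until the last-page marker is seen.

    fetch(url)          -> page content
    url_for_page(n)     -> URL constructed for page n under the current URL rule
    is_last_page(text)  -> True when the content carries the last-page marker
    """
    pages = []
    for page in range(first_page, first_page + max_pages):
        content = fetch(url_for_page(page))
        pages.append(content)
        if is_last_page(content):
            break  # stop constructing new URLs once the last page is reached
    return pages
```

Passing the fetcher in as a parameter keeps the loop testable without network access and leaves the choice of HTTP client open.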
The working process of the system for acquiring network data in the embodiment of the present invention is described below.
Fig. 3 is a schematic diagram of the working process of the system for acquiring network data according to an embodiment of the present invention. As shown in Fig. 3, the basic process of the system when working is as follows:
Step S31: obtain, from the data forwarded by the network forwarding device, the response data sent by the server to the client;
Step S32: output the response data.
After step S32, network analysts can screen the output response data. The system for acquiring network data can save the selected part of the response data.
After step S31, the dynamic data of subsequent pages can also be obtained by carrying out the following steps:
Trigger the page-turning button in the webpage so that the server sends the data of the next page of the webpage; compare the address of the dynamic data sent by the server in response to the request information submitted by the webpage after page turning with the address of the dynamic data sent by the server in response to the request information submitted by the webpage before page turning; determine a plurality of addresses from the difference between the compared addresses, the plurality of addresses here being the addresses of the dynamic data sent by the server in response to multiple page-turning operations; and obtain and save the data at the plurality of addresses.
The request information submitted by the webpage can be submitted via AJAX, via POST, or in other ways. In this embodiment, the response data can be obtained regardless of how the request was made. The response data can be text data in JSON format, which review information, for example, can use; of course, the response data can also be in other formats.
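When the response body is JSON text, it can be decoded with Python's standard `json` module. The sketch below is illustrative only: the field name `comments` and the function name `extract_comments` are hypothetical, since the patent does not describe the structure of the JSON payload.

```python
import json


def extract_comments(body: bytes, key: str = "comments", encoding: str = "utf-8"):
    """Decode a JSON response body and return the list stored under `key`.

    Returns None when the body is not valid JSON or is not a JSON object,
    and an empty list when the object has no such key.
    """
    try:
        payload = json.loads(body.decode(encoding))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not JSON text, e.g. an HTML page or a picture
    if isinstance(payload, dict):
        return payload.get(key, [])
    return None
```

Returning None for non-JSON bodies lets a caller fall back to other handling, such as HTML parsing, for responses in other formats.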
According to the technical solution of the embodiment of the present invention, the response data sent by the server to the client is obtained from the data forwarded by the network forwarding device and then analyzed, which yields the data delivered by existing means such as function calls made via AJAX. There is no need to use a browser again to re-execute the relevant functions, such as JavaScript functions, nor to use any JavaScript interpretation engine to simulate mouse click events and repeatedly execute the JavaScript functions in the page. In addition, according to the technical solution of this embodiment, the URLs of the dynamic data can be constructed after a small number of browser visits and the dynamic data then obtained directly, which helps improve the efficiency of acquiring dynamic data.
The basic principle of the present invention has been described above with reference to specific embodiments. It should be pointed out, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software or a combination thereof, in any computing device (including a processor, storage medium and so on) or network of computing devices, and that they can do so with their basic programming skills after reading the description of the present invention.
The object of the present invention can therefore also be achieved by running a program or a set of programs on any computing device. The computing device can be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is to say, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium can be any known storage medium or any storage medium developed in the future.
It should also be pointed out that, in the apparatus and method of the present invention, the components or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processes can naturally be carried out in the chronological order described, but need not be. Some steps can be carried out in parallel or independently of one another; for example, a step of performing color correction on original visual content and a step of performing geometric correction on a captured image can be carried out sequentially, in parallel or independently, in any order.
The above embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations and substitutions can occur. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A system for acquiring network data, characterized by comprising:
a packet capture module, configured to obtain, from the data forwarded by a network forwarding device, the response data sent by a server to a client, the network forwarding device being used to forward data between the server and the client; and
an output module, configured to output the response data.
2. The system according to claim 1, characterized by further comprising a saving module, configured to save a selected part of the response data.
3. system according to claim 1 and 2 is characterized in that, also comprises trigger module, analysis module and acquisition module, wherein:
Trigger module, thus the page-turning button that is used for triggering webpage makes said server send the data of following one page webpage of this webpage;
Analysis module; Be used for the solicited message that the webpage after the more said server response page turning submits to and the solicited message of webpage submission before the address of the dynamic data that sends and the said server response page turning and the address of the dynamic data that sends; Confirm a plurality of addresses according to the difference between the address that relatively obtains, the address of the dynamic data that sent when being said servers in response to page turn over operation repeatedly said a plurality of addresses;
Acquisition module is used for obtaining the data of said a plurality of addresses;
And said preservation module also is used to preserve the data that said acquisition module obtains.
4. system according to claim 3 is characterized in that, the last page that said analysis module arrives when also being used for confirming said repeatedly page turn over operation.
5. A method for acquiring network data, applied to the system according to any one of claims 1 to 4, characterized in that the method comprises:
obtaining, from the data forwarded by a network forwarding device, the response data sent by a server to a client, the network forwarding device being used to forward data between the server and the client; and
outputting the response data.
6. The method according to claim 5, characterized by further comprising: saving a selected part of the response data.
7. The method according to claim 5, characterized in that, after the response data sent by the server to the client is obtained from the data forwarded by the network forwarding device, the method further comprises:
triggering the page-turning button in a webpage so that the server sends the data of the next page of the webpage;
comparing the address of the dynamic data sent by the server in response to the request information submitted by the webpage after page turning with the address of the dynamic data sent by the server in response to the request information submitted by the webpage before page turning, and determining a plurality of addresses from the difference between the compared addresses, the plurality of addresses being the addresses of the dynamic data sent by the server in response to multiple page-turning operations; and
obtaining and saving the data at the plurality of addresses.
8. The method according to claim 5, 6 or 7, characterized in that the request information submitted by the webpage comprises request information submitted via AJAX or via POST.
9. The method according to claim 5, 6 or 7, characterized in that the response data comprises text data in JSON format.
CN201210137874.2A 2012-05-07 2012-05-07 System and method for acquiring network data Active CN102821088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210137874.2A CN102821088B (en) 2012-05-07 2012-05-07 Obtain the system and method for network data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210137874.2A CN102821088B (en) 2012-05-07 2012-05-07 Obtain the system and method for network data

Publications (2)

Publication Number Publication Date
CN102821088A true CN102821088A (en) 2012-12-12
CN102821088B CN102821088B (en) 2015-12-16

Family

ID=47304947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210137874.2A Active CN102821088B (en) 2012-05-07 2012-05-07 Obtain the system and method for network data

Country Status (1)

Country Link
CN (1) CN102821088B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107659544A (en) * 2016-08-26 2018-02-02 平安科技(深圳)有限公司 Using merging deployment system and method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239701A1 (en) * 2006-03-29 2007-10-11 International Business Machines Corporation System and method for prioritizing websites during a webcrawling process
CN101101601A (en) * 2007-07-10 2008-01-09 北京大学 Subject crawling method based on link hierarchical classification in network search
CN101140574A (en) * 2006-09-05 2008-03-12 腾讯科技(深圳)有限公司 Web page content revealing method and customer terminal device
CN101441662A (en) * 2008-11-28 2009-05-27 北京交通大学 Topic information acquisition method based on network topology
CN101753566A (en) * 2009-12-25 2010-06-23 北京畅游天下网络技术有限公司 Multi-application inter-system data application method and system
CN102087648A (en) * 2009-12-03 2011-06-08 北京大学 Method and system for fetching news comment page
CN102122283A (en) * 2010-01-07 2011-07-13 宏碁股份有限公司 Method for turning web pages and electronic device
CN102123168A (en) * 2011-01-14 2011-07-13 广州市动景计算机科技有限公司 Web page pre-reading and integration method and system based on relay server
CN102239680A (en) * 2011-03-09 2011-11-09 华为技术有限公司 Method and device for web application hosting

Also Published As

Publication number Publication date
CN102821088B (en) 2015-12-16

Similar Documents

Publication Publication Date Title
CN104915398B (en) A kind of webpage buries method and device a little
CN105578488B (en) Network data acquisition system and method
US10795629B2 (en) Text and custom format information processing method, client, server, and computer-readable storage medium
CN102761554B (en) Method, device and system for pushing information to client
CN107967143A (en) Obtain the methods, devices and systems of the update instruction information of client application source code
CN103810268B (en) Search result recommendation information loading method, device and system and URL detection method, device and system
CN104283723A (en) Network access log processing method and device
US8868646B2 (en) Apparatus and method for generating virtual game clients
CN110688598A (en) Service parameter acquisition method and device, computer equipment and storage medium
WO2013013556A1 (en) Data reporting method and device
CN103246699A (en) Method and device for data access control based on browser
CN103618773A (en) Display method, device and system for thermodynamic diagrams
US20150095487A1 (en) Third-party link tracker system and method
CN104954501A (en) Cross-domain information interactive method, device thereof and system thereof
CN104394041A (en) Access log generation method and device
JP2016507803A (en) Homepage forming method, peripheral device, and homepage forming system
CN102810110B (en) Obtain the method and system of network text data
CN104361004B (en) The processing method and browser of browser collection folder data
JP4503464B2 (en) Content relay server, content distribution system, and content relay method
CN103634338A (en) Method for modifying primary domain name of webpage online, data processing device and system
CN102821088A (en) System and method for acquiring network data
CN113542416B (en) Message receiving and sending method and device
CN105959344B (en) web pushing method and device
CN109791563A (en) Information Collection System, formation gathering method and recording medium
CN100592300C (en) Data display method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant