CN102710748B - Data capture method, system and equipment - Google Patents

Info

Publication number: CN102710748B
Authority: CN (China)
Prior art keywords: data; resource; web page; proxy server; page address
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201210133394.9A
Other languages: Chinese (zh)
Other versions: CN102710748A (en)
Inventor: 王一磊
Current Assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd; priority to CN201210133394.9A

Abstract

The invention discloses a data capture method, system and equipment, belonging to the field of computer networks. The method comprises: receiving a data acquisition request from a client, the data acquisition request comprising a web page address; establishing at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address; and pushing the captured data resources to the client. By having a proxy server establish multiple data channels to capture the data resources of a web page and then actively push them to the client, the invention solves the problem that a client wastes traffic and suffers long response times when obtaining network data, so that the client only needs to initiate a data acquisition request to quickly obtain all data resources of the web page for display to the user.

Description

Data capture method, system and equipment
Technical field
The present invention relates to the field of computer networks, and in particular to a data capture method, system and equipment.
Background
With the rapid development of the mobile Internet, using mobile terminals such as smartphones and tablet computers to browse network data has become an indispensable part of people's daily lives.
Taking web browsing with the browser on a smartphone as an example, one data capture method for obtaining web page data in the prior art is as follows. First, the smartphone establishes a TCP (Transmission Control Protocol) channel with a server and uses this TCP channel to send the server a data acquisition request for a web page address. Second, the server returns the primary resource of the web page address to the smartphone; the primary resource can be understood simply as the data used to display the main content of the web page. Third, the smartphone caches the primary resource and parses it to judge whether the web page also has child resources; a child resource may be a JavaScript script, a picture, music and so on. Fourth, if the smartphone judges that the web page also has a child resource, it has to establish a new TCP channel, because one TCP channel can serve only one data acquisition request at a time; the smartphone then uses the newly established TCP channel to send the server a data acquisition request for the child resource. Fifth, the server returns this child resource to the smartphone. Sixth, when the web page contains multiple child resources, the above child resource acquisition process is repeated until the primary resource and all child resources of the web page have been obtained locally by the smartphone, after which the smartphone displays the web page.
The prior art has the following problems. First, while obtaining the data of a web page, the smartphone must establish a new TCP channel and send another request for every child resource, which wastes a great deal of traffic. Second, when a web page contains multiple child resources, multiple rounds of data acquisition are required and the response time is long; since mobile networks already have high latency, it takes a very long time from requesting a web page to finally displaying it successfully, and the user experience is poor.
Summary of the invention
To solve the problem that a client wastes traffic and suffers long response times when obtaining network data, embodiments of the present invention provide a data capture method, system and equipment. The technical solutions are as follows:
According to one aspect of the present invention, an embodiment provides a data capture method for use in a proxy server, the method comprising:
Receiving a persistent-channel setup request from a client;
Establishing a persistent channel with the client, so as to receive at least one data acquisition request from the client;
Maintaining the persistent channel with the client by means of heartbeat signals;
Receiving a data acquisition request from the client, the data acquisition request comprising a web page address;
Establishing at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address, the data resources comprising a primary resource and child resources, each data channel being used to capture one primary resource or one child resource;
Pushing the captured data resources to the client;
Wherein establishing at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address comprises:
Establishing one Transmission Control Protocol (TCP) channel with the web page server to obtain the primary resource;
When it is determined that the data resources corresponding to the web page address also comprise multiple child resources, establishing multiple TCP channels respectively to obtain, in parallel, the child resources corresponding to the web page address.
According to another aspect of the present invention, an embodiment also provides a data capture method for use in a client, the method comprising:
Sending a persistent-channel setup request to a proxy server;
Establishing a persistent channel with the proxy server, so as to use the persistent channel to send at least one data acquisition request to the proxy server;
Maintaining the persistent channel with the proxy server by means of heartbeat signals;
Receiving a web access request from a user, the web access request comprising a web page address;
Sending a data acquisition request to the proxy server according to the web page address;
Receiving the data resources corresponding to the web page address that are pushed by the proxy server, the data resources being captured in parallel by the proxy server over at least two data channels established with a web page server, the data resources comprising a primary resource and child resources, and each data channel being used to capture one primary resource or one child resource; the primary resource is obtained by the proxy server over one Transmission Control Protocol (TCP) channel established with the web page server; the child resources are obtained in parallel by the proxy server over multiple TCP channels established respectively with the web page server when the proxy server determines that the data resources corresponding to the web page address also comprise multiple child resources.
According to yet another aspect of the present invention, an embodiment also provides a proxy server, the proxy server comprising:
A signal receiving module, for receiving a persistent-channel setup request from a client;
A channel setup module, for establishing a persistent channel with the client, so as to receive at least one data acquisition request from the client;
A channel maintenance module, for maintaining the persistent channel with the client by means of heartbeat signals;
A request receiving module, for receiving a data acquisition request from the client, the data acquisition request comprising a web page address;
A resource capture module, for establishing at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address, the data resources comprising a primary resource and child resources, each data channel being used to capture one primary resource or one child resource;
A resource push module, for pushing the captured data resources to the client;
The resource capture module is further for establishing one Transmission Control Protocol (TCP) channel with the web page server to obtain the primary resource and, when it is determined that the data resources corresponding to the web page address also comprise multiple child resources, establishing multiple TCP channels respectively to obtain, in parallel, the child resources corresponding to the web page address.
According to still another aspect of the present invention, an embodiment also provides a client, the client comprising:
A setup request sending module, for sending a persistent-channel setup request to a proxy server;
A persistent-channel setup module, for establishing a persistent channel with the proxy server, so as to use the persistent channel to send at least one data acquisition request to the proxy server;
A persistent-channel maintenance module, for maintaining the persistent channel with the proxy server by means of heartbeat signals;
A web request receiving module, for receiving a web access request from a user, the web access request comprising a web page address;
An acquisition request sending module, for sending a data acquisition request to the proxy server according to the web page address;
A data resource receiving module, for receiving all data resources corresponding to the web page address that are pushed by the proxy server, the data resources being captured in parallel by the proxy server over at least two data channels established with a web page server, the data resources comprising a primary resource and child resources, and each data channel being used to capture one primary resource or one child resource; the primary resource is obtained by the proxy server over one Transmission Control Protocol (TCP) channel established with the web page server; the child resources are obtained in parallel by the proxy server over multiple TCP channels established respectively with the web page server when the proxy server determines that the data resources corresponding to the web page address also comprise multiple child resources.
According to a further aspect of the present invention, an embodiment also provides a data capture system, which comprises the proxy server provided in the embodiments of the present invention and the client provided in the embodiments of the present invention.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are:
By having a proxy server establish multiple data channels to capture the data resources of a web page and then actively push them to the client, the solution solves the problem that a client wastes traffic and suffers long response times when obtaining network data, so that the client only needs to initiate a data acquisition request to quickly obtain all data resources of the web page for display to the user.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the implementation environment involved in the embodiments of the present invention;
Fig. 2 is a flowchart of the data capture method provided by Embodiment 1 of the present invention;
Fig. 3 is a flowchart of the data capture method provided by Embodiment 2 of the present invention;
Fig. 4 is a flowchart of the data capture method provided by Embodiment 3 of the present invention;
Fig. 5 is a structural block diagram of the proxy server provided by Embodiment 4 of the present invention;
Fig. 6 is another structural block diagram of the proxy server provided by Embodiment 4 of the present invention;
Fig. 7 is yet another structural block diagram of the proxy server provided by Embodiment 4 of the present invention;
Fig. 8 is still another structural block diagram of the proxy server provided by Embodiment 4 of the present invention;
Fig. 9 is a structural block diagram of the client provided by Embodiment 5 of the present invention;
Fig. 10 is another structural block diagram of the client provided by Embodiment 5 of the present invention;
Fig. 11 is a structural block diagram of the data capture system provided by Embodiment 6 of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Refer first to Fig. 1, which shows a schematic structural diagram of the implementation environment involved in the embodiments of the present invention. The implementation environment comprises not only the client 110 used by the user and the web page server 120, but also a proxy server 130 that bridges the client 110 and the web page server 120.
The client 110 used by the user may be a mobile terminal such as a mobile phone, a tablet computer or an ultra-portable PC, and the user can use this mobile terminal to access the network.
The web page server 120 provides web services, that is, it mainly provides all data resources of a web page to the client 110.
The proxy server 130 can be connected to the client 110 and to the web page server 120 respectively; it obtains data from the web page server 120 on behalf of the client 110 and then provides the obtained data to the client 110. The proxy server 130 may be a single server, a server cluster, or a cloud computing center.
Refer to Fig. 2, which shows a flowchart of the data capture method provided by Embodiment 1 of the present invention. The data capture method is applicable to the proxy server shown in Fig. 1, that is, this embodiment is described mainly from the proxy server side. The data capture method may comprise:
Step 202: receive a data acquisition request from a client, the data acquisition request comprising a web page address;
The proxy server can receive a data acquisition request from the client. The data acquisition request may be an HTTP (Hypertext Transfer Protocol) request and may contain a web page address, which can be expressed as a URL (Uniform Resource Locator), for example: http://www.xxx.com/index.html.
Step 204: establish at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address;
The proxy server can establish several data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address, where the data resources generally comprise a primary resource and child resources. That is, when the data resources corresponding to a web page address comprise one primary resource and multiple child resources, the proxy server can hold several data channels to the web page server at the same time, each used to obtain one primary resource or one child resource, so that the proxy server obtains all data resources corresponding to the web page address concurrently over these channels.
The primary resource may be content described in a language such as HTML (Hypertext Markup Language) and is mainly used to present the body or framework of a web page. A child resource may be a picture, a music file or a JS script (that is, a JavaScript script); most child resources are content or data that cannot be described directly in a language such as HTML. Typically, a web page consists of one primary resource and several child resources.
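As an illustration of how a proxy might decide which child resources a primary resource refers to, the following is a minimal sketch that scans HTML for script, image and audio references. The tag-to-attribute table, the class name ChildResourceParser and the function name find_child_resources are assumptions made for this example; the patent does not specify how child resources are discovered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChildResourceParser(HTMLParser):
    """Collects URLs of child resources referenced by a primary resource."""

    # Illustrative subset only: tag name -> attribute holding the resource URL.
    URL_ATTRS = {"script": "src", "img": "src", "audio": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the web page address.
                url = urljoin(self.base_url, value)
                if url not in self.urls:
                    self.urls.append(url)


def find_child_resources(base_url, html):
    """Return the de-duplicated child resource URLs in document order."""
    parser = ChildResourceParser(base_url)
    parser.feed(html)
    return parser.urls
```

A real page can also reference child resources from style sheets or from scripts at run time, which this sketch does not attempt to cover.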
Step 206: push the captured data resources to the client.
After all data resources corresponding to the web page address have been captured into its local cache, the proxy server can push the captured data resources to the client. The client then has the data resources corresponding to the web page address.
In summary, in the data capture method provided by Embodiment 1 of the present invention, the proxy server establishes multiple data channels to capture the data resources of a web page in parallel and then actively pushes them to the client. This solves the problem that a client wastes traffic and suffers long response times when obtaining network data, so that the client only needs to initiate a data acquisition request to quickly obtain all data resources of the web page for display to the user.
Refer to Fig. 3, which shows a flowchart of the data capture method provided by Embodiment 2 of the present invention. The data capture method is applicable to the proxy server shown in Fig. 1, that is, this embodiment is described mainly from the proxy server side. The data capture method may comprise:
Step 301: receive a persistent-channel setup request from a client;
As soon as the browser on the client starts, it can send a persistent-channel setup request to the proxy server, and the proxy server can receive this request from the client.
Step 302: establish a persistent channel with the client, so as to receive at least one data acquisition request from the client, and maintain the persistent channel with the client by means of heartbeat signals;
After receiving the persistent-channel setup request from the client, the proxy server can establish a persistent channel with the client and maintain it with the client by means of heartbeat signals. A persistent channel may be a TCP connection that is kept open for a long time and supports multiplexing, that is, multiple HTTP requests can be transmitted in parallel at the same time over one full-duplex TCP connection. After the TCP connection is created, the proxy server can use it to listen for HTTP requests from the client. This step can be carried out by implementing a new protocol at the TCP layer and the HTTP layer.
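The heartbeat-based upkeep of the long-lived channel can be sketched as a small liveness tracker. This is only an illustration under stated assumptions: the class name HeartbeatTracker, the 30-second timeout and the injectable clock are invented for the example, and the patent does not define the heartbeat interval or what counts as a failed channel.

```python
import time


class HeartbeatTracker:
    """Tracks whether a long-lived channel is still considered alive."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        # timeout_s is an assumed value; the source gives no interval.
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_heartbeat = clock()

    def on_heartbeat(self):
        """Record that a heartbeat signal arrived from the peer."""
        self._last_heartbeat = self._clock()

    def is_alive(self):
        """The channel is alive if a heartbeat arrived within the timeout."""
        return self._clock() - self._last_heartbeat <= self._timeout_s
```

Either side could keep such a tracker per channel and tear the channel down once is_alive() turns false.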
Step 303: receive a data acquisition request from the client, the data acquisition request comprising a web page address;
The proxy server can receive a data acquisition request from the client. The data acquisition request may be an HTTP request and may contain a web page address, which can be expressed as a URL, for example: http://www.xxx.com/index.html.
Step 304: establish at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address, the data resources comprising a primary resource and child resources, each data channel being used to capture one primary resource or one child resource;
The proxy server can establish several data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address, where the data resources comprise a primary resource and child resources. That is, when the data resources corresponding to a web page address comprise one primary resource and multiple child resources, the proxy server can hold several data channels to the web page server at the same time, each used to obtain one primary resource or one child resource, so that the proxy server obtains all data resources corresponding to the web page address concurrently over these channels. A data channel here may be a TCP channel established between the proxy server and the web page server. For example, the proxy server can first establish one TCP channel with the web page server to obtain the primary resource; then, when it determines that the data resources corresponding to the web page address also comprise multiple child resources, it establishes multiple TCP channels respectively to obtain the child resources in parallel. A client is usually limited to 6 TCP channels under the same domain name, whereas the proxy server, relying on its stronger processing capability and network bandwidth, can be tuned so that the number of concurrent TCP channels to the web page server is raised to 20. The proxy server can therefore obtain all data resources corresponding to a web page address considerably faster.
The primary resource may be content described in a language such as HTML (Hypertext Markup Language) and is mainly used to present the body or framework of a web page. A child resource may be a picture, a music file or a JS script (that is, a JavaScript script); most child resources are content or data that cannot be described directly in a language such as HTML. Typically, a web page consists of one primary resource and several child resources.
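The capture flow described above (one channel for the primary resource, then one channel per child resource, in parallel) can be sketched with a thread pool. This is a minimal sketch, not the patented implementation: the function name capture_page and the caller-supplied callables fetch and find_children are hypothetical, and the cap of 20 simply mirrors the figure given in the text.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CHANNELS = 20  # concurrency figure quoted in the text; adjust as needed


def capture_page(url, fetch, find_children):
    """Fetch the primary resource, then its child resources in parallel.

    fetch(url) -> bytes and find_children(url, body) -> list of URLs are
    hypothetical callables supplied by the caller; each fetch call stands
    in for one data channel to the web page server.
    """
    resources = {url: fetch(url)}  # one channel for the primary resource
    children = find_children(url, resources[url])
    if not children:
        return resources

    workers = min(MAX_CHANNELS, len(children))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per child resource, all in flight at the same time.
        futures = {pool.submit(fetch, child): child for child in children}
        for future in as_completed(futures):
            resources[futures[future]] = future.result()
    return resources
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to exercise with a stub.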
Step 305: judge whether the obtained primary resource or child resource comprises script data; if so, go to step 306; if not, go directly to step 307;
When the proxy server has obtained, over one data channel, a primary resource or child resource among the data resources corresponding to a web page address, it can judge whether this primary resource or child resource comprises script data. The script data may be a JavaScript script.
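One way the script check might look is sketched below. The heuristic (inspecting the media type and searching HTML for a script tag) and the name contains_script are assumptions for illustration; the patent only states that the proxy judges whether script data is present, not how.

```python
import re

# Media types commonly used for JavaScript (illustrative list).
_SCRIPT_TYPES = {"application/javascript", "text/javascript", "application/x-javascript"}
_SCRIPT_TAG = re.compile(rb"<script\b", re.IGNORECASE)


def contains_script(content_type, body):
    """Guess whether a fetched resource carries script data."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in _SCRIPT_TYPES:
        return True  # the resource itself is a script file
    if media_type == "text/html":
        # An HTML primary resource may embed or reference scripts.
        return _SCRIPT_TAG.search(body) is not None
    return False
```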
Step 306: if the obtained primary resource or child resource comprises script data, pre-execute the script data;
If the proxy server determines that the primary resource or child resource comprises script data, it can execute this script data in advance on behalf of the client. The client then does not need to execute this part of the script data, which significantly reduces the client's workload and can further improve the loading speed of the web page.
Step 307: store the obtained primary resource or child resource corresponding to the web page address in the local cache;
After the proxy server has obtained, over one data channel, a primary resource or child resource among the data resources corresponding to a web page address, it can store this primary resource or child resource in the local cache. The local cache can be implemented with faster storage such as memory and Flash, or with slower storage such as hard disks or a distributed storage system.
More preferably, storing the primary resource or child resource in the local cache may specifically comprise:
Caching all data resources corresponding to a web page address and storing them under a unique identifier obtained by hashing the web page address, while also keeping a visit count for each unique identifier.
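The cache layout described above (resources stored under a hash of the web page address, with a visit count per identifier) can be sketched as follows. The choice of SHA-256, the class name PageCache and the in-memory dictionary are assumptions for this example; the source does not name a hash function or a storage backend.

```python
import hashlib


class PageCache:
    """In-memory cache keyed by a hash of the web page address."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def key_for(page_url):
        # SHA-256 is an assumed choice of hash function.
        return hashlib.sha256(page_url.encode("utf-8")).hexdigest()

    def put(self, page_url, resource_url, body):
        """Store one primary or child resource under the page's identifier."""
        entry = self._entries.setdefault(
            self.key_for(page_url), {"resources": {}, "visits": 0}
        )
        entry["resources"][resource_url] = body

    def get(self, page_url):
        """Return a copy of the cached resources and count the visit."""
        entry = self._entries.get(self.key_for(page_url))
        if entry is None:
            return None
        entry["visits"] += 1
        return dict(entry["resources"])

    def visits(self, page_url):
        entry = self._entries.get(self.key_for(page_url))
        return 0 if entry is None else entry["visits"]
```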
Step 308: monitor whether a primary resource or child resource corresponding to the web page address has been stored in the local cache;
The proxy server can monitor whether a primary resource or child resource corresponding to the web page address has been stored in the local cache. In other words, each time a primary resource or child resource is cached in step 307, the proxy server can detect this change in real time. For example, a message queue can be set up: each time a primary resource or child resource is cached, a message is sent to the message queue, and the proxy server learns from the queue immediately that the local cache now holds a primary resource or child resource corresponding to the web page address.
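The message-queue idea in this step can be sketched with the standard library: the caching side posts a message for each stored resource, and a push loop consumes the messages and forwards the resource. The names cache_and_notify, push_loop and STOP, and the use of a plain dictionary as the cache, are assumptions for this illustration.

```python
import queue
import threading

STOP = object()  # sentinel that tells the push loop to exit


def cache_and_notify(cache, events, resource_url, body):
    """Store a resource in the local cache, then announce it on the queue."""
    cache[resource_url] = body
    events.put(resource_url)


def push_loop(cache, events, push):
    """Forward each newly cached resource as soon as it is announced.

    push(resource_url, body) is a hypothetical callable that sends the
    resource to the client.
    """
    while True:
        resource_url = events.get()  # blocks until a message arrives
        if resource_url is STOP:
            break
        push(resource_url, cache[resource_url])
```

In use, push_loop would run in its own thread (for example via threading.Thread) while the capture side calls cache_and_notify from its worker threads.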
Step 309: if a primary resource or child resource corresponding to the web page address has been stored in the local cache, push the primary resource or child resource to the client.
Once the proxy server detects that a primary resource or child resource corresponding to the web page address has been stored in the local cache, it actively pushes this primary resource or child resource to the client. When the data resources corresponding to a web page address comprise one primary resource and multiple child resources, steps 305 to 309 are repeated: each time the proxy server obtains a primary resource or child resource, it performs steps 305 to 309 once and pushes that resource to the client, until all data resources corresponding to the web page address have been pushed to the client.
Clearly, in the above process the client only needs to send the proxy server one HTTP request containing the web page address in step 303, and then simply keeps receiving all data resources corresponding to this web page address as the proxy server pushes them. Thanks to the proxy server's stronger processing capability and higher bandwidth, the web page data can be obtained and displayed very quickly.
It should also be added that, in step 307, the proxy server can cache the data resources obtained for a web page address. The unique identifier obtained by hashing the web page address can be used as the storage index, and a record is kept for each unique identifier: the push count. Moreover, the local cache can be implemented by combining high-speed storage such as memory or flash memory with low-speed storage such as hard disks or a distributed storage system.
In this case, after the proxy server receives data acquisition requests from different clients in step 303, it first checks, according to the web page address in the request, whether the local cache already holds all data resources corresponding to that web page address; if so, it jumps to step 309 and pushes the data resources corresponding to the web page address to the client. For example, many clients may access the news page at the same web page address intensively within a period of time. Each time the proxy server pushes data resources from the local cache to a client, it can increment the push count of the web page address by 1; the proxy server can then decide, according to whether the push count of a web page address exceeds a predetermined threshold, whether to keep the corresponding data resources in the high-speed cache or in the low-speed storage.
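The threshold rule for choosing between the faster and slower storage can be sketched as a small policy object. The threshold of 100 pushes, the tier labels "fast" and "slow" and the class name TierPolicy are assumed for illustration; the source only says that a predetermined threshold on the push count is used.

```python
from collections import Counter

HOT_THRESHOLD = 100  # assumed value; the source does not give a number


class TierPolicy:
    """Chooses a storage tier for a page from how often it has been pushed."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.pushed = Counter()

    def record_push(self, page_url):
        """Count one push of the page and return the tier it now belongs in."""
        self.pushed[page_url] += 1
        return self.tier(page_url)

    def tier(self, page_url):
        # Pages pushed more often than the threshold go to the fast tier.
        return "fast" if self.pushed[page_url] > self.threshold else "slow"
```

A fuller version would also decay or reset the counts over time so that a page that was popular last week does not stay in fast storage indefinitely; that behaviour is not described in the source.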
In summary, in the data capture method provided by Embodiment 2 of the present invention, the proxy server establishes multiple data channels to capture the data resources of a web page in parallel and then actively pushes them to the client. This solves the problem that a client wastes traffic and suffers long response times when obtaining network data, so that the client only needs to initiate a data acquisition request to quickly obtain all data resources of the web page for display to the user. In addition, by adding a persistent-channel mechanism, a script pre-execution mechanism and a mechanism that caches web page data in a high-speed cache, the method of this embodiment enables the proxy server to push web page data to the client even faster.
Refer to Fig. 4, which shows a flowchart of the data capture method provided by Embodiment 3 of the present invention. The data capture method is applicable to the client shown in Fig. 1, that is, this embodiment is described mainly from the client side. The data capture method may comprise:
Step 401: send a persistent-channel setup request to the proxy server;
As soon as the browser on the client starts, it can send a persistent-channel setup request to the proxy server, and the proxy server can receive this request from the client.
Step 402: establish a persistent channel with the proxy server, so as to use the persistent channel to send at least one data acquisition request to the proxy server, and maintain the persistent channel with the proxy server by means of heartbeat signals;
After receiving the persistent-channel setup request from the client, the proxy server can reply whether it accepts the request. If it accepts, the client can establish a persistent channel with the proxy server and maintain it with the proxy server by means of heartbeat signals. A persistent channel may be a TCP connection that is kept open for a long time and supports multiplexing, that is, multiple HTTP requests can be transmitted in parallel at the same time over one full-duplex TCP connection. After the TCP connection is created, the proxy server can use it to listen for HTTP requests from the client. This step can be carried out by implementing a new protocol at the TCP layer and the HTTP layer.
Step 403: receive a web access request from the user, the web access request comprising a web page address;
When using the browser on the client, the user can initiate a web access request to the client through the browser. The web access request contains a web page address, and the client can receive the web access request from the user.
Step 404: send a data acquisition request to the proxy server according to the web page address;
After receiving the web access request from the user, the client can use the persistent channel established in advance in step 402 to send a data acquisition request, which comprises the web page address, to the proxy server.
Because the persistent channel has to be maintained by heartbeat signals, it may fail. For this reason, this step may also specifically comprise:
First, judge whether a persistent channel exists;
After receiving the web access request from the user, the client first judges whether a persistent channel has been established with the proxy server.
Second, if a persistent channel exists, go on to judge whether the persistent channel is working normally;
If the client judges that a persistent channel has been established with the proxy server, it goes on to judge whether the persistent channel is working normally.
Third, if the persistent channel is working normally, use the persistent channel to send a data acquisition request to the proxy server, the data acquisition request comprising the web page address.
If the client judges that the persistent channel is working normally, it uses the persistent channel to send the data acquisition request, which comprises the web page address, to the proxy server.
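The three checks above can be sketched as a single client-side function. Note one assumption stated loudly: the text does not say what the client does when the channel is missing or abnormal, so re-establishing it via open_channel() is a guess made for this example. The channel interface (is_healthy, send, close), the message layout and the function name are likewise hypothetical.

```python
def send_data_acquisition_request(channel, page_url, open_channel):
    """Send a request over the long-lived channel, checking it first.

    channel is the current channel object or None; open_channel() is a
    hypothetical callable that establishes a new channel to the proxy.
    Returns the channel that was actually used.
    """
    if channel is None:
        # First check: no channel exists yet (assumed recovery: open one).
        channel = open_channel()
    elif not channel.is_healthy():
        # Second check: a channel exists but is not working normally
        # (assumed recovery: close it and open a fresh one).
        channel.close()
        channel = open_channel()
    # Third: the channel is usable, so send the request with the address.
    channel.send({"type": "data_acquisition", "url": page_url})
    return channel
```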
Step 405, receives the data resource that the web page address that pushed by proxy server is corresponding.
Client wait-receiving mode proxy server pushes data resource corresponding to the web page address of coming.If the local cache in proxy server has had data resource corresponding to this web page address, then all data resources corresponding for this web page address disposablely can be all pushed to client this locality by proxy server, all data resources corresponding for this web page address are stored in the buffer memory of client this locality by client, are then shown to user.If the local cache in proxy server does not also have data resource corresponding to this web page address, when then proxy server often gets a primary resource in all data resources corresponding to this web page address or child resource, this primary resource or child resource can be pushed to client at once, client can receive all data resources corresponding to this web page address in batches, and be cached to successively in the buffer memory of client this locality, be then shown to user.
In summary, in the data capture method provided by embodiment three of the present invention, the proxy server establishes multiple data channels to capture the data resources of a web page in parallel and then actively pushes them to the client. This solves the problem that a client wastes traffic and suffers a long response time when acquiring network data, and achieves the effect that the client only needs to initiate a data acquisition request in order to quickly obtain all data resources of the web page for display to the user.
Please refer to Fig. 5, which shows a block diagram of the proxy server provided by embodiment four of the present invention. The proxy server comprises a request receiving module 520, a resource handling module 540 and a resource supplying module 560.
The request receiving module 520 is configured to receive a data acquisition request from the client, the data acquisition request comprising a web page address.
The resource handling module 540 is configured to establish at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address.
The resource supplying module 560 is configured to push the captured data resources to the client.
Preferably, the proxy server can further comprise a signal receiving module 512, a passage setup module 514 and a passage maintaining module 516, as shown in Fig. 6. The signal receiving module 512 is configured to receive a lasting passage setup request from the client. The passage setup module 514 is configured to establish a lasting passage with the client, so as to receive at least one data acquisition request from the client. The passage maintaining module 516 is configured to maintain the lasting passage with the client by means of heartbeat signals.
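One way to picture heartbeat-based maintenance of the lasting passage is a tracker that records when the last heartbeat arrived and treats the passage as normal only while that heartbeat is recent. The Python sketch below is an assumption-laden illustration: the class name, the timeout rule and the injected clock values are hypothetical, since the text does not specify the heartbeat interval or the failure criterion.

```python
class HeartbeatTracker:
    """Hypothetical tracker for the liveness of one lasting passage."""

    def __init__(self, timeout_seconds: float, now: float) -> None:
        # timeout_seconds is an assumed parameter; the patent does not give a value.
        self.timeout_seconds = timeout_seconds
        self.last_seen = now

    def record_heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat signal arrives from the peer.
        self.last_seen = now

    def is_normal(self, now: float) -> bool:
        # The passage counts as normal while the last heartbeat is recent enough.
        return (now - self.last_seen) <= self.timeout_seconds
```

Passing the current time in explicitly keeps the sketch deterministic; a real implementation would typically read a monotonic clock instead.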
The resource handling module 540 can specifically comprise a parallel placement unit 542, a script judging unit 544 and a script executing unit 546, as shown in Fig. 7. The parallel placement unit 542 is configured to establish at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address; the data resources comprise a primary resource and child resources, and each data channel is used to capture one primary resource or one child resource. The script judging unit 544 is configured to judge whether an obtained primary resource or child resource comprises script data. The script executing unit 546 is configured to pre-execute the script data if the script judging unit 544 determines that a primary resource or child resource obtained by the parallel placement unit 542 comprises script data.
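The cooperation of the parallel capture and script judging steps can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented implementation: a thread pool with one worker per child resource stands in for the separate data channels, `fetch` and `find_children` are hypothetical callables supplied by the caller, the script test is a deliberately naive heuristic, and pre-execution itself is not modelled (the function only reports which resources would be candidates for it).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def contains_script(url: str, body: str) -> bool:
    # Naive, assumed heuristic: a .js URL or an inline <script> tag.
    return url.endswith(".js") or "<script" in body.lower()


def grab_page(address: str,
              fetch: Callable[[str], str],
              find_children: Callable[[str], List[str]]
              ) -> Tuple[Dict[str, str], List[str]]:
    """Fetch the primary resource, then fetch its child resources in parallel."""
    primary = fetch(address)              # first channel: the primary resource
    children = find_children(primary)     # URLs of the child resources, if any
    resources: Dict[str, str] = {address: primary}
    if children:
        # One worker per child resource stands in for one data channel each.
        with ThreadPoolExecutor(max_workers=len(children)) as pool:
            for url, body in zip(children, pool.map(fetch, children)):
                resources[url] = body
    # Resources that comprise script data are the candidates for pre-execution.
    scripts = [url for url, body in resources.items() if contains_script(url, body)]
    return resources, scripts
```

Because `pool.map` yields results in the order of its input, each body is paired with the correct URL even though the fetches run concurrently.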
Preferably, the proxy server can further comprise a data cache module 550, as shown in Fig. 8. The data cache module 550 is configured to store each obtained primary resource or child resource corresponding to the web page address in a local cache. In this case, the resource supplying module 560 can specifically comprise a resource monitoring unit 562 and a resource supplying unit 564. The resource monitoring unit 562 is configured to monitor whether a primary resource or child resource corresponding to the web page address has been stored in the local cache. The resource supplying unit 564 is configured to push the primary resource or child resource to the client if the resource monitoring unit 562 determines that a primary resource or child resource corresponding to the web page address has been stored in the local cache.
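The relationship between the data cache module, the resource monitoring unit and the resource supplying unit can be sketched in Python with a queue that announces every newly cached resource. All names here (`LocalCache`, `wait_for_arrival`, `monitor_and_push`, the `expected_count` and `timeout` parameters) are hypothetical; the text does not say how monitoring is implemented, so the queue is only one plausible choice.

```python
import queue
from typing import Callable, Dict, Tuple


class LocalCache:
    """Hypothetical local cache that announces each newly stored resource."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}
        self._arrivals: "queue.Queue[Tuple[str, str, str]]" = queue.Queue()

    def put(self, address: str, url: str, body: str) -> None:
        # Role of the data cache module: store the resource under its page address.
        self._store.setdefault(address, {})[url] = body
        self._arrivals.put((address, url, body))

    def wait_for_arrival(self, timeout: float) -> Tuple[str, str, str]:
        # Blocks until a resource has been cached; raises queue.Empty on timeout.
        return self._arrivals.get(timeout=timeout)


def monitor_and_push(cache: LocalCache,
                     push: Callable[[str, str, str], None],
                     expected_count: int,
                     timeout: float = 1.0) -> int:
    """Push every newly cached resource to the client; return how many were pushed."""
    pushed = 0
    while pushed < expected_count:
        try:
            address, url, body = cache.wait_for_arrival(timeout)  # monitoring unit
        except queue.Empty:
            break  # nothing new was cached within the timeout
        push(address, url, body)                                  # supplying unit
        pushed += 1
    return pushed
```

In a running proxy the capture threads would call `put` while `monitor_and_push` runs concurrently, so that each resource is forwarded shortly after it is cached.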
In summary, the proxy server provided by embodiment four of the present invention establishes multiple data channels to capture the data resources of a web page in parallel and then actively pushes them to the client. This solves the problem that a client wastes traffic and suffers a long response time when acquiring network data, and achieves the effect that the client only needs to initiate a data acquisition request in order to quickly obtain all data resources of the web page for display to the user. In addition, by adding the lasting passage mechanism, the script data pre-execution mechanism and the mechanism of caching web page data in a cache, the proxy server provided by embodiment four of the present invention can push web page data to the client even more quickly.
Please refer to Fig. 9, which shows a block diagram of the client provided by embodiment five of the present invention. The client can comprise a web-page request receiving module 920, an acquisition request sending module 940 and a data resource receiving module 960.
The web-page request receiving module 920 is configured to receive a web access request from the user, the web access request comprising a web page address.
The acquisition request sending module 940 is configured to send a data acquisition request to the proxy server according to the web page address.
The data resource receiving module 960 is configured to receive the data resources corresponding to the web page address, as pushed by the proxy server.
Further, the client can also comprise a setup request sending module 912, a lasting passage setup module 914 and a lasting passage maintaining module 916, as shown in Fig. 10. The setup request sending module 912 is configured to send a lasting passage setup request to the proxy server; the lasting passage setup module 914 is configured to establish a lasting passage with the proxy server, so that the lasting passage can be used to send at least one data acquisition request to the proxy server; the lasting passage maintaining module 916 is configured to maintain the lasting passage with the proxy server by means of heartbeat signals.
In this case, the acquisition request sending module 940 can specifically comprise a first passage judging unit 942, a second passage judging unit 944 and an acquisition request transmitting unit 946. The first passage judging unit 942 is configured to judge whether a lasting passage exists; the second passage judging unit 944 is configured to continue to judge whether the lasting passage is normal if a lasting passage exists; the acquisition request transmitting unit 946 is configured to use the lasting passage to send the data acquisition request, which comprises the web page address, to the proxy server if the lasting passage is normal.
In summary, with the client provided by embodiment five of the present invention, the proxy server establishes multiple data channels to capture the data resources of a web page in parallel and then actively pushes them to the client. This solves the problem that a client wastes traffic and suffers a long response time when acquiring network data, and achieves the effect that the client only needs to initiate a data acquisition request in order to quickly obtain all data resources of the web page for display to the user.
Please refer to Fig. 11, which shows a schematic structural diagram of the data-acquisition system provided by embodiment six of the present invention. The data-acquisition system can comprise a proxy server 11a and a client 11b.
The proxy server 11a can be the proxy server provided in embodiment four. The client 11b can be the client provided in embodiment five.
It should be noted that when the proxy server, client and data-acquisition system provided by the above embodiments acquire web page data, the division into the functional modules described above is only an example. In practical applications, the functions can be assigned to different functional modules as required; that is, the internal structure of a device can be divided into different functional modules to complete all or part of the functions described above. In addition, the proxy server, client and data-acquisition system provided by the above embodiments belong to the same concept as the data capture method embodiments; for their specific implementation, refer to the method embodiments, which are not repeated here.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A data capture method, characterized in that the method comprises:
receiving a lasting passage setup request from a client;
establishing a lasting passage with the client, so as to receive at least one data acquisition request from the client;
maintaining the lasting passage with the client by means of heartbeat signals;
receiving a data acquisition request from the client, the data acquisition request comprising a web page address;
establishing at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address, wherein the data resources comprise a primary resource and child resources, and each data channel is used to capture one primary resource or one child resource;
pushing the captured data resources to the client;
wherein the establishing at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address comprises:
establishing one Transmission Control Protocol (TCP) channel with the web page server to obtain the primary resource;
when it is determined that the data resources corresponding to the web page address further comprise multiple child resources, establishing multiple TCP channels respectively to obtain, in parallel, the child resources corresponding to the web page address.
2. The data capture method according to claim 1, characterized in that the establishing at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address specifically comprises:
establishing at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address;
judging whether an obtained primary resource or child resource comprises script data;
if the obtained primary resource or child resource comprises script data, pre-executing the script data.
3. data capture method according to claim 1 and 2, is characterized in that, describedly sets up with web page server after at least two data channels walk abreast and capture corresponding to described web page address data resource, also comprises:
Primary resource corresponding to the described web page address got or child resource are cached to local cache.
4. The data capture method according to claim 3, characterized in that the pushing the captured data resources to the client specifically comprises:
monitoring whether a primary resource or child resource corresponding to the web page address has been stored in the local cache;
if a primary resource or child resource corresponding to the web page address has been stored in the local cache, pushing the primary resource or child resource to the client.
5. A data capture method, characterized in that the method comprises:
sending a lasting passage setup request to a proxy server;
establishing a lasting passage with the proxy server, so as to use the lasting passage to send at least one data acquisition request to the proxy server;
maintaining the lasting passage with the proxy server by means of heartbeat signals;
receiving a web access request from a user, the web access request comprising a web page address;
sending a data acquisition request to the proxy server according to the web page address;
receiving the data resources corresponding to the web page address, as pushed by the proxy server, wherein the data resources are the data resources corresponding to the web page address that the proxy server captures in parallel by establishing at least two data channels with a web page server, the data resources comprise a primary resource and child resources, and each data channel is used to capture one primary resource or one child resource; the primary resource is obtained by the proxy server through one Transmission Control Protocol (TCP) channel established with the web page server; and the child resources are obtained in parallel by the proxy server through multiple TCP channels established respectively with the web page server when the proxy server determines that the data resources corresponding to the web page address further comprise multiple child resources.
6. The data capture method according to claim 5, characterized in that the sending a data acquisition request to the proxy server according to the web page address specifically comprises:
judging whether a lasting passage exists;
if a lasting passage exists, continuing to judge whether the lasting passage is normal;
if the lasting passage is normal, using the lasting passage to send the data acquisition request to the proxy server, the data acquisition request comprising the web page address.
7. A proxy server, characterized in that the proxy server comprises:
a signal receiving module, configured to receive a lasting passage setup request from a client;
a passage setup module, configured to establish a lasting passage with the client, so as to receive at least one data acquisition request from the client;
a passage maintaining module, configured to maintain the lasting passage with the client by means of heartbeat signals;
a request receiving module, configured to receive a data acquisition request from the client, the data acquisition request comprising a web page address;
a resource handling module, configured to establish at least two data channels with a web page server to capture, in parallel, the data resources corresponding to the web page address, wherein the data resources comprise a primary resource and child resources, and each data channel is used to capture one primary resource or one child resource;
a resource supplying module, configured to push the captured data resources to the client;
wherein the resource handling module is further configured to establish one Transmission Control Protocol (TCP) channel with the web page server to obtain the primary resource, and, when it is determined that the data resources corresponding to the web page address further comprise multiple child resources, to establish multiple TCP channels respectively to obtain, in parallel, the child resources corresponding to the web page address.
8. The proxy server according to claim 7, characterized in that the resource handling module specifically comprises:
a parallel placement unit, a script judging unit and a script executing unit;
the parallel placement unit being configured to establish at least two data channels with the web page server to capture, in parallel, the data resources corresponding to the web page address;
the script judging unit being configured to judge whether an obtained primary resource or child resource comprises script data;
the script executing unit being configured to pre-execute the script data if the obtained primary resource or child resource comprises script data.
9. The proxy server according to claim 7 or 8, characterized in that the proxy server further comprises:
a data cache module;
the data cache module being configured to store each obtained primary resource or child resource corresponding to the web page address in a local cache.
10. The proxy server according to claim 9, characterized in that the resource supplying module specifically comprises:
a resource monitoring unit and a resource supplying unit;
the resource monitoring unit being configured to monitor whether a primary resource or child resource corresponding to the web page address has been stored in the local cache;
the resource supplying unit being configured to push the primary resource or child resource to the client if a primary resource or child resource corresponding to the web page address has been stored in the local cache.
11. A client, characterized in that the client comprises:
a setup request sending module, configured to send a lasting passage setup request to a proxy server;
a lasting passage setup module, configured to establish a lasting passage with the proxy server, so as to use the lasting passage to send at least one data acquisition request to the proxy server;
a lasting passage maintaining module, configured to maintain the lasting passage with the proxy server by means of heartbeat signals;
a web-page request receiving module, configured to receive a web access request from a user, the web access request comprising a web page address;
an acquisition request sending module, configured to send a data acquisition request to the proxy server according to the web page address;
a data resource receiving module, configured to receive the data resources corresponding to the web page address, as pushed by the proxy server, wherein the data resources are the data resources corresponding to the web page address that the proxy server captures in parallel by establishing at least two data channels with a web page server, the data resources comprise a primary resource and child resources, and each data channel is used to capture one primary resource or one child resource; the primary resource is obtained by the proxy server through one Transmission Control Protocol (TCP) channel established with the web page server; and the child resources are obtained in parallel by the proxy server through multiple TCP channels established respectively with the web page server when the proxy server determines that the data resources corresponding to the web page address further comprise multiple child resources.
12. The client according to claim 11, characterized in that the acquisition request sending module specifically comprises:
a first passage judging unit, a second passage judging unit and an acquisition request transmitting unit;
the first passage judging unit being configured to judge whether a lasting passage exists;
the second passage judging unit being configured to continue to judge whether the lasting passage is normal if a lasting passage exists;
the acquisition request transmitting unit being configured to use the lasting passage to send the data acquisition request to the proxy server if the lasting passage is normal, the data acquisition request comprising the web page address.
13. A data-acquisition system, characterized in that it comprises the proxy server according to any one of claims 7 to 10 and the client according to claim 11 or 12.
CN201210133394.9A 2012-05-02 2012-05-02 Data capture method, system and equipment Active CN102710748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210133394.9A CN102710748B (en) 2012-05-02 2012-05-02 Data capture method, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210133394.9A CN102710748B (en) 2012-05-02 2012-05-02 Data capture method, system and equipment

Publications (2)

Publication Number Publication Date
CN102710748A CN102710748A (en) 2012-10-03
CN102710748B true CN102710748B (en) 2016-01-27

Family

ID=46903294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210133394.9A Active CN102710748B (en) 2012-05-02 2012-05-02 Data capture method, system and equipment

Country Status (1)

Country Link
CN (1) CN102710748B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635718A (en) * 2009-08-26 2010-01-27 中兴通讯股份有限公司 Network crawler system and method for acquiring resource as well as network resource gripping device
CN101651707A (en) * 2009-09-22 2010-02-17 西安交通大学 Method for automatically acquiring user behavior log of network
CN102143187A (en) * 2011-04-07 2011-08-03 北京星网锐捷网络技术有限公司 Method and system for terminal equipment to access network as well as network access proxy device
CN102184231A (en) * 2011-05-12 2011-09-14 广州市动景计算机科技有限公司 Method and device for acquiring page resources

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080119177A1 (en) * 2006-09-15 2008-05-22 Speedus Corp. Metadata Content Delivery System for Wireless Networks

Also Published As

Publication number Publication date
CN102710748A (en) 2012-10-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant