CN111641664A - Crawler equipment service request method, device and system - Google Patents
- Publication number
- CN111641664A
- Authority
- CN
- China
- Prior art keywords
- service request
- long connection
- proxy
- sending
- target station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/63—Routing a service request depending on the request content or context
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1014—Server selection for load balancing based on the content of a request
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
Abstract
The application provides a crawler device service request method, apparatus and system. When a load balancing device receives a service request sent by a crawler device deployed in an intranet, and the service request carries a route cookie, the load balancing device sends the service request to the proxy server corresponding to that route cookie. On receiving the service request from the load balancing device, the proxy server determines whether a mapping relation between the route cookie and a long connection identifier is stored locally; if so, it sends the service request to the corresponding proxy client through the corresponding long connection; otherwise, it selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client. On receiving the service request from the proxy server, the proxy client sends it to the target station. The scheme can reduce cost, improve security and improve availability.
Description
Technical Field
The invention relates to the technical field of the Internet, and in particular to a crawler device service request method, apparatus and system.
Background
Many Internet applications rely on crawler technology, and some frequently executed operations that would otherwise be done manually are completed by crawler equipment, such as a crawler robot agent.
At present, in a typical network deployment, robots are deployed in an external network; however, for security reasons, most development resources of operation customers are not open to the outside, so crawler devices deployed in the external network cannot use these resources.
To solve this problem, a separate set of development environment resources has to be built in the public network, for example a redis cluster, an MQ cluster, an RPC scheduling center and a monitoring system. In addition, a robot deployed in the external network is itself insecure, so extra operation, maintenance and security protection effort is required.
Disclosure of Invention
In view of this, the present application provides a crawler device service request method, apparatus and system, which can reduce cost and improve security and availability.
In order to solve the technical problem, the technical scheme of the application is realized as follows:
a crawler device service request system, the system comprising: a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients;
the load balancing device is configured to, when receiving a service request sent by the crawler device deployed in an intranet, send the service request to the proxy server corresponding to a route cookie if the service request carries the route cookie;
the proxy server is configured to, when receiving the service request sent by the load balancing device, determine whether a mapping relation between the route cookie and a long connection identifier is stored locally; if so, send the service request to the corresponding proxy client through the corresponding long connection; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client;
and the proxy client is configured to send the service request to the target station when receiving the service request sent by the proxy server.
A crawler device service request method, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the method comprising:
when a service request sent by the crawler device deployed in an intranet and forwarded by the load balancing device is received, determining whether a mapping relation between the route cookie carried by the service request and a long connection identifier is stored locally; if so, sending the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, selecting a long connection according to a first preset rule and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
A crawler device service request apparatus, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the apparatus comprising: a receiving unit, a determining unit and a sending unit;
the receiving unit is configured to receive a service request that is sent by the crawler device deployed in an intranet and forwarded by the load balancing device;
the determining unit is configured to determine, when the receiving unit receives the service request, whether a mapping relation between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit is configured to, when the determining unit determines that the mapping relation between the route cookie carried by the service request and the long connection identifier is stored, send the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the steps of the crawler device service request method being implemented when the processor executes the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the crawler service request method.
According to the above technical scheme, a highly available distributed proxy cluster consisting of a plurality of proxy servers and proxy clients is introduced, so that service requests sent by the crawler device are spread across multiple machines, which overcomes the defect that a crawler device deployed in an intranet uses a single-machine proxy egress. In addition, requests carrying the same route cookie use the same egress IP, so that one group of requests uses one egress IP as far as possible. The scheme can therefore reduce cost and improve availability.
Drawings
Fig. 1 is a schematic diagram of a crawler device service request system in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a crawler device service request in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an apparatus to which the above technique is applied in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the technical solutions of the present invention are described in detail below with reference to the accompanying drawings and examples.
An embodiment of the present application provides a crawler device service request system. Referring to fig. 1, fig. 1 is a schematic diagram of the crawler device service request system in the embodiment of the present application. The system comprises: a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients.
The crawler device may be any device capable of implementing a crawler function, such as a crawler robot.
In the embodiment of the application, the crawler device, the load balancing device and the proxy server are deployed in an intranet, and the proxy client is deployed in a public network.
Before the crawler device sends a service request, each proxy client establishes a long connection with a proxy server when the proxy client comes online;
when the long connection with the proxy client has been established, the proxy server stores the mapping relation between the long connection identifier and the proxy client identifier; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
That is, one proxy server or one proxy client can establish one or more long connections with the opposite end.
In fig. 1, two proxy servers and three proxy clients are taken as an example: proxy server 1 and proxy server 2, and proxy client 1, proxy client 2 and proxy client 3.
Assuming that the proxy server 1 establishes long connections with the proxy client 1 and the proxy client 2, respectively, and the corresponding long connection identifiers are 1 and 2, respectively, the mapping relationship stored on the proxy server 1 is as follows:
long connection identifier 1: proxy client identifier 1; long connection identifier 2: proxy client identifier 2.
Assuming that the proxy server 2 establishes long connections with the proxy client 2 (two long connections) and the proxy client 3, and the corresponding long connection identifiers are 3, 5, and 4, respectively, the mapping relationship stored on the proxy server 2 is:
long connection identifier 3: proxy client identifier 2; long connection identifier 5: proxy client identifier 2; long connection identifier 4: proxy client identifier 3.
Multiple long connections may also be established between two devices.
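The two mapping tables in this example can be written down as plain dictionaries. The following is only an illustrative sketch: the variable names and the helper `connections_by_client` are hypothetical and do not appear in the application; the identifier values are the ones used in the example above.

```python
from collections import defaultdict

# Long connection identifier -> proxy client identifier, as stored on each
# proxy server in the example (variable names are hypothetical).
proxy_server_1 = {1: 1, 2: 2}
proxy_server_2 = {3: 2, 5: 2, 4: 3}


def connections_by_client(mapping):
    """Group long connection identifiers by the proxy client they reach."""
    grouped = defaultdict(list)
    for conn_id, client_id in mapping.items():
        grouped[client_id].append(conn_id)
    return {client_id: sorted(ids) for client_id, ids in grouped.items()}


# Proxy server 2 reaches proxy client 2 over two long connections (3 and 5).
print(connections_by_client(proxy_server_2))
```

Keying the table by long connection identifier, rather than by proxy client, is what allows several long connections to the same proxy client to coexist.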
The crawler device sends service requests. Requests that belong to one task, such as the requests for filling in a form or the requests for extracting a form, are each regarded as a group of requests. If a service request is the first one sent in its group, it does not carry routing data (a route cookie); if it is not the first one and the route cookie fed back by the load balancing device has been received, the service request carries the corresponding route cookie.
A cookie is a piece of data that a server temporarily stores on the user's computer so that the server can identify that computer. When the user browses a website, the Web server first sends a small piece of data to be stored on the computer, and the cookie records what the user typed or selected on the site. On the next visit to the same website, the Web server checks whether cookie data left from the previous visit exists; if so, it identifies the user from the cookie contents and returns the corresponding web page content.
The load balancing device is configured to determine, when receiving a service request sent by the crawler device deployed in the intranet, whether the service request carries a route cookie;
if the service request carries a route cookie, the load balancing device sends the service request to the proxy server corresponding to the route cookie;
further, if it is determined that the service request does not carry a route cookie, the load balancing device generates a route cookie for the service request and returns the route cookie to the crawler device;
it then selects a proxy server according to a second preset rule, sends the service request to the selected proxy server with the generated route cookie carried in the request, and establishes and stores a mapping relation between the route cookie and the identifier of the selected proxy server.
The second preset rule may be a load balancing rule, for example selecting a proxy server by polling (round-robin). The embodiment of the present application does not limit this; a reasonable rule for selecting a proxy server may be configured according to actual needs.
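The load balancing behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the class name `LoadBalancer`, the method `route`, and the use of a random hex string as the route cookie are all hypothetical, round-robin stands in for the "second preset rule", and a request carrying a cookie that the load balancer does not know is simply assigned a server (a case the application does not describe).

```python
import itertools
import uuid


class LoadBalancer:
    """Sticky routing by route cookie, round-robin for new request groups."""

    def __init__(self, proxy_servers):
        # Assumed "second preset rule": polling over the proxy servers.
        self._round_robin = itertools.cycle(list(proxy_servers))
        # Stored mapping: route cookie -> identifier of the selected proxy server.
        self._cookie_to_server = {}

    def route(self, route_cookie=None):
        """Return (route_cookie, proxy_server) for one service request."""
        if route_cookie is None:
            # First request of a group: generate a cookie (hypothetical format)
            # that would also be returned to the crawler device.
            route_cookie = uuid.uuid4().hex
        if route_cookie not in self._cookie_to_server:
            # Assumption: unknown cookies are treated like new groups.
            self._cookie_to_server[route_cookie] = next(self._round_robin)
        return route_cookie, self._cookie_to_server[route_cookie]


lb = LoadBalancer(["proxy-server-1", "proxy-server-2"])
cookie, server = lb.route()          # first request of a group
same_cookie, same_server = lb.route(cookie)  # later request of the same group
```

Every later request that carries the returned cookie reaches the same proxy server, which is the precondition for the proxy server reusing the same long connection for the whole group.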
When receiving the service request sent by the load balancing device, the proxy server determines whether the mapping relation between the route cookie and a long connection identifier is stored locally; if so, it sends the service request to the corresponding proxy client through the corresponding long connection; otherwise, it selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client;
further, after selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, the proxy server establishes a mapping relation between the route cookie carried in the service request and the long connection identifier of the selected long connection, and locks the long connection identifier;
and when a long connection is selected for a service request according to the first preset rule, it is selected from the long connections other than those corresponding to locked long connection identifiers. That is, a long connection that is already locked is not used when a new group of requests arrives.
The first preset rule may be a load balancing rule, for example selecting a proxy client by polling (round-robin). The embodiment of the present application does not limit this; a reasonable rule for selecting a proxy client may be configured according to actual needs.
When a proxy client is selected by polling, it is selected from the clients that have established long connections with the proxy server, and the long connection is selected from the unlocked long connections. That is, if a plurality of long connections are established between one proxy server and one proxy client, locking one of them does not affect the selection of the others:
for example, if long connection 3 and long connection 5 are established between the proxy server 2 and the proxy client 2, long connection 5 can still be selected while long connection 3 is locked.
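The selection and locking logic on the proxy server can be sketched as below. The class `ProxyServer` and its method `select` are hypothetical names; round-robin over the sorted connection identifiers is only one possible "first preset rule"; and raising an error when every long connection is locked is an assumption, since the application does not state what happens in that case or when a lock is released.

```python
class ProxyServer:
    """Pick a long connection per route cookie and lock it for that group."""

    def __init__(self, conn_to_client):
        # Long connection identifier -> proxy client identifier.
        self._conn_to_client = dict(conn_to_client)
        # Route cookie -> long connection identifier.
        self._cookie_to_conn = {}
        self._locked = set()
        self._next_index = 0

    def select(self, route_cookie):
        """Return (long_connection_id, proxy_client_id) for a service request."""
        if route_cookie in self._cookie_to_conn:
            # Mapping stored locally: reuse the same long connection.
            conn_id = self._cookie_to_conn[route_cookie]
            return conn_id, self._conn_to_client[conn_id]
        conn_ids = sorted(self._conn_to_client)
        for offset in range(len(conn_ids)):
            index = (self._next_index + offset) % len(conn_ids)
            conn_id = conn_ids[index]
            if conn_id not in self._locked:
                self._next_index = (index + 1) % len(conn_ids)
                self._cookie_to_conn[route_cookie] = conn_id
                self._locked.add(conn_id)  # lock the long connection identifier
                return conn_id, self._conn_to_client[conn_id]
        # Assumption: no fallback is defined when all connections are locked.
        raise RuntimeError("no unlocked long connection available")


# Proxy server 2 from the example: connections 3 and 5 reach proxy client 2.
server_2 = ProxyServer({3: 2, 5: 2, 4: 3})
first = server_2.select("cookie-a")
second = server_2.select("cookie-b")
```

Because locking is per long connection rather than per proxy client, long connection 5 remains selectable after long connection 3 has been locked, as in the example above.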
And the proxy client sends the service request to the target station when receiving the service request sent by the proxy server.
To reduce cost and achieve high availability, the intranet performs reverse proxying through a reverse proxy HTTP + WebSocket server cluster together with a WebSocket client cluster running on ADSL hosts; ADSL hosts are inexpensive, rich in IP resources, and have dynamic IP addresses.
By introducing a highly available distributed proxy cluster consisting of a plurality of proxy servers and proxy clients, service requests sent by the crawler device are spread across multiple machines, which overcomes the defect that a crawler device deployed in an intranet uses a single-machine proxy egress. In addition, requests carrying the same route cookie use the same egress IP, so that one group of requests uses one egress IP as far as possible. The scheme can therefore reduce cost and improve availability.
In the embodiment of the present application, to prevent IP addresses from being blocked, the proxy client implements blocking prevention in the following two cases:
In the first case:
in the embodiment of the application, the proxy client adds a disconnect-and-reconnect function so that the IP address is replaced periodically. The specific steps are as follows:
the proxy client configures a switching timer for the IP address used when sending service requests to the target station; when the switching timer expires, the proxy client switches the IP address and then sends service requests.
For example, the timer is set to 1 hour or 2 hours; the timer is set separately for each IP address.
Taking Windows as an example, the client can call the rasdial program; the script is implemented as follows:
@echo off
:: Initialize connection data
set adslName=broadband connection
set adslUsername=05711937xxxx
set adslPassword=348124
:start
:: Dial the connection
rasdial "%adslName%" %adslUsername% %adslPassword%
echo adsl connecting
:: Read the IP address after a successful connection
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr "IPv4"') do set ip=%%i
::echo IP address:%ip%
:: Wait, then disconnect and reconnect (-n 900 is roughly 15 minutes; use -n 3600 for about 60 minutes)
ping 127.0.0.1 -n 900 >nul
:: Disconnect
rasdial "%adslName%" /disconnect
echo adsl disconnect
::
goto start
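The per-IP switching timer described in the first case can also be sketched independently of the dialing mechanism. The sketch below is an assumption-laden illustration: `SwitchTimer`, `ip_before_send` and the `switch_ip` callback are hypothetical names, and how a new IP address is actually obtained (for example by redialing, as in the script above) is left to the caller.

```python
import time


class SwitchTimer:
    """Switch the egress IP once it has been in use for a configured interval."""

    def __init__(self, interval_seconds, switch_ip, clock=time.monotonic):
        self._interval = interval_seconds
        self._switch_ip = switch_ip      # hypothetical callback returning a new IP
        self._clock = clock              # injectable so the timer can be tested
        self._started_at = clock()

    def ip_before_send(self, current_ip):
        """Return the IP to use for the next request, switching if the timer expired."""
        if self._clock() - self._started_at >= self._interval:
            current_ip = self._switch_ip()
            self._started_at = self._clock()  # restart the timer for the new IP
        return current_ip


# Example: one-hour interval, with a placeholder switch function.
timer = SwitchTimer(3600, switch_ip=lambda: "203.0.113.7")
ip = timer.ip_before_send("203.0.113.5")
```

Restarting the timer on each switch matches the statement that the timer is set for each IP address rather than globally.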
In the second case:
before sending the received service request to the target station, the proxy client sends a connection request to the target station;
if no response from the target station is received within a first preset time, or a rejection response from the target station is received, the proxy client switches the currently used IP address and sends a connection request to the target station again;
this is repeated until an acceptance response from the target station is received within the first preset time after a connection request is sent; the service request is then sent to the target station using the IP address that sent that connection request.
After the service request is sent to the target station, if no response from the target station is received within a second preset time, or the received response carries an error keyword configured on the proxy client, the proxy client switches the currently used IP address;
after switching the currently used IP address, the proxy client probes the target station; if the probe fails, it switches the IP address again and probes again, until a probe succeeds; the service request is then sent using the IP address with which the probe succeeded.
That is, a connection request is sent before the service request; if a confirmation response is received, namely the connection succeeds, the same IP address is then used to send the service request;
after the corresponding service response is received, the service request is determined to have been processed successfully; otherwise, the IP address is switched until the service request is processed successfully; and if the timer of the IP address in use expires during the service request process, the IP address is switched.
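The probe-then-send loop of the second case can be sketched as follows. All names here are hypothetical: `connect`, `send` and `switch_ip` are caller-supplied callbacks standing in for the connection request, the service request and the IP switch; the two preset times are assumed to be enforced inside those callbacks; and `max_switches` is an added safety bound that the application does not mention (it describes retrying until success).

```python
def send_with_ip_switching(request, current_ip, connect, send, switch_ip,
                           error_keywords, max_switches=5):
    """Probe the target station, send the request, and switch IP on failure.

    connect(ip)       -> True if the target accepted within the first preset time.
    send(ip, request) -> response text, or None if nothing arrived within the
                         second preset time.
    switch_ip()       -> a new IP address to use.
    """
    ip = current_ip
    for _ in range(max_switches + 1):
        if connect(ip):
            response = send(ip, request)
            blocked = response is None or any(k in response for k in error_keywords)
            if not blocked:
                return ip, response
        # No acceptance, no response, or an error keyword: change the egress IP.
        ip = switch_ip()
    raise RuntimeError("target station unreachable after switching IP addresses")


# Example with stub callbacks: the first IP is rejected, the second succeeds.
result = send_with_ip_switching(
    "GET /form",
    "ip-a",
    connect=lambda ip: ip == "ip-b",
    send=lambda ip, req: "ok page",
    switch_ip=lambda: "ip-b",
    error_keywords=["Denied"],
)
```

Probing before sending means a blocked IP address is detected with a cheap connection request rather than with the service request itself.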
Based on the same inventive concept, the embodiment of the application further provides a crawler device service request method, which is applied to any proxy server in a system comprising the crawler device, the load balancing device, a plurality of proxy servers and a plurality of proxy clients.
When the proxy client is online, long connection is established between the proxy client and the proxy server;
when the long connection with the proxy client has been established, the proxy server stores the mapping relation between the long connection identifier and the proxy client identifier; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
Referring to fig. 2, fig. 2 is a schematic view of a service request flow of a crawler device in an embodiment of the present application. The method comprises the following specific steps:
When receiving a service request sent by a crawler device deployed in an intranet, the load balancing device sends the service request to the proxy server corresponding to a route cookie if the service request carries the route cookie;
further, if it is determined that the service request does not carry a route cookie, the load balancing device generates a route cookie for the service request and returns the route cookie to the crawler device;
it then selects a proxy server according to a second preset rule and sends the service request to the selected proxy server, with the generated route cookie carried in the service request;
and it establishes and stores a mapping relation between the route cookie and the identifier of the selected proxy server.
Step 204: the proxy server selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
Further, after selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, the proxy server establishes a mapping relation between the route cookie carried in the service request and the long connection identifier of the selected long connection, and locks the long connection identifier;
and when a long connection is selected for a service request according to the first preset rule, it is selected from the long connections other than those corresponding to locked long connection identifiers.
To prevent the IP address from being blocked, the processing by the proxy client further includes:
the proxy server causes the proxy client to send a connection request to the target station before sending the service request to the target station; if no response from the target station is received within a first preset time, or a rejection response from the target station is received, the proxy client switches the currently used IP address and sends a connection request to the target station again; this is repeated until an acceptance response from the target station is received within the first preset time after a connection request is sent, and the service request is then sent to the target station using the IP address that sent that connection request.
After the proxy server causes the proxy client to send the service request to the target station using the IP address that sent the connection request, if no response from the target station is received within a second preset time, or the received response carries an error keyword configured on the proxy client, the currently used IP address is switched;
after switching the currently used IP address, the proxy client probes the target station; if the probe fails, it switches the IP address again and probes again, until a probe succeeds; the service request is then sent using the IP address with which the probe succeeded.
In combination with the foregoing blocking prevention procedure, an implementation scheme for periodically replacing the IP address is also provided, which specifically includes:
the proxy server causes the proxy client to configure a switching timer for the IP address used when sending service requests to the target station; when the switching timer expires, the proxy client switches the IP address and then sends service requests.
Based on the same inventive concept, the embodiment of the present application further provides a crawler device service request apparatus, which is applied to any proxy server in a system including a crawler device, a load balancing device, a plurality of proxy servers, and a plurality of proxy clients. Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus to which the above technique is applied in the embodiment of the present application. The apparatus includes: a receiving unit 301, a determining unit 302, and a sending unit 303;
a receiving unit 301, configured to receive a service request sent by a crawler device deployed in an intranet and forwarded by a load balancing device;
a determining unit 302, configured to determine, when the receiving unit 301 receives the service request, whether a mapping relationship between the route cookie carried by the service request and a long connection identifier is stored locally;
a sending unit 303, configured to send the service request to the corresponding proxy client through the corresponding long connection when the determining unit 302 determines that the mapping relationship between the route cookie carried in the service request and the long connection identifier is stored, so that the proxy client sends the service request to a target station; otherwise, to select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
Preferably, the apparatus further comprises: an establishing unit 304;
the establishing unit 304 is configured to store the mapping relationship between the long connection identifier and the proxy client identifier when the long connection with the proxy client has been established; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
Preferably, the apparatus further comprises:
an establishing unit 304, configured to, after the sending unit 303 selects a long connection according to the first preset rule and sends the service request to the corresponding proxy client, establish a mapping relationship between the route cookie carried in the service request and the long connection identifier of the selected long connection, and lock the long connection identifier; when a long connection is selected for a service request according to the first preset rule, it is selected from the long connections other than those corresponding to locked long connection identifiers.
The units of the above embodiments may be integrated into one body, or may be separately deployed; may be combined into one unit or further divided into a plurality of sub-units.
In addition, an electronic device is further provided in an embodiment of the present application, and includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the crawler device service request method when executing the program.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program carries out the steps of the crawler device service request method.
In summary, the application introduces a highly available distributed proxy cluster consisting of a plurality of proxy servers and proxy clients, so that service requests sent by the crawler device are spread across multiple machines, which avoids the defect that a crawler device deployed in an intranet uses a single-machine proxy egress. In addition, requests carrying the same route cookie use the same egress IP, so that one group of requests uses one egress IP as far as possible. The scheme can reduce cost, improve security and improve availability.
The proxy client also periodically switches the IP address used for sending service requests, so as to avoid being blocked.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (12)
1. A crawler service request system, the system comprising: the system comprises crawler equipment, load balancing equipment, a plurality of proxy servers and a plurality of proxy clients;
the system comprises a load balancing device and a proxy server, wherein the load balancing device is used for sending a service request to the proxy server corresponding to a route cookie if the service request carries the route cookie when receiving the service request sent by a crawler device deployed in an intranet;
the proxy server determines whether the mapping relation between the routecookie and the long connection identifier is stored locally or not when receiving the service request sent by the load balancing equipment, and if so, sends the service request to the corresponding proxy client through the corresponding long connection; otherwise, according to a first preset rule, selecting a long connection and sending the service request to a corresponding proxy client;
and the proxy client sends the service request to the target station when receiving the service request sent by the proxy server.
2. A service request method for crawler equipment, applied to any proxy server in a system comprising the crawler equipment, load balancing equipment, a plurality of proxy servers and a plurality of proxy clients, the method comprising:
when a service request sent by crawler equipment deployed in an intranet and forwarded by the load balancing equipment is received, determining whether a mapping relation between the route cookie carried by the service request and a long connection identifier is stored locally; if so, sending the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, selecting a long connection according to a first preset rule and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
3. The method of claim 2, further comprising:
when establishment of a long connection with a proxy client is completed, storing the mapping relation between the long connection identifier and the proxy client identifier; wherein one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
4. The method of claim 2, further comprising:
after selecting a long connection and sending the service request to a corresponding proxy client according to a first preset rule, establishing a mapping relation between a route cookie carried in the service request and a long connection identifier of the selected long connection, and locking the long connection identifier;
and when a long connection is selected for a service request according to the first preset rule, selecting the long connection from among the long connections other than those corresponding to locked long connection identifiers.
5. The method of claim 2, further comprising:
before the proxy client sends the service request to the target station, sending a connection request to the target station; if no response from the target station is received within a first preset time, or a rejection response from the target station is received, switching the currently used IP address and then sending a connection request to the target station again, until an acceptance response sent by the target station is received within the first preset time after a connection request is sent; and then sending the service request to the target station using the IP address from which that connection request was sent.
6. The method of claim 5, further comprising:
after the proxy client sends the service request to the target station using the IP address from which the connection request was sent, if no response from the target station is received within a second preset time, or the received response carries an error keyword configured on the proxy client, switching the currently used IP address;
after the proxy client switches the currently used IP address, detecting the target station; if the detection fails, switching the IP address again and detecting again, until the detection succeeds; and sending the service request using the IP address in use when the detection succeeded.
7. The method of any one of claims 2-6, further comprising:
enabling the proxy client to configure a switching timer for the IP address used when sending service requests to the target station; and when the switching timer expires, switching the IP address and sending the service request using the new IP address.
8. A service request apparatus for crawler equipment, applied to any proxy server in a system comprising the crawler equipment, load balancing equipment, a plurality of proxy servers and a plurality of proxy clients, the apparatus comprising: a receiving unit, a determining unit and a sending unit;
the receiving unit is configured to receive a service request sent by crawler equipment deployed in an intranet and forwarded by the load balancing equipment;
the determining unit is configured to determine, when the receiving unit receives the service request, whether a mapping relation between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit is configured to send the service request to the corresponding proxy client through the corresponding long connection when the determining unit determines that the mapping relation between the route cookie carried by the service request and the long connection identifier is stored, so that the proxy client sends the service request to a target station; otherwise, to select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
9. The apparatus of claim 8, wherein the apparatus further comprises: an establishing unit;
the establishing unit is configured to store, when establishment of a long connection with a proxy client is completed, the mapping relation between the long connection identifier and the proxy client identifier; wherein one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
10. The apparatus of claim 8, further comprising:
an establishing unit, configured to establish a mapping relation between the route cookie carried in the service request and the long connection identifier of the selected long connection, and to lock the long connection identifier, after the sending unit selects a long connection according to the first preset rule and sends the service request to the corresponding proxy client; and when a long connection is selected for a service request according to the first preset rule, the long connection is selected from among the long connections other than those corresponding to locked long connection identifiers.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 2-7 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 2-7.
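The IP switching behaviour recited in claims 5 to 7 can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation: the class `ProxyClient`, the fixed `ip_pool`, the `probe` callable that stands in for sending a connection request and waiting for an acceptance response, and the bound on the number of attempts are all hypothetical. The claims describe retrying until the target station accepts.

```python
import time


class ProxyClient:
    """Hypothetical model of a proxy client that rotates its egress IP."""

    def __init__(self, ip_pool, probe, switch_interval, clock=time.monotonic):
        self.ip_pool = list(ip_pool)
        self.index = 0
        self.probe = probe                    # probe(ip) -> True if accepted
        self.switch_interval = switch_interval
        self.clock = clock
        self.deadline = clock() + switch_interval  # switching timer

    @property
    def current_ip(self):
        return self.ip_pool[self.index]

    def switch_ip(self):
        # Move to the next address and restart the switching timer.
        self.index = (self.index + 1) % len(self.ip_pool)
        self.deadline = self.clock() + self.switch_interval

    def ip_for_request(self):
        """Return an IP address that the target station currently accepts."""
        if self.clock() >= self.deadline:
            self.switch_ip()                  # timer expired (claim 7)
        # Assumption: give up after trying every address in the pool once.
        for _ in range(len(self.ip_pool)):
            if self.probe(self.current_ip):   # connection accepted (claim 5)
                return self.current_ip
            self.switch_ip()                  # no response or rejection
        raise RuntimeError("no IP address accepted by the target station")
```

Passing the clock as a parameter is a design choice made for the example only, so that the timer can be exercised without waiting in real time.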
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910153670.XA CN111641664B (en) | 2019-03-01 | 2019-03-01 | Crawler equipment service request method, device and system and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111641664A (en) | 2020-09-08 |
CN111641664B (en) | 2023-12-05 |
Family
ID=72330426
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910153670.XA | Active | CN111641664B (en) | 2019-03-01 | 2019-03-01 | Crawler equipment service request method, device and system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111641664B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678311A (en) * | 2012-08-31 | 2014-03-26 | 腾讯科技(深圳)有限公司 | Webpage access method and system based on transfer mode and path capturing server |
CN103914568A (en) * | 2014-04-24 | 2014-07-09 | 厦门市美亚柏科信息股份有限公司 | Method and device for dispatching HTTP proxy |
US20180225387A1 (en) * | 2015-10-30 | 2018-08-09 | Baidu Online Network Technology (Beijing) Co., Ltd | Method and apparatus for accessing webpage, apparatus and non-volatile computer storage medium |
CN105740384A (en) * | 2016-01-27 | 2016-07-06 | 浪潮软件集团有限公司 | Crawler agent automatic switching method and device |
CN107948329A (en) * | 2018-01-03 | 2018-04-20 | 湖南麓山云数据科技服务有限公司 | A kind of cross-domain processing method and system |
CN108345642A (en) * | 2018-01-12 | 2018-07-31 | 深圳壹账通智能科技有限公司 | Method, storage medium and the server of website data are crawled using Agent IP |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114143368A (en) * | 2021-12-21 | 2022-03-04 | 苏州万店掌网络科技有限公司 | Communication method and device |
CN114143368B (en) * | 2021-12-21 | 2022-12-30 | 苏州万店掌网络科技有限公司 | Communication method and device |
Also Published As
Publication number | Publication date |
---|---|
CN111641664B (en) | 2023-12-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||