CN111641664A - Crawler equipment service request method, device and system - Google Patents

Info

Publication number
CN111641664A
Authority
CN
China
Prior art keywords
service request
long connection
proxy
sending
target station
Legal status
Granted
Application number
CN201910153670.XA
Other languages
Chinese (zh)
Other versions
CN111641664B (en)
Inventor
刘佳 (Liu Jia)
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910153670.XA
Publication of CN111641664A
Application granted
Publication of CN111641664B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/63 Routing a service request depending on the request content or context
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1014 Server selection for load balancing based on the content of a request
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services

Abstract

The application provides a crawler device service request method, apparatus and system. When a load balancing device receives a service request sent by a crawler device deployed in an intranet and the service request carries a route cookie, the load balancing device sends the service request to the proxy server corresponding to that route cookie. Upon receiving the service request from the load balancing device, the proxy server determines whether a mapping relationship between the route cookie and a long connection identifier is stored locally; if so, it sends the service request to the corresponding proxy client through the corresponding long connection; otherwise, it selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client. Upon receiving the service request from the proxy server, the proxy client sends it to the target station. The scheme can reduce cost and improve security and usability.

Description

Crawler equipment service request method, device and system
Technical Field
The invention relates to the field of Internet technology, and in particular to a crawler device service request method, apparatus and system.
Background
Many Internet applications rely on crawler technology, and some frequently performed operations that would otherwise be done manually are completed with crawler devices, such as crawler robot agents.
At present, in a typical network deployment the robots are deployed on the external network. For security reasons, however, most development resources of operating customers are not open to the outside, so crawler devices deployed on the external network cannot use these resources.
To address this, a separate set of development environment resources has to be built again on the public network, for example a Redis cluster, an MQ cluster, an RPC scheduling center and a monitoring system. In addition, robots deployed on the external network are themselves insecure and require additional investment in operation, maintenance and security protection.
Disclosure of Invention
In view of this, the present application provides a crawler device service request method, apparatus and system, which can reduce cost and improve security and usability.
To solve the above technical problem, the technical solution of the present application is implemented as follows:
A crawler device service request system, comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients;
the load balancing device is configured to, upon receiving a service request sent by the crawler device deployed in an intranet, send the service request to the proxy server corresponding to a route cookie if the service request carries the route cookie;
the proxy server is configured to, upon receiving the service request sent by the load balancing device, determine whether a mapping relationship between the route cookie and a long connection identifier is stored locally; if so, send the service request to the corresponding proxy client through the corresponding long connection; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client;
and the proxy client is configured to send the service request to a target station upon receiving it from the proxy server.
A crawler device service request method, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the method comprising:
upon receiving a service request that was sent by the crawler device deployed in an intranet and forwarded by the load balancing device, determining whether a mapping relationship between the route cookie carried by the service request and a long connection identifier is stored locally; if so, sending the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, selecting a long connection according to a first preset rule and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
A crawler device service request apparatus, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the apparatus comprising a receiving unit, a determining unit and a sending unit;
the receiving unit is configured to receive a service request that was sent by the crawler device deployed in an intranet and forwarded by the load balancing device;
the determining unit is configured to determine, when the receiving unit receives the service request, whether a mapping relationship between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit is configured to, when the determining unit determines that this mapping relationship is stored locally, send the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
An electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the crawler device service request method.
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the crawler device service request method.
With the above technical solution, a highly available distributed proxy cluster consisting of a plurality of proxy servers and proxy clients is introduced, so that service requests sent by the crawler device are spread across multiple machines, overcoming the drawback of an intranet-deployed crawler device relying on a single-machine proxy egress. In addition, requests carrying the same route cookie use the same IP egress, so that each group of requests uses a single IP egress as far as possible. The scheme can therefore reduce cost and improve usability.
Drawings
Fig. 1 is a schematic diagram of a crawler device service request system in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a crawler device service request in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an apparatus applying the above technique in an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described in detail below with reference to the accompanying drawings and embodiments.
An embodiment of the present application provides a crawler device service request system. Referring to Fig. 1, Fig. 1 is a schematic diagram of the crawler device service request system in an embodiment of the present application. The system comprises a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients.
The crawler device may be any device capable of performing crawler functions, such as a crawler robot.
In the embodiment of the application, the crawler device, the load balancing device and the proxy server are deployed in an intranet, and the proxy client is deployed in a public network.
Before the crawler device sends service requests, each proxy client establishes a long connection with the proxy server when it comes online;
when the long connection between the proxy server and a proxy client has been established, the proxy server stores the mapping relationship between the long connection identifier and the proxy client identifier. One proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
That is, a proxy server or a proxy client can establish one or more long connections with its peers.
Fig. 1 takes two proxy servers and three proxy clients as an example: proxy server 1, proxy server 2, proxy client 1, proxy client 2 and proxy client 3.
Assume proxy server 1 establishes long connections with proxy client 1 and proxy client 2, with long connection identifiers 1 and 2 respectively. The mapping relationships stored on proxy server 1 are then:
long connection identifier 1 maps to proxy client identifier 1, and long connection identifier 2 maps to proxy client identifier 2.
Assume proxy server 2 establishes two long connections with proxy client 2 and one long connection with proxy client 3, with long connection identifiers 3 and 5 for proxy client 2 and 4 for proxy client 3. The mapping relationships stored on proxy server 2 are then:
long connection identifier 3 maps to proxy client identifier 2, long connection identifier 5 maps to proxy client identifier 2, and long connection identifier 4 maps to proxy client identifier 3.
Multiple long connections may also be established between two devices.
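The mapping relationships in this example can be held as one dictionary per proxy server. The following is a minimal Python sketch, assuming integer identifiers; `ProxyServerTable` and its methods are hypothetical names, not taken from the patent.

```python
class ProxyServerTable:
    """Hypothetical per-proxy-server table of long connections."""

    def __init__(self, name: str) -> None:
        self.name = name
        # long connection identifier -> proxy client identifier
        self.conn_to_client: dict[int, int] = {}

    def on_connected(self, conn_id: int, client_id: int) -> None:
        # Called once a proxy client has finished establishing a long connection.
        self.conn_to_client[conn_id] = client_id

    def connections_of(self, client_id: int) -> list[int]:
        # All long connections this proxy server holds with one proxy client.
        return sorted(c for c, cl in self.conn_to_client.items() if cl == client_id)


# The Fig. 1 example: proxy server 1 and proxy server 2.
server1 = ProxyServerTable("proxy server 1")
server1.on_connected(1, 1)
server1.on_connected(2, 2)

server2 = ProxyServerTable("proxy server 2")
server2.on_connected(3, 2)
server2.on_connected(5, 2)  # a second long connection to proxy client 2
server2.on_connected(4, 3)

print(server2.connections_of(2))  # [3, 5]
```

Keying the table by long connection identifier (rather than by proxy client) is what allows several long connections to the same proxy client to coexist.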
The crawler device sends service requests. Related requests, such as a series of form-filling requests or a series of form-extraction requests, can each be treated as a group of requests. The first service request sent in a group carries no routing data (route cookie); once the route cookie returned by the load balancing device has been received, subsequent service requests in the group carry that route cookie.
A cookie is a small piece of data that a server temporarily stores on the user's computer, mainly so that the server can identify that computer. When a user browses a website, the Web server first sends a small piece of data to be stored on the user's computer, and the cookie records information such as text entered or choices made on the website. On the next visit to the same website, the Web server first checks whether cookie data it left previously is present; if so, it identifies the user from the cookie's contents and serves the corresponding page content.
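The cookie round trip can be illustrated with Python's standard `http.cookies` module. This is a minimal sketch, assuming the load balancing device issues its route cookie through an ordinary `Set-Cookie` header; the cookie name `route` and the value `ps-2` are hypothetical and not specified by the patent.

```python
from http.cookies import SimpleCookie

# Server side (here: the load balancing device) attaches a hypothetical route cookie.
issued = SimpleCookie()
issued["route"] = "ps-2"
issued["route"]["path"] = "/"
set_cookie_header = issued.output(header="Set-Cookie:")
print(set_cookie_header)  # Set-Cookie: route=ps-2; Path=/

# Client side (here: the crawler device) stores the cookie from the response...
jar = SimpleCookie()
jar.load(set_cookie_header.split(":", 1)[1].strip())

# ...and sends it back on the next request of the same group.
next_request_header = f"Cookie: route={jar['route'].value}"
print(next_request_header)  # Cookie: route=ps-2
```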
Upon receiving a service request sent by the crawler device deployed in the intranet, the load balancing device determines whether the service request carries a route cookie;
if it does, the load balancing device sends the service request to the proxy server corresponding to the route cookie;
if it does not, the load balancing device generates a route cookie for the service request and returns the route cookie to the crawler device;
it then selects a proxy server according to a second preset rule, sends the service request, carrying the generated route cookie, to the selected proxy server, and establishes and stores a mapping relationship between the route cookie and the identifier of the selected proxy server.
The second preset rule may be a load balancing rule, for example selecting proxy servers by round-robin polling. The embodiments of the present application do not limit this, and a suitable rule for selecting a proxy server can be configured according to actual needs.
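The load balancing logic above can be sketched in a few lines of Python. This assumes round-robin polling as the second preset rule and random UUIDs as route cookies; both are assumptions, as is the choice to treat an unrecognised route cookie like a missing one. `LoadBalancer` and `handle` are hypothetical names.

```python
import itertools
import uuid
from typing import Optional


class LoadBalancer:
    """Hypothetical sketch of the load balancing device's route-cookie handling."""

    def __init__(self, proxy_servers: list[str]) -> None:
        # Second preset rule, assumed here to be round-robin polling.
        self._round_robin = itertools.cycle(proxy_servers)
        # route cookie -> identifier of the selected proxy server
        self.cookie_to_server: dict[str, str] = {}

    def handle(self, route_cookie: Optional[str]) -> tuple[str, str, bool]:
        """Return (proxy server, route cookie sent downstream, cookie newly issued)."""
        if route_cookie is not None and route_cookie in self.cookie_to_server:
            # Known route cookie: forward to the proxy server recorded for it.
            return self.cookie_to_server[route_cookie], route_cookie, False
        # No (or unknown) route cookie: generate one, pick a server, store the mapping.
        new_cookie = uuid.uuid4().hex
        server = next(self._round_robin)
        self.cookie_to_server[new_cookie] = server
        return server, new_cookie, True


lb = LoadBalancer(["proxy server 1", "proxy server 2"])
server_a, cookie_a, is_new = lb.handle(None)  # first request of group A
server_a2, _, _ = lb.handle(cookie_a)         # later request of group A
server_b, cookie_b, _ = lb.handle(None)       # first request of group B
print(server_a, "|", server_a2, "|", server_b)  # proxy server 1 | proxy server 1 | proxy server 2
```

The third return value tells the caller whether the newly generated cookie must be returned to the crawler device.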
Upon receiving the service request sent by the load balancing device, the proxy server determines whether a mapping relationship between the route cookie and a long connection identifier is stored locally; if so, it sends the service request to the corresponding proxy client through the corresponding long connection; otherwise, it selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client;
after selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, the proxy server further establishes a mapping relationship between the route cookie carried in the service request and the identifier of the selected long connection, and locks that long connection identifier;
when selecting a long connection for a service request according to the first preset rule, the proxy server chooses only from long connections whose identifiers are not locked. That is, a long connection that is already locked is not used when a new group of requests arrives.
The first preset rule may also be a load balancing rule, for example selecting proxy clients by round-robin polling. The embodiments of the present application do not limit this, and a suitable rule for selecting a proxy client can be configured according to actual needs.
When proxy clients are selected by round-robin polling, the proxy server cycles through the proxy clients that hold long connections with it and selects a long connection from among the unlocked ones. Locking applies to individual long connections: if several long connections exist between one proxy server and one proxy client, locking one of them does not affect the selection of the others.
For example, proxy server 2 and proxy client 2 share long connection 3 and long connection 5; if long connection 3 is locked, long connection 5 can still be selected.
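The proxy server's side of the scheme, covering both the stored route-cookie mapping and the locking of long connections, can be sketched as follows. This is a simplified Python illustration under stated assumptions: the first preset rule is approximated by a rotating cursor over unlocked long connections rather than over proxy clients, locks are never released (the description does not say when they are), and running out of unlocked connections raises an error. `ProxyServer` and `route` are hypothetical names.

```python
from typing import Optional


class ProxyServer:
    """Hypothetical sketch: route cookie -> long connection, with locking."""

    def __init__(self, conn_to_client: dict[int, int]) -> None:
        self.conn_to_client = dict(conn_to_client)  # long connection id -> proxy client id
        self.cookie_to_conn: dict[str, int] = {}    # route cookie -> long connection id
        self.locked: set[int] = set()               # locked long connection ids
        self._cursor = 0                            # rotating selection cursor

    def _select_unlocked(self) -> Optional[int]:
        # Assumed first preset rule: rotate over long connections that are not locked.
        candidates = sorted(c for c in self.conn_to_client if c not in self.locked)
        if not candidates:
            return None
        conn = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return conn

    def route(self, route_cookie: str) -> tuple[int, int]:
        """Return (long connection id, proxy client id) for a service request."""
        conn = self.cookie_to_conn.get(route_cookie)
        if conn is None:
            # No stored mapping: select an unlocked connection, map it and lock it.
            conn = self._select_unlocked()
            if conn is None:
                raise RuntimeError("no unlocked long connection available")
            self.cookie_to_conn[route_cookie] = conn
            self.locked.add(conn)
        return conn, self.conn_to_client[conn]


ps2 = ProxyServer({3: 2, 4: 3, 5: 2})  # proxy server 2 from Fig. 1
print(ps2.route("group-A"))  # (3, 2): long connection 3 is locked for group A
print(ps2.route("group-A"))  # (3, 2): same long connection, hence same IP egress
print(ps2.route("group-B"))  # (5, 2): 3 is locked, but 5 to the same client is usable
```

With the proxy server 2 data, group B lands on long connection 5 even though long connection 3 to the same proxy client is locked, matching the example in the text.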
The proxy client sends the service request to the target station upon receiving it from the proxy server.
To reduce cost and achieve high availability, reverse proxying is performed by an HTTP + WebSocket reverse proxy service cluster in the intranet together with a cluster of WebSocket clients running on ADSL hosts. ADSL hosts are inexpensive, offer abundant IP resources and support dynamic IP addresses.
By introducing a highly available distributed proxy cluster consisting of a plurality of proxy servers and proxy clients, service requests sent by the crawler device are spread across multiple machines, overcoming the drawback of an intranet-deployed crawler device relying on a single-machine proxy egress. Requests carrying the same route cookie use the same IP egress, so that each group of requests uses a single IP egress as far as possible; the scheme can thus reduce cost and improve usability.
To prevent blocking, the embodiments of the present application provide two anti-blocking schemes implemented by the proxy client, as follows:
In the first case:
the proxy client is given a disconnect-and-reconnect function so that its IP address is replaced periodically. Specifically:
the proxy client configures a switching timer for the IP address it uses to send service requests to the target station; when the switching timer expires, the proxy client switches to a new IP address and sends service requests from it.
For example, the timer period may be set to 1 hour or 2 hours, and a separate timer is set for each IP address.
Taking Windows as an example, the client can call the rasdial program with a script such as the following:
@echo off
:: Initialize connection data
set adslName=broadband connection
set adslUsername=05711937xxxx
set adslPassword=348124
:start
:: Dial up the connection
rasdial "%adslName%" %adslUsername% %adslPassword%
echo adsl connecting
:: Output the IP address after a successful connection
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr "IPv4"') do set ip=%%i
echo IP address:%ip%
:: Wait before reconnecting: ping sends about one echo per second, so -n 900 waits roughly 15 minutes (about -n 3600 for 60 minutes)
ping 127.0.0.1 -n 900 > nul
:: Disconnect
rasdial "%adslName%" /disconnect
echo adsl disconnect
goto start
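The per-IP switching timer of the first case can also be expressed independently of rasdial. Below is a minimal Python sketch, assuming the next IP address can be drawn from an iterator (standing in for re-dialing) and that the clock is injectable so the logic can be tested without waiting; `IpSwitchTimer` is a hypothetical name, and the addresses come from the 203.0.113.0/24 documentation range.

```python
import itertools
import time
from typing import Callable, Iterator


class IpSwitchTimer:
    """Hypothetical sketch of a per-IP switching timer on the proxy client."""

    def __init__(self, ip_source: Iterator[str], period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ips = ip_source        # stands in for re-dialing to obtain a new IP
        self._period = period_s      # e.g. 3600 s (1 hour) or 7200 s (2 hours)
        self._clock = clock
        self._current = next(self._ips)
        self._deadline = self._clock() + self._period

    def ip_for_request(self) -> str:
        # If the timer of the current IP has expired, switch IP and restart the timer.
        if self._clock() >= self._deadline:
            self._current = next(self._ips)
            self._deadline = self._clock() + self._period
        return self._current


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
timer = IpSwitchTimer(itertools.cycle(["203.0.113.10", "203.0.113.11"]), 3600,
                      clock=lambda: fake_now[0])
first = timer.ip_for_request()
fake_now[0] = 3599.0
still_first = timer.ip_for_request()
fake_now[0] = 3600.0
second = timer.ip_for_request()
print(first, still_first, second)  # 203.0.113.10 203.0.113.10 203.0.113.11
```

Because the deadline is reset on every switch, each IP address gets its own full timer period, as the description requires.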
In the second case:
before sending a received service request to the target station, the proxy client first sends a connection request to the target station;
if no response is received from the target station within a first preset time, or the target station responds with a rejection, the proxy client switches the currently used IP address and sends the connection request to the target station again;
this repeats until an acceptance response is received from the target station within the first preset time after a connection request, and the proxy client then sends the service request to the target station from the IP address that sent that connection request.
After the service request is sent to the target station, if no response is received within a second preset time, or the received response contains an error keyword configured on the proxy client, the proxy client switches the currently used IP address;
after switching, the proxy client probes the target station; if the probe fails, it switches the IP address again and probes again, until a probe succeeds, and then sends the service request from the IP address with which the probe succeeded.
In other words, a connection request is sent before each service request; if an acknowledgement is received, that is, the connection succeeds, the same IP address is then used to send the service request;
once the corresponding service response is received, the service request is deemed successfully processed; otherwise, the IP address is switched until the service request is processed successfully. In addition, if the IP address in use times out during the service request process, the IP address is switched.
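The second case is essentially a probe-and-switch retry loop. A minimal Python sketch follows, assuming the connection request and the later probe are the same check, that the first and second preset times are enforced inside the supplied `connect` and `send` callables (which return `None` on timeout), and that retries are bounded by `max_switches` and `max_attempts`, limits the description does not specify. All function names are hypothetical; addresses are from the 198.51.100.0/24 documentation range.

```python
from typing import Callable, Iterator, Optional

# connect(ip) -> "accept", "reject", or None on timeout (first preset time).
ConnectFn = Callable[[str], Optional[str]]
# send(ip, request) -> response body, or None on timeout (second preset time).
SendFn = Callable[[str, str], Optional[str]]


def find_working_ip(ips: Iterator[str], ip: str, connect: ConnectFn,
                    max_switches: int = 10) -> str:
    """Probe the target station, switching IP until it accepts the connection."""
    for _ in range(max_switches + 1):
        if connect(ip) == "accept":
            return ip
        ip = next(ips)  # timeout or rejection: switch the IP address and retry
    raise RuntimeError("no IP address was accepted by the target station")


def send_with_anti_blocking(request: str, ips: Iterator[str], ip: str,
                            connect: ConnectFn, send: SendFn,
                            error_keywords: tuple[str, ...],
                            max_attempts: int = 5) -> tuple[str, str]:
    """Return (IP used, response) once a response without error keywords arrives."""
    ip = find_working_ip(ips, ip, connect)
    for _ in range(max_attempts):
        response = send(ip, request)
        if response is not None and not any(k in response for k in error_keywords):
            return ip, response
        # Timeout or error keyword: switch IP, then probe until a probe succeeds.
        ip = find_working_ip(ips, next(ips), connect)
    raise RuntimeError("service request failed after all attempts")


# Simulated target station: rejects one IP and answers another with a captcha page.
def fake_connect(ip: str) -> Optional[str]:
    return "reject" if ip == "198.51.100.1" else "accept"


def fake_send(ip: str, request: str) -> Optional[str]:
    return "captcha required" if ip == "198.51.100.2" else f"ok:{request}"


pool = iter(["198.51.100.2", "198.51.100.3", "198.51.100.4"])
used_ip, body = send_with_anti_blocking("GET /list", pool, "198.51.100.1",
                                        fake_connect, fake_send, ("captcha",))
print(used_ip, body)  # 198.51.100.3 ok:GET /list
```

In the simulated run, the first IP is rejected at the connection step, the second is accepted but returns a response containing the configured error keyword, and the third succeeds.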
Based on the same inventive concept, the embodiment of the application further provides a crawler device service request method, which is applied to any proxy server in a system comprising the crawler device, the load balancing device, a plurality of proxy servers and a plurality of proxy clients.
When a proxy client comes online, a long connection is established between the proxy client and the proxy server;
when the long connection between the proxy server and a proxy client has been established, the proxy server stores the mapping relationship between the long connection identifier and the proxy client identifier. One proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a crawler device service request in an embodiment of the present application. The method comprises the following steps:
Step 201: the proxy server receives a service request that was sent by the crawler device deployed in the intranet and forwarded by the load balancing device.
Upon receiving a service request sent by the crawler device deployed in the intranet, the load balancing device sends the service request to the proxy server corresponding to the route cookie if the service request carries one;
if the service request does not carry a route cookie, the load balancing device further generates a route cookie for the service request and returns it to the crawler device;
selects a proxy server according to the second preset rule and sends the service request, carrying the generated route cookie, to the selected proxy server;
and establishes and stores a mapping relationship between the route cookie and the identifier of the selected proxy server.
Step 202: the proxy server determines whether a mapping relationship between the route cookie carried by the service request and a long connection identifier is stored locally; if so, step 203 is executed; otherwise, step 204 is executed.
Step 203: the proxy server sends the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to the target station, and the process ends.
Step 204: the proxy server selects a long connection according to the first preset rule and sends the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
After selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, the proxy server further establishes a mapping relationship between the route cookie carried in the service request and the identifier of the selected long connection, and locks that long connection identifier;
when selecting a long connection for a service request according to the first preset rule, the proxy server chooses only from long connections whose identifiers are not locked.
To prevent the IP address from being blocked, the processing further includes:
the proxy server causes the proxy client to send a connection request to the target station before sending the service request; if no response is received from the target station within the first preset time, or the target station responds with a rejection, the proxy client switches the currently used IP address and sends the connection request to the target station again, until an acceptance response is received within the first preset time after a connection request; the service request is then sent to the target station from the IP address that sent that connection request.
After the proxy client sends the service request to the target station from that IP address, if no response is received from the target station within the second preset time, or the received response contains an error keyword configured on the proxy client, the currently used IP address is switched;
after switching the currently used IP address, the proxy client probes the target station; if the probe fails, it switches the IP address again and probes again, until a probe succeeds, and then sends the service request from the IP address with which the probe succeeded.
In combination with the above anti-blocking procedure, a scheme for periodically replacing the IP address is also provided, specifically:
the proxy server causes the proxy client to configure a switching timer for the IP address used to send service requests to the target station; when the switching timer expires, the proxy client switches its IP address and sends service requests from the new address.
Based on the same inventive concept, an embodiment of the present application further provides a crawler device service request apparatus, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients. Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an apparatus applying the above technique in an embodiment of the present application. The apparatus comprises a receiving unit 301, a determining unit 302 and a sending unit 303;
the receiving unit 301 is configured to receive a service request that was sent by a crawler device deployed in an intranet and forwarded by a load balancing device;
the determining unit 302 is configured to determine, when the receiving unit 301 receives the service request, whether a mapping relationship between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit 303 is configured to, when the determining unit 302 determines that this mapping relationship is stored locally, send the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
Preferably, the apparatus further comprises an establishing unit 304;
the establishing unit 304 is configured to store, when a long connection with a proxy client has been established, the mapping relationship between the long connection identifier and the proxy client identifier; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
Preferably:
the establishing unit 304 is further configured to, after the sending unit 303 selects a long connection according to the first preset rule and sends the service request to the corresponding proxy client, establish a mapping relationship between the route cookie carried in the service request and the identifier of the selected long connection, and lock that long connection identifier; when a long connection is selected for a service request according to the first preset rule, it is selected from the long connections other than those whose identifiers are locked.
The units of the above embodiments may be integrated together or deployed separately, and may be combined into one unit or further divided into a plurality of sub-units.
In addition, an electronic device is further provided in an embodiment of the present application, and includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the crawler device service request method when executing the program.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the crawler device service request method.
In summary, the application introduces a highly available distributed proxy cluster consisting of a plurality of proxy servers and proxy clients, so that service requests sent by the crawler device are spread across multiple machines, avoiding the drawback of an intranet-deployed crawler device relying on a single-machine proxy egress; requests carrying the same route cookie use the same IP egress, so that each group of requests uses a single IP egress as far as possible. The scheme can reduce cost and improve security and usability.
The scheme also causes the proxy client to periodically switch the IP address used to send service requests, so as to avoid being blocked.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A crawler service request system, comprising: a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients; wherein
the load balancing device is configured to, upon receiving a service request sent by the crawler device deployed in an intranet, send the service request to the proxy server corresponding to a route cookie if the service request carries the route cookie;
the proxy server is configured to, upon receiving the service request sent by the load balancing device, determine whether a mapping between the route cookie and a long connection identifier is stored locally; if so, send the service request to the corresponding proxy client through the corresponding long connection; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client;
and the proxy client is configured to send the service request to a target station upon receiving the service request sent by the proxy server.
2. A crawler device service request method, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the method comprising:
upon receiving a service request that was sent by the crawler device deployed in an intranet and forwarded by the load balancing device, determining whether a mapping between the route cookie carried by the service request and a long connection identifier is stored locally; if so, sending the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, selecting a long connection according to a first preset rule and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
3. The method of claim 2, further comprising:
upon completing establishment of a long connection with a proxy client, storing a mapping between the long connection identifier and the proxy client identifier; wherein one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
4. The method of claim 2, further comprising:
after selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, establishing a mapping between the route cookie carried in the service request and the long connection identifier of the selected long connection, and locking that long connection identifier;
and, when selecting a long connection for a service request according to the first preset rule, selecting from the long connections other than those corresponding to locked long connection identifiers.
5. The method of claim 2, further comprising:
before the proxy client sends the service request to the target station, causing the proxy client to send a connection request to the target station; if no response is received from the target station within a first preset time, or a rejection response is received from the target station, switching the currently used IP address and sending the connection request to the target station again; and, once an acceptance response from the target station is received within the first preset time after sending a connection request, sending the service request to the target station using the IP address from which that connection request was sent.
6. The method of claim 5, further comprising:
after the proxy client sends the service request to the target station using the IP address from which the connection request was sent, if no response is received from the target station within a second preset time, or the received response carries an error keyword configured on the proxy client, switching the currently used IP address;
after the proxy client switches the currently used IP address, probing the target station; if the probe fails, switching the IP address again and probing again, until a probe succeeds; and sending the service request using the IP address in use when the probe succeeds.
7. The method of any one of claims 2-6, further comprising:
causing the proxy client to configure a switching timer for the IP address used when sending service requests to the target station; and, when the switching timer expires, switching the IP address from which service requests are sent.
8. A crawler device service request apparatus, applied to any proxy server in a system comprising a crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the apparatus comprising: a receiving unit, a determining unit and a sending unit;
the receiving unit is configured to receive a service request that was sent by the crawler device deployed in an intranet and forwarded by the load balancing device;
the determining unit is configured to, when the receiving unit receives the service request, determine whether a mapping between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit is configured to, when the determining unit determines that the mapping between the route cookie carried by the service request and the long connection identifier is stored locally, send the service request to the corresponding proxy client through the corresponding long connection, so that the proxy client sends the service request to a target station; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
9. The apparatus of claim 8, further comprising: an establishing unit;
the establishing unit is configured to store, upon establishing a long connection with a proxy client, a mapping between the long connection identifier and the proxy client identifier; wherein one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
10. The apparatus of claim 8, further comprising:
an establishing unit, configured to, after the sending unit selects a long connection according to the first preset rule and sends the service request to the corresponding proxy client, establish a mapping between the route cookie carried in the service request and the long connection identifier of the selected long connection, and lock that long connection identifier; wherein, when a long connection is selected for a service request according to the first preset rule, it is selected from the long connections other than those corresponding to locked long connection identifiers.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 2-7 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 2-7.
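The failure-driven IP switching of claims 5 and 6 can be sketched as three small helpers. This is a hedged illustration, not the claimed implementation: `connect` and `probe` are hypothetical callbacks the proxy client would supply, the "accept"/"reject" strings are an assumed encoding of the target station's reply, and `max_attempts` is an added safety bound, since the claims simply repeat the switch "until" the target station accepts or a probe succeeds.

```python
from typing import Callable, Optional, Sequence

# Hypothetical callbacks supplied by the proxy client:
#   connect(ip, timeout) -> "accept" | "reject" | None  (None = no reply within timeout)
#   probe(ip) -> bool                                   (True = target station reachable)
ConnectFn = Callable[[str, float], Optional[str]]
ProbeFn = Callable[[str], bool]


def find_accepted_ip(ips: Sequence[str], start: int, connect: ConnectFn,
                     first_timeout: float, max_attempts: int) -> int:
    """Claim 5 sketch: switch IPs until a connection request is accepted.

    Returns the index of the IP from which the service request should be sent.
    """
    idx = start
    for _ in range(max_attempts):
        if connect(ips[idx], first_timeout) == "accept":
            return idx
        idx = (idx + 1) % len(ips)  # no reply or rejection: switch the IP
    raise RuntimeError("target station did not accept any connection request")


def response_requires_switch(body: Optional[str], error_keywords: Sequence[str]) -> bool:
    """Claim 6 sketch: None means no response arrived within the second preset time."""
    return body is None or any(k in body for k in error_keywords)


def switch_until_probe_ok(ips: Sequence[str], current: int, probe: ProbeFn,
                          max_attempts: int) -> int:
    """Claim 6 sketch: switch the IP, probe the target station, repeat until a probe succeeds."""
    idx = current
    for _ in range(max_attempts):
        idx = (idx + 1) % len(ips)
        if probe(ips[idx]):
            return idx
    raise RuntimeError("no IP passed the probe")
```

For example, with a made-up error keyword such as "captcha" configured on the proxy client, a response body containing that word would trigger `switch_until_probe_ok` before the next service request is sent.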

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910153670.XA 2019-03-01 2019-03-01 Crawler equipment service request method, device and system and storage medium

Publications (2)

Publication Number Publication Date
CN111641664A 2020-09-08
CN111641664B 2023-12-05

Family

ID=72330426

Country Status (1)

Country Link
CN (1) CN111641664B

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678311A (en) * 2012-08-31 2014-03-26 腾讯科技(深圳)有限公司 Webpage access method and system based on transfer mode and path capturing server
CN103914568A (en) * 2014-04-24 2014-07-09 厦门市美亚柏科信息股份有限公司 Method and device for dispatching HTTP proxy
CN105740384A (en) * 2016-01-27 2016-07-06 浪潮软件集团有限公司 Crawler agent automatic switching method and device
CN107948329A (en) * 2018-01-03 2018-04-20 湖南麓山云数据科技服务有限公司 A kind of cross-domain processing method and system
CN108345642A (en) * 2018-01-12 2018-07-31 深圳壹账通智能科技有限公司 Method, storage medium and the server of website data are crawled using Agent IP
US20180225387A1 (en) * 2015-10-30 2018-08-09 Baidu Online Network Technology (Beijing) Co., Ltd Method and apparatus for accessing webpage, apparatus and non-volatile computer storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143368A (en) * 2021-12-21 2022-03-04 苏州万店掌网络科技有限公司 Communication method and device
CN114143368B (en) * 2021-12-21 2022-12-30 苏州万店掌网络科技有限公司 Communication method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant