CN106778196A - Network station simulated login method and device and electronic equipment - Google Patents
- Publication number
- CN106778196A (application CN201510818702.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- verification code
- target
- target picture
- code information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/36—User authentication by graphic or iconic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Embodiments of the invention provide a website simulated login method, a website simulated login apparatus, and an electronic device. The method is applied to an electronic device in an information crawling system that comprises a central control device and at least one electronic device with a built-in web crawler for crawling information. The method comprises: the web crawler downloads the current target picture of the target website to be logged in to by simulation, where the target picture contains verification code information; the target picture is transmitted to the central control device, so that the central control device, after receiving the target picture, displays prompt information for manually entering the verification code information contained in the target picture and, after receiving the verification code information entered manually based on the prompt information, feeds that verification code information back to the electronic device; the verification code information is received; and the simulated login to the target website is performed based on the received verification code information. This scheme can improve the success rate of simulated website logins by a web crawler.
Description
Technical Field
The invention relates to the technical field of information crawling, in particular to a website simulation login method, a website simulation login device and electronic equipment.
Background
In the big data era, crawling information by web crawlers is one of the main ways to obtain a large amount of sample data.
Web crawlers obtain sample data by capturing web page information. Some websites, however, want to form a closed loop: they do not expose their data outside the site and allow only logged-in users to access it. To capture the page information of such a website, the web crawler therefore needs to simulate a user logging in to the website.
However, a website usually requires verification code information to be entered during login, and that verification code information changes dynamically. Because the web crawler cannot reliably recognize the verification code, its simulated login often fails, which seriously affects the normal operation and capture efficiency of the web crawler.
Disclosure of Invention
The embodiment of the invention aims to provide a website simulation login method, a website simulation login device and electronic equipment so as to improve the success rate of simulating login of a website by a web crawler. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a website simulated login method, which is applied to an electronic device in an information crawling system, where the information crawling system includes a central control device and at least one electronic device with a built-in web crawler for crawling information. The method comprises:
a web crawler downloads a current target picture of the target website to be logged in to by simulation, wherein the target picture contains verification code information;
transmitting the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information for manually entering the verification code information contained in the target picture, and, after receiving the verification code information entered manually based on the prompt information, feeds the verification code information back to the electronic device;
receiving the verification code information;
and performing the simulated login to the target website based on the received verification code information.
Optionally, before the web crawler downloads the current target picture of the target website that is simulated to log in, the method further includes:
and receiving a crawling task which is sent by the central control equipment through a real-time interactive interface and is about simulating to log in a target network site.
Optionally, the method for simulating login of a website provided in the embodiment of the present invention further includes:
and after the target network site is successfully simulated and logged in, information crawling is carried out on the target network site.
Optionally, the method for simulating login of a website provided in the embodiment of the present invention further includes:
and after the simulation login of the target network site fails, re-executing the step of downloading the current target picture of the simulated login target network site.
Optionally, the transmitting the downloaded target picture to the central control device includes:
transmitting the downloaded target picture to the central control device through a socket channel;
or,
transmitting the downloaded target picture to the central control equipment through a Transmission Control Protocol (TCP) channel;
or,
and transmitting the downloaded target picture to the central control device through a User Datagram Protocol (UDP) channel.
Optionally, the method for simulating login of a website provided in the embodiment of the present invention further includes:
and feeding back simulation login result information to the central control equipment, so that the central control equipment outputs the simulation login result information after receiving the simulation login result information.
Optionally, the method for simulating login of a website provided in the embodiment of the present invention further includes:
and after the target network site is successfully simulated to log in, refreshing page cookie information of the target network site according to a login state keeping strategy.
Optionally, the refreshing the page cookie information of the target network site according to the login state holding policy includes:
refreshing page cookie information of the target network station according to a preset refreshing period;
or,
obtaining the effective cookie duration of the target network site;
determining a target refreshing period corresponding to the target network station based on the cookie effective duration, wherein the target refreshing period is smaller than the cookie effective duration;
and refreshing the page cookie information of the target network site according to the target refreshing period.
Optionally, the method for simulating login of a website provided in the embodiment of the present invention further includes:
and in the information crawling process, feeding back the health state information of the web crawler to the central control equipment.
In a second aspect, an embodiment of the present invention further provides a website simulation login apparatus, which is applied to an electronic device in an information crawling system, where the information crawling system includes: a central control device and at least one electronic device having a web crawler built therein for crawling information, the apparatus comprising:
the target picture downloading module is used for downloading a current target picture of a target network site which is simulated to log in, wherein the target picture comprises verification code information;
the target picture transmission module is used for transmitting the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information for manually entering the verification code information contained in the target picture, and, after receiving the verification code information entered manually based on the prompt information, feeds the verification code information back to the electronic device;
the verification code information receiving module is used for receiving the verification code information;
and the verification code information processing module is used for simulating to log in the target network station based on the received verification code information.
Optionally, the website simulation login apparatus provided in the embodiment of the present invention further includes:
and the crawling task receiving module is used for receiving a crawling task which is sent by the central control equipment through a real-time interactive interface and is about the simulated login target website before downloading the current target picture of the simulated login target website.
Optionally, the website simulation login apparatus provided in the embodiment of the present invention further includes:
and the information crawling module is used for performing information crawling on the target network site after the target network site is successfully simulated and logged in.
Optionally, the website simulation login apparatus provided in the embodiment of the present invention further includes:
and the login failure processing module is used for triggering the target picture downloading module after the simulation login of the target network site fails.
Optionally, the target picture transmission module is specifically configured to:
transmitting the downloaded target picture to the central control device through a socket channel;
or,
transmitting the downloaded target picture to the central control equipment through a Transmission Control Protocol (TCP) channel;
or,
and transmitting the downloaded target picture to the central control device through a User Datagram Protocol (UDP) channel.
Optionally, the website simulation login apparatus provided in the embodiment of the present invention further includes:
and the result feedback module is used for feeding back simulation login result information to the central control equipment, so that the central control equipment outputs the simulation login result information after receiving the simulation login result information.
Optionally, the website simulation login apparatus provided in the embodiment of the present invention further includes:
and the cookie information refreshing module is used for refreshing the page cookie information of the target network site according to the login state keeping strategy after the target network site is successfully simulated and logged in.
Optionally, the cookie information refreshing module includes: the first information refreshing submodule or the second information refreshing submodule;
the first information refreshing submodule is used for refreshing page cookie information of the target network site according to a preset refreshing period;
the second information refreshing submodule includes:
a cookie valid duration obtaining unit, configured to obtain a cookie valid duration of the target network site;
a target refresh period determining unit, configured to determine a target refresh period corresponding to the target network site based on the cookie valid duration, where the target refresh period is smaller than the cookie valid duration;
and the information refreshing unit is used for refreshing the page cookie information of the target network site according to the target refreshing period.
Optionally, the website simulation login apparatus provided in the embodiment of the present invention further includes:
and the health state feedback module is used for feeding back the health state information of the web crawler to the central control equipment in the information crawling process.
In a third aspect, an embodiment of the present invention further provides an electronic device located in an information crawling system, where the information crawling system includes a central control device and at least one electronic device with a built-in web crawler for crawling information. The electronic device comprises a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged in a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor reads the executable program code stored in the memory and runs the corresponding program in order to perform the following steps:
downloading a current target picture of the target website to be logged in to by simulation, wherein the target picture contains verification code information;
transmitting the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information for manually entering the verification code information contained in the target picture, and, after receiving the verification code information entered manually based on the prompt information, feeds the verification code information back to the electronic device;
receiving the verification code information;
and performing the simulated login to the target website based on the received verification code information.
In this embodiment, after downloading the current target picture of the target website to be logged in to by simulation, the web crawler built into the electronic device transmits the downloaded target picture to the central control device. After receiving the target picture, the central control device displays prompt information for manually entering the verification code information contained in the target picture and, after receiving the verification code information entered manually based on the prompt information, feeds it back to the electronic device. The web crawler receives the verification code information and performs the simulated login to the target website based on it. Because the verification code information is entered manually, the accuracy of verification code input improves, and with it the success rate of the web crawler's simulated website login.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a network station simulated login method according to an embodiment of the present invention;
Fig. 2 is another flowchart of a network station simulated login method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a website simulation login apparatus according to an embodiment of the present invention;
Fig. 4 is another schematic structural diagram of a website simulation login apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the problem of the prior art, embodiments of the present invention provide a website login simulation method, device and electronic device, so as to improve the success rate of the web crawler in simulating login of the website, thereby ensuring the normal work and capture efficiency of the web crawler.
First, a network site simulation login method provided by the embodiment of the present invention is described below.
The website simulated login method provided by the embodiment of the present invention is applied to an electronic device in an information crawling system, where the information crawling system may include a central control device and at least one electronic device with a built-in web crawler for crawling information. In practical applications, the central control device and the electronic device may each be a desktop computer, a laptop computer, a server, or the like. It should be noted that when the information crawling system includes only the central control device and one electronic device, the central control device may be a separate device from the electronic device or the same device as the electronic device; either arrangement is reasonable.
Moreover, the functional software for executing the website simulation login method provided by the embodiment of the invention is a web crawler.
As shown in fig. 1, a network station simulated login method provided in an embodiment of the present invention may include the following steps:
S101: the web crawler downloads a current target picture of the target website to be logged in to by simulation, where the target picture contains verification code information.
When the web crawler simulates logging in to a website that requires verification code information, the required verification code information is usually presented to the user as a picture. The web crawler can therefore download the current target picture of the target website, which contains the verification code information, and then carry out the subsequent processing.
It should be emphasized that the web crawler's crawling task of simulating a login to the website can be triggered by the web crawler itself, for example when a preset time point is reached or according to a preset website task table. It can of course also be triggered externally; either is reasonable.
S102: the downloaded target picture is transmitted to the central control device, so that the central control device, after receiving the target picture, displays prompt information for manually entering the verification code information contained in the target picture, and, after receiving the verification code information entered manually based on the prompt information, feeds the verification code information back to the electronic device.
After downloading the current target picture of the target website, the web crawler transmits the picture to the central control device so that the verification code information can be entered manually. After receiving the target picture, the central control device displays the prompt information, and after receiving the verification code information that is entered manually based on the prompt information, it feeds the verification code information back to the electronic device.
The prompt information includes at least the target picture and an input box for the verification code information. It may take the form of a pop-up box or a web interface; either is reasonable. It should be emphasized that information crawling and simulated website login usually take place on the network side rather than the user side, so the verification code information can be entered manually by a network-side administrator based on the prompt information.
There are various ways to transmit the downloaded target picture to the central control device; to keep the description clear, examples are given later.
S103: the verification code information is received.
After the central control device feeds the verification code information back to the electronic device, the web crawler receives the verification code information and then carries out the subsequent processing.
S104: the simulated login to the target website is performed based on the received verification code information.
After receiving the verification code information, the web crawler can log in to the target website by simulation based on it. The simulated login also requires a user account and a user password; how they are determined and entered can follow the prior art and is not described here.
In this embodiment, after downloading the current target picture of the target website to be logged in to by simulation, the web crawler built into the electronic device transmits the downloaded target picture to the central control device. After receiving the target picture, the central control device displays prompt information for manually entering the verification code information contained in the target picture and, after receiving the verification code information entered manually based on the prompt information, feeds it back to the electronic device. The web crawler receives the verification code information and performs the simulated login to the target website based on it. Because the verification code information is entered manually, the accuracy of verification code input improves, and with it the success rate of the web crawler's simulated website login.
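The flow of S101 to S104 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `simulated_login`, the `LoginResult` type and the three callables are hypothetical, and the actual download, the exchange with the central control device and the login request are injected as stand-ins so that the sketch runs without a network.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoginResult:
    """Outcome of one pass through S101 to S104 (hypothetical type)."""
    success: bool
    code_used: str


def simulated_login(
    download_picture: Callable[[], bytes],
    ask_central_control: Callable[[bytes], str],
    submit_login: Callable[[str], bool],
) -> LoginResult:
    """Run steps S101 to S104 once, on the crawler side."""
    # S101: download the current target picture containing the verification code.
    picture = download_picture()
    # S102 + S103: send the picture to the central control device and wait for
    # the verification code that an administrator types in there.
    code = ask_central_control(picture)
    # S104: perform the simulated login with the received code.
    ok = submit_login(code)
    return LoginResult(success=ok, code_used=code)


if __name__ == "__main__":
    # Stand-ins for the real website and the real central control device.
    result = simulated_login(
        download_picture=lambda: b"fake-picture-bytes",
        ask_central_control=lambda picture: "7K3Q",
        submit_login=lambda code: code == "7K3Q",
    )
    print(result.success)
```

Passing the three steps in as callables keeps the crawler-side logic independent of the transport used to reach the central control device, which the description leaves open.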
Further, as shown in fig. 2, on the basis of the above embodiment including S101 to S104, before the web crawler downloads the current target picture of the target website to be simulated and logged in (S101), the method for simulating and logging in a website provided by the embodiment of the present invention may further include:
S100: a crawling task for simulating a login to the target website is received, the task being sent by the central control device through a real-time interactive interface.
The web crawler's crawling task of simulating a login to the target website can thus be actively triggered by the central control device, which improves the operability of the crawling task. The crawling task issued by the central control device can be triggered by the central control device itself, for example when a preset time point is reached or according to a preset website task table, or it can be issued by an administrator; either is reasonable. The real-time interactive interface is used by the central control device to actively control the electronic device and can be determined according to the prior art, so it is not described here.
Furthermore, after the target website is successfully simulated and logged in, the web crawler can further crawl information of the target website. The information crawling mode after the simulated login can adopt the prior art, and is not a design point of the embodiment of the invention, so that the detailed description is omitted.
In addition, to keep the web crawler logged in over the long term, the page cookie information of the target website may be refreshed according to the login state retention policy after the simulated login to the target website succeeds. It should be emphasized that the cookie information carries login-related information: when the web crawler visits the website again while the cookie information is still valid, the website can obtain the related login information by reading the cookie information and act accordingly, for example by letting the web crawler log in directly without entering the user account and user password. Because no login is then needed, no verification code information has to be entered, which improves the efficiency of logging in to the target website.
Specifically, there are various specific implementation manners for refreshing the page cookie information of the target website according to the login state retention policy, and for the sake of clarity of the scheme, two specific implementation manners are introduced.
In a first implementation, the page cookie information of the target website may be refreshed according to a preset refresh period. The preset refresh period is determined from the counted cookie valid durations of all common websites and is smaller than every one of those durations. The cookie valid duration of each common website can be counted using the prior art, which is not described here. Provided that the preset refresh period is smaller than the counted cookie valid durations of all common websites, the specific way of determining the preset refresh period from those durations is not limited here. For example, suppose the cookie valid durations of the common websites are 3 hours for website A, 10 hours for website B and 8 hours for website C. The preset refresh period may then be 2 hours, that is, the page cookie information is refreshed every 2 hours, or it may be 1 hour, that is, the page cookie information is refreshed every 1 hour, and so on.
In a second implementation, the cookie valid duration of the target website may be obtained; a target refresh period corresponding to the target website is determined based on the cookie valid duration, the target refresh period being smaller than the cookie valid duration; and the page cookie information of the target website is refreshed according to the target refresh period.
The cookie valid duration of the target website can be obtained using the prior art, which is not described here. Provided that the target refresh period is smaller than the cookie valid duration, the specific way of determining the target refresh period from the cookie valid duration is not limited here. For example, if the cookie valid duration of the target website is 3 hours, the target refresh period may be 1 hour, that is, the page cookie information is refreshed every 1 hour, or it may be 0.5 hour, that is, the page cookie information is refreshed every 0.5 hour, and so on.
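The two ways of choosing a refresh period can be illustrated with a short sketch. The description only requires that the period be smaller than the relevant cookie valid duration and does not fix a rule; dividing the duration by a factor greater than 1 is an assumption made here for illustration, and the function names and the `divisor` parameter are hypothetical.

```python
from typing import Mapping


def target_refresh_period(cookie_valid_seconds: float, divisor: float = 3) -> float:
    """Second implementation: derive a period from one site's cookie lifetime.

    Dividing by a factor greater than 1 (an assumed rule) guarantees that the
    period is strictly smaller than the cookie valid duration.
    """
    if cookie_valid_seconds <= 0:
        raise ValueError("cookie valid duration must be positive")
    if divisor <= 1:
        raise ValueError("divisor must be greater than 1")
    return cookie_valid_seconds / divisor


def preset_refresh_period(valid_seconds_by_site: Mapping[str, float],
                          divisor: float = 1.5) -> float:
    """First implementation: one period that is below every counted lifetime.

    Taking the shortest counted lifetime as the bound ensures the period is
    smaller than the cookie valid duration of all sites.
    """
    shortest = min(valid_seconds_by_site.values())
    return target_refresh_period(shortest, divisor)


if __name__ == "__main__":
    hour = 3600
    # Durations taken from the example in the text: A = 3 h, B = 10 h, C = 8 h.
    sites = {"A": 3 * hour, "B": 10 * hour, "C": 8 * hour}
    print(preset_refresh_period(sites) / hour)        # period in hours
    print(target_refresh_period(3 * hour) / hour)     # period in hours
```

With the default divisors chosen here, the first call gives the 2-hour preset period and the second gives the 1-hour target period used as examples in the text; other divisors greater than 1 satisfy the same constraint.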
In addition, to make it easier for the central control device to manage the web crawler, the health state information of the web crawler may be fed back to the central control device during information crawling. The content of the health state information and the criteria for judging the health state can both be set according to actual conditions, and the specific way of setting them can follow the prior art, so it is not described here.
Further, it is understood that in some cases the verification code information entered by the administrator may not match the verification code information in the target picture, for example because the target picture is not sharp enough or because the characters in it are easily confused. In such cases the web crawler may fail to log in to the target website on the first attempt. After a simulated login to the target website fails, the web crawler may re-execute the step of downloading the current target picture of the target website, that is, re-execute the whole website simulated login method provided by the embodiment of the present invention.
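The retry behaviour described above can be sketched as a loop around one complete login attempt. The description does not state an upper bound on retries; the `max_attempts` parameter, the function name `login_with_retry` and the `attempt_login` callable are assumptions introduced for illustration. Each call to `attempt_login` is expected to run the full cycle, including downloading a fresh target picture, since the verification code changes dynamically.

```python
from typing import Callable


def login_with_retry(attempt_login: Callable[[], bool], max_attempts: int = 3) -> int:
    """Repeat the full simulated login until it succeeds or attempts run out.

    Returns the number of attempts used on success, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        # One attempt = download a new picture, get the code, submit the login.
        if attempt_login():
            return attempt
    return 0


if __name__ == "__main__":
    # Stand-in: the first manually entered code is wrong, the second is right.
    outcomes = iter([False, True])
    print(login_with_retry(lambda: next(outcomes)))
```

Bounding the number of attempts is a design choice that avoids looping forever when the failure is not caused by a mistyped code, for example when the account itself is rejected.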
As noted above, there are various ways to transmit the downloaded target picture to the central control device; specific transmission ways are described below by way of example. Transmitting the downloaded target picture to the central control device may include:
transmitting the downloaded target picture to the central control device through a socket channel;
or,
transmitting the downloaded target picture to the central Control device through a Transmission Control Protocol (TCP) channel;
or,
and transmitting the downloaded target picture to the central control device through a User Datagram Protocol (UDP) channel.
Those skilled in the art will understand that the socket channel, the TCP channel, or the UDP channel may be selected according to the circumstances; for example, when the requirement for data transmission stability is high, the socket channel may be selected preferentially.
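As an illustration of transmitting the picture over a stream-oriented channel such as TCP, the sketch below sends the picture bytes with a 4-byte length prefix so the receiver knows where the picture ends. The framing format and the function names `send_picture` and `recv_picture` are assumptions made for this example; the text does not specify a wire format.

```python
import socket
import struct

# Assumed framing: a 4-byte big-endian length followed by the picture bytes.
HEADER = struct.Struct("!I")


def send_picture(sock: socket.socket, picture: bytes) -> None:
    """Send one length-prefixed picture over a connected stream socket."""
    sock.sendall(HEADER.pack(len(picture)) + picture)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly size bytes, since recv() may return fewer than requested."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_picture(sock: socket.socket) -> bytes:
    """Receive one length-prefixed picture from a connected stream socket."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)
```

On the crawler side, `send_picture` would be called on a socket connected to the central control device, and the central control device would call `recv_picture` on the accepted connection. A UDP variant would need its own handling for pictures larger than one datagram and for lost packets.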
Furthermore, so that the administrator knows the login result of the web crawler, after the web crawler performs the simulated login to the target website based on the received verification code information, it may feed back simulated login result information to the central control device, and the central control device outputs the simulated login result information after receiving it.
Corresponding to the above method embodiment, an embodiment of the present invention provides a website simulated login apparatus, applied to an electronic device in an information crawling system, where the information crawling system includes a central control device and at least one electronic device with a built-in web crawler for crawling information. As shown in fig. 3, the apparatus may include:
a target picture downloading module 310, configured to download a current target picture of a target website to be logged in to by simulation, where the target picture contains verification code information;
a target picture transmission module 320, configured to transmit the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device;
a verification code information receiving module 330, configured to receive the verification code information; and
a verification code information processing module 340, configured to perform a simulated login to the target website based on the received verification code information.
In this embodiment, after downloading the current target picture of the target website to be logged in to by simulation, the web crawler built into the electronic device transmits the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device; the web crawler then receives the verification code information and performs a simulated login to the target website based on it. In this scheme the verification code information is therefore entered manually, which improves the accuracy of verification code input and in turn the success rate of the web crawler's simulated login to the website.
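The cooperation of the four modules can be sketched in Python as a small class whose fields stand in for modules 310 to 340. The class name, the field names, and the use of plain callables are illustrative assumptions; the actual modules may be implemented in any manner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimulatedLoginApparatus:
    """Hypothetical stand-in for the apparatus of fig. 3."""

    download_target_picture: Callable[[], bytes]       # module 310
    transmit_target_picture: Callable[[bytes], None]   # module 320
    receive_verification_code: Callable[[], str]       # module 330
    login_with_code: Callable[[str], bool]             # module 340

    def run(self) -> bool:
        """Run one simulated login and report whether it succeeded."""
        picture = self.download_target_picture()
        self.transmit_target_picture(picture)      # central control device prompts the administrator
        code = self.receive_verification_code()    # code typed in by the administrator
        return self.login_with_code(code)
```

Injecting the four steps as callables keeps the flow independent of the transport used between the electronic device and the central control device.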
Further, as shown in fig. 4, in addition to the target picture downloading module 310, the target picture transmission module 320, the verification code information receiving module 330, and the verification code information processing module 340 of the above embodiment, the website simulated login apparatus provided in the embodiment of the present invention may further include:
a crawling task receiving module 300, configured to receive, before the current target picture of the target website to be logged in to by simulation is downloaded, a crawling task for simulated login to the target website, sent by the central control device through a real-time interactive interface.
Furthermore, on the basis of the above-mentioned embodiment including the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340 or the above-mentioned embodiment including the crawling task receiving module 300, the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340, the website simulation login apparatus provided in the embodiment of the present invention may further include:
an information crawling module, configured to crawl information from the target website after the simulated login to the target website succeeds.
Furthermore, on the basis of the above-mentioned embodiment including the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340 or the above-mentioned embodiment including the crawling task receiving module 300, the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340, the website simulation login apparatus provided in the embodiment of the present invention may further include:
a login failure processing module, configured to trigger the target picture downloading module after the simulated login to the target website fails.
Specifically, in the above-mentioned embodiment including the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340, or the above-mentioned embodiment including the crawling task receiving module 300, the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340, the target picture transmitting module 320 is specifically configured to:
transmit the downloaded target picture to the central control device through a socket channel;
or,
transmit the downloaded target picture to the central control device through a Transmission Control Protocol (TCP) channel;
or,
transmit the downloaded target picture to the central control device through a User Datagram Protocol (UDP) channel.
Furthermore, on the basis of the above-mentioned embodiment including the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340 or the above-mentioned embodiment including the crawling task receiving module 300, the target picture downloading module 310, the target picture transmitting module 320, the verification code information receiving module 330, and the verification code information processing module 340, the website simulation login apparatus provided in the embodiment of the present invention may further include:
a result feedback module, configured to feed back simulated login result information to the central control device, so that the central control device outputs the simulated login result information after receiving it.
Further, on the basis of the above embodiment including the information crawling module, the website simulation login apparatus provided in the embodiment of the present invention may further include:
a cookie information refreshing module, configured to refresh the page cookie information of the target website according to a login state keeping policy after the simulated login to the target website succeeds.
Specifically, the cookie information refreshing module may include a first information refreshing submodule or a second information refreshing submodule;
the first information refreshing submodule is configured to refresh the page cookie information of the target website according to a preset refresh period;
the second information refreshing submodule may include:
a cookie valid duration obtaining unit, configured to obtain the cookie valid duration of the target website;
a target refresh period determining unit, configured to determine a target refresh period corresponding to the target website based on the cookie valid duration, where the target refresh period is shorter than the cookie valid duration; and
an information refreshing unit, configured to refresh the page cookie information of the target website according to the target refresh period.
Further, on the basis of the above embodiment including the information crawling module, the website simulation login apparatus provided in the embodiment of the present invention may further include:
a health state feedback module, configured to feed back health state information of the web crawler to the central control device during information crawling.
In addition, an embodiment of the present invention further provides an electronic device located in an information crawling system, where the information crawling system includes a central control device and at least one electronic device with a built-in web crawler for crawling information. As shown in fig. 5, the electronic device includes a housing 501, a processor 502, a memory 503, a circuit board 504, and a power supply circuit 505, where the circuit board 504 is arranged inside the space enclosed by the housing 501, and the processor 502 and the memory 503 are arranged on the circuit board 504; the power supply circuit 505 supplies power to each circuit or component of the electronic device; the memory 503 stores executable program code; and the processor 502 reads the executable program code stored in the memory 503 and runs the corresponding program to perform the following steps:
downloading a current target picture of a target website to be logged in to by simulation, where the target picture contains verification code information;
transmitting the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device;
receiving the verification code information; and
performing a simulated login to the target website based on the received verification code information.
For the specific process by which the processor 502 executes the above steps, and for further steps executed by the processor 502 by running the executable program code, refer to the description of the embodiments shown in figs. 1 to 4 of the present invention, which is not repeated here.
As can be seen from the above description, in the embodiment of the present invention, after downloading the current target picture of the target website to be logged in to by simulation, the web crawler built into the electronic device transmits the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device; the web crawler then receives the verification code information and performs a simulated login to the target website based on it. In this scheme the verification code information is therefore entered manually, which improves the accuracy of verification code input and in turn the success rate of the web crawler's simulated login to the website.
The electronic device exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: such devices have mobile communication capability and are primarily intended to provide voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, low-end phones, and the like.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. Such devices include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: devices that provide computing services, comprising a processor, a hard disk, memory, a system bus, and the like. A server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services it has higher requirements for processing capacity, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic devices with data interaction functions.
It is noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A website simulated login method, applied to an electronic device in an information crawling system, wherein the information crawling system comprises a central control device and at least one electronic device with a built-in web crawler for crawling information, the method comprising:
downloading, by the web crawler, a current target picture of a target website to be logged in to by simulation, wherein the target picture contains verification code information;
transmitting the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device;
receiving the verification code information; and
performing a simulated login to the target website based on the received verification code information.
2. The method of claim 1, wherein before the web crawler downloads the current target picture of the target website to be logged in to by simulation, the method further comprises:
receiving a crawling task for simulated login to the target website, sent by the central control device through a real-time interactive interface.
3. The method of claim 1 or 2, further comprising:
crawling information from the target website after the simulated login to the target website succeeds.
4. The method of claim 1 or 2, further comprising:
after the simulated login to the target website fails, re-executing the step of downloading the current target picture of the target website to be logged in to by simulation.
5. The method according to claim 1 or 2, wherein the transmitting the downloaded target picture to the central control device comprises:
transmitting the downloaded target picture to the central control device through a socket channel;
or,
transmitting the downloaded target picture to the central control device through a Transmission Control Protocol (TCP) channel;
or,
transmitting the downloaded target picture to the central control device through a User Datagram Protocol (UDP) channel.
6. The method of claim 1 or 2, further comprising:
feeding back simulated login result information to the central control device, so that the central control device outputs the simulated login result information after receiving it.
7. The method of claim 3, further comprising:
after the simulated login to the target website succeeds, refreshing page cookie information of the target website according to a login state keeping policy.
8. The method of claim 7, wherein refreshing the page cookie information of the target website according to the login state keeping policy comprises:
refreshing the page cookie information of the target website according to a preset refresh period;
or,
obtaining a cookie valid duration of the target website;
determining a target refresh period corresponding to the target website based on the cookie valid duration, wherein the target refresh period is shorter than the cookie valid duration; and
refreshing the page cookie information of the target website according to the target refresh period.
9. A website simulated login apparatus, applied to an electronic device in an information crawling system, wherein the information crawling system comprises a central control device and at least one electronic device with a built-in web crawler for crawling information, the apparatus comprising:
a target picture downloading module, configured to download a current target picture of a target website to be logged in to by simulation, wherein the target picture contains verification code information;
a target picture transmission module, configured to transmit the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device;
a verification code information receiving module, configured to receive the verification code information; and
a verification code information processing module, configured to perform a simulated login to the target website based on the received verification code information.
10. An electronic device, located in an information crawling system, wherein the information crawling system comprises a central control device and at least one electronic device with a built-in web crawler for crawling information, and the electronic device comprises a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor reads the executable program code stored in the memory and runs the corresponding program to perform the following steps:
downloading a current target picture of a target website to be logged in to by simulation, wherein the target picture contains verification code information;
transmitting the downloaded target picture to the central control device, so that the central control device, after receiving the target picture, displays prompt information requesting manual input of the verification code information contained in the target picture and, after receiving the verification code information manually input based on the prompt information, feeds back the verification code information to the electronic device;
receiving the verification code information; and
performing a simulated login to the target website based on the received verification code information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510818702.5A CN106778196A (en) | 2015-11-23 | 2015-11-23 | Network station simulated login method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106778196A true CN106778196A (en) | 2017-05-31 |
Family
ID=58962994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510818702.5A Withdrawn CN106778196A (en) | 2015-11-23 | 2015-11-23 | Network station simulated login method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106778196A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103532920A (en) * | 2012-07-06 | 2014-01-22 | 腾讯科技(深圳)有限公司 | Cookie update method and cookie update system |
CN104298716A (en) * | 2014-06-19 | 2015-01-21 | 中国科学院信息工程研究所 | Web crawler system and web crawler implementation method capable of supporting artificial session grafting |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107733969A (en) * | 2017-07-25 | 2018-02-23 | 上海壹账通金融科技有限公司 | Website simulation login method, device, service end and readable storage medium storing program for executing |
WO2019019675A1 (en) * | 2017-07-25 | 2019-01-31 | 深圳壹账通智能科技有限公司 | Simulated website login method and apparatus, server end and readable storage medium |
CN110445746A (en) * | 2018-05-04 | 2019-11-12 | 腾讯科技(深圳)有限公司 | Cookie acquisition methods, device and storage equipment |
CN110445746B (en) * | 2018-05-04 | 2022-01-07 | 腾讯科技(深圳)有限公司 | Cookie obtaining method and device and storage equipment |
CN109783714A (en) * | 2019-01-08 | 2019-05-21 | 上海因致信息科技有限公司 | Interface data acquisition methods and system |
CN110138719A (en) * | 2019-03-05 | 2019-08-16 | 北京车和家信息技术有限公司 | A kind of detection method of network security, device and electronic equipment |
CN110138719B (en) * | 2019-03-05 | 2022-05-27 | 北京车和家信息技术有限公司 | Network security detection method and device and electronic equipment |
CN110012022A (en) * | 2019-04-15 | 2019-07-12 | 重庆天蓬网络有限公司 | Auth method, device, server and storage medium based on crawler technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20170531 |