CN108076067B - Method and system for authorized crawler configuration simulation login - Google Patents

Info

Publication number
CN108076067B
Authority
CN
China
Prior art keywords
authorization request
operation unit
authorization
user
crawler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711446333.7A
Other languages
Chinese (zh)
Other versions
CN108076067A (en)
Inventor
刘爽
李界鹏
王能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN201711446333.7A
Publication of CN108076067A
Application granted
Publication of CN108076067B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/083 Network architectures or network communication protocols for network security for authentication of entities using passwords
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/083 Network architectures or network communication protocols for network security for authentication of entities using passwords
    • H04L63/0838 Network architectures or network communication protocols for network security for authentication of entities using passwords using one-time-passwords
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/102 Entity profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method for authorized crawler configuration simulation login, which comprises the following steps: generating an operation unit according to a configuration file input by a user, and adding the operation unit to the back end of the crawler system; when the front end of the crawler system receives an authorization request, sending the authorization request and the authorization parameters input by the user to the back end; and calling, by the back end, the corresponding operation unit according to the authorization parameters to complete the authorization request. The method makes use of the fact that a crawler system running online can read its configuration dynamically: when a configuration file input by a user is received, the corresponding operation unit is generated from it and added to the back end of the crawler system, so that the authorization flow in the crawler system can be changed at any time by updating the configuration file. The application also provides a system, a server and a computer-readable storage medium for authorized crawler configuration simulation login, which have the same beneficial effects.

Description

Method and system for authorized crawler configuration simulation login
Technical Field
The present application relates to the field of web crawlers, and in particular, to a method, a system, a server, and a computer-readable storage medium for authorized crawler configuration-based simulated login.
Background
With the rapid development of internet technology, the era of big data has arrived, and data acquisition has become a crucial link. As an important source of data, the crawler system plays an irreplaceable role.
When collecting data from a website that requires authorization, an authorized crawler faces relatively complicated login interactions, such as inputting a short message verification code, inputting a picture verification code, or inputting a user name and a password. However, as websites' anti-crawling strategies grow stronger, the authorization process of a website changes frequently. To keep collecting data from the website, the original authorization process in the crawler system must then be modified, that is, the code for the authorization process must be redeveloped and released online again, which is extremely tedious and takes a long time.
Therefore, how to simplify the modification of the authorization process in a crawler system is a technical problem that those skilled in the art currently need to solve.
Disclosure of Invention
The purpose of the present application is to provide a method, a system, a server and a computer-readable storage medium for authorized crawler configuration simulation login, where the method can simplify the modification of the authorization process in a crawler system.
In order to solve the above technical problem, the present application provides a method for authorized crawler configuration simulation login, including:
generating an operation unit according to a configuration file input by a user, and adding the operation unit to the back end of the crawler system;
when the front end of the crawler system receives an authorization request, sending the authorization request and the authorization parameters input by the user to the back end;
and calling, by the back end, the corresponding operation unit according to the authorization parameters to complete the authorization request.
Optionally, the generating an operation unit according to a configuration file input by a user, and adding the operation unit to the back end of the crawler system includes:
pushing the configuration file input by the user to a specified path of ZooKeeper, so that ZooKeeper acquires the configuration information in the configuration file;
connecting to ZooKeeper and reading the configuration information;
parsing the configuration information according to a configuration rule to generate the operation unit;
adding the operation unit to the crawler system.
Optionally, the authorization request includes at least one of a request for entering a login page, a request for inputting a short message verification code, a request for inputting a picture verification code, a request for inputting a user name and a password;
the operation unit correspondingly comprises at least one of a login page entering operation unit, a short message verification code input operation unit, a picture verification code input operation unit and a user name and password input operation unit.
Optionally, when the front end of the crawler system receives an authorization request, sending the authorization request and the authorization parameters input by the user to the back end includes:
receiving, by the front end, an authorization request;
judging whether the authorization request was sent by a website or by the back end;
when the authorization request received by the front end was sent by a website, sending that authorization request and the authorization parameters input by the user to the back end;
when the authorization request received by the front end was sent by the back end, parsing that authorization request to obtain a parameter requirement, and outputting the parameter requirement so that the user inputs the corresponding parameters according to it;
and when the parameters input by the user are received, sending, by the front end, the parameters to the back end.
The present application further provides a system for authorized crawler configuration simulation login, the system comprising:
the generating and adding module is used for generating an operation unit according to a configuration file input by a user and adding the operation unit to the back end of the crawler system;
the sending module is used for sending the authorization request and the authorization parameters input by the user to the back end when the front end of the crawler system receives an authorization request;
and the calling module is used for calling, at the back end, the corresponding operation unit according to the authorization parameters to complete the authorization request.
Optionally, the generating and adding module includes:
the pushing submodule is used for pushing the configuration file input by the user to a specified path of ZooKeeper so that ZooKeeper can acquire the configuration information in the configuration file;
the reading submodule is used for connecting to ZooKeeper and reading the configuration information;
the generating submodule is used for parsing the configuration information according to the configuration rule to generate an operation unit;
and the adding submodule is used for adding the operation unit to the crawler system.
Optionally, the sending module includes:
the receiving submodule is used for receiving an authorization request at the front end;
the judging submodule is used for judging whether the authorization request was sent by a website or by the back end;
the first sending submodule is used for sending the authorization request sent by the website and the authorization parameters input by the user to the back end when the authorization request received by the front end was sent by the website;
the analysis and output submodule is used for parsing the authorization request sent by the back end to obtain a parameter requirement and outputting the parameter requirement when the authorization request received by the front end was sent by the back end, so that the user inputs the corresponding parameters according to the parameter requirement;
and the second sending submodule is used for sending the parameters from the front end to the back end when the parameters input by the user are received.
The present application further provides a server for authorized crawler configuration simulation login, the server comprising:
a memory for storing a computer program;
a processor for implementing, when executing the computer program, the steps of the method for authorized crawler configuration simulation login described in any one of the above.
The present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for authorized crawler configuration simulation login described in any one of the above.
The method for authorized crawler configuration simulation login provided by the present application comprises: generating an operation unit according to a configuration file input by a user, and adding the operation unit to the back end of the crawler system; when the front end of the crawler system receives an authorization request, sending the authorization request and the authorization parameters input by the user to the back end; and calling, by the back end, the corresponding operation unit according to the authorization parameters to complete the authorization request.
This technical scheme makes use of the fact that a crawler system running online can read its configuration dynamically: when a configuration file input by a user is received, the corresponding operation unit is generated from it and added to the back end of the crawler system, so that the authorization flow in the crawler system can be changed at any time by updating the configuration file. Therefore, when the authorization process of a website changes, the crawler system can adapt quickly once it receives the corresponding configuration file, so that the problem caused by the authorization change is repaired quickly and the cost of going online is greatly reduced. The application also provides a system, a server and a computer-readable storage medium for authorized crawler configuration simulation login, which have the same beneficial effects and are not described again here.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a method for authorized crawler configuration simulation login according to an embodiment of the present application;
FIG. 2 is a flowchart of a practical implementation of S101 in the method for authorized crawler configuration simulation login shown in FIG. 1;
FIG. 3 is a flowchart of a practical implementation of S102 in the method for authorized crawler configuration simulation login shown in FIG. 1;
FIG. 4 is a block diagram of a system for authorized crawler configuration simulation login according to an embodiment of the present application;
FIG. 5 is a block diagram of another system for authorized crawler configuration simulation login according to an embodiment of the present application;
FIG. 6 is a block diagram of a server for authorized crawler configuration simulation login according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a method, a system, a server and a computer readable storage medium for authorized crawler configuration simulation login, wherein the method can simplify modification work of an authorization process in a crawler system.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a method for authorized crawler configuration simulation login according to an embodiment of the present application.
The method specifically comprises the following steps:
S101: generating an operation unit according to a configuration file input by a user, and adding the operation unit to the back end of the crawler system;
in the prior art, as websites' anti-crawling strategies grow stronger, the authorization process of a website changes frequently. To keep collecting data from the website, the original authorization process in the crawler system must be modified, that is, the code for the authorization process must be redeveloped and released online again, which is extremely tedious and takes a long time. For this reason, the present application provides a method for authorized crawler configuration simulation login that can simplify the modification of the authorization process in a crawler system;
because the authorization login of a website is a continuous and ordered process, every authorization step of the website is packaged into a fixed operation unit;
when a configuration file input by a user is received, the corresponding operation unit is generated according to the configuration file and added to the back end of the crawler system, where the user may be a system developer;
optionally, the authorization request mentioned here includes at least one of a request for entering a login page, a request for inputting a short message verification code, a request for inputting a picture verification code, and a request for inputting a user name and a password; the operation unit correspondingly includes at least one of a login page entering operation unit, a short message verification code input operation unit, a picture verification code input operation unit, and a user name and password input operation unit;
of course, the authorization requests and operation units are not fixed. To enhance the robustness of the method provided by the application, the user may also survey the operations commonly used in website authorization and write them into the configuration file;
optionally, the configuration file mentioned here may be an authorization operation flow written in Extensible Markup Language (XML).
S102: when the front end of the crawler system receives an authorization request, the authorization request and authorization parameters input by a user are sent to the back end;
when the crawler system logs in to a website that requires authorization, it receives an authorization request sent by the website; once the authorization passes, the crawler system can acquire data.
S103: and the back end calls the corresponding operation unit to complete the authorization request according to the authorization parameters.
When the back end receives the authorization request and the authorization parameters sent by the front end, the corresponding operation unit is called, and the authorization parameters are input into the operation unit, so that the operation unit completes the authorization request.
Based on the above technical scheme, the method for authorized crawler configuration simulation login provided by the application makes use of the fact that a crawler system running online can read its configuration dynamically: when a configuration file input by a user is received, the corresponding operation unit is generated from it and added to the back end of the crawler system, so that the authorization flow in the crawler system can be changed at any time by updating the configuration file. Therefore, when the authorization process of a website changes, the crawler system can adapt quickly once it receives the corresponding configuration file, so that the problem caused by the authorization change is repaired quickly and the cost of going online is greatly reduced.
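The flow of S101 to S103 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation described by the application: the names `Backend`, `CrawlerFrontend` and `build_unit`, and the configuration keys `name` and `required_params`, are hypothetical, and the generated operation unit only checks its parameters instead of contacting a website.

```python
from typing import Any, Callable, Dict

Params = Dict[str, str]
# Hypothetical shape of an operation unit: parameters in, result out.
OperationUnit = Callable[[Params], Dict[str, str]]


def build_unit(config: Dict[str, Any]) -> OperationUnit:
    """Generate an operation unit from one configuration entry (assumed layout)."""
    name = str(config["name"])
    required = list(config["required_params"])

    def unit(params: Params) -> Dict[str, str]:
        missing = [key for key in required if key not in params]
        if missing:
            return {"status": "error", "reason": "missing " + ", ".join(missing)}
        # A real unit would perform the requests described by the configuration.
        return {"status": "success", "step": name}

    return unit


class Backend:
    """Keeps the operation units, keyed by the authorization request they serve."""

    def __init__(self) -> None:
        self._units: Dict[str, OperationUnit] = {}

    def add_unit(self, request_type: str, unit: OperationUnit) -> None:
        # S101: adding or replacing a unit changes the flow without new code.
        self._units[request_type] = unit

    def handle(self, request_type: str, params: Params) -> Dict[str, str]:
        # S103: call the operation unit that corresponds to the request.
        unit = self._units.get(request_type)
        if unit is None:
            return {"status": "error", "reason": "no operation unit for " + request_type}
        return unit(params)


class CrawlerFrontend:
    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def on_authorization_request(self, request_type: str, params: Params) -> Dict[str, str]:
        # S102: forward the request and the user's parameters to the back end.
        return self._backend.handle(request_type, params)
```

Under these assumptions, updating the authorization flow amounts to calling `add_unit` again with a unit built from the new configuration entry.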
Based on the above embodiment, please refer to fig. 2, which is a flowchart of a practical implementation of S101 in the method for authorized crawler configuration simulation login shown in fig. 1.
This embodiment describes a specific implementation of S101 of the previous embodiment. The flowchart shown in fig. 2 specifically includes the following steps:
S201: pushing the configuration file input by the user to a specified path of ZooKeeper so that ZooKeeper acquires the configuration information in the configuration file;
ZooKeeper is open-source software that provides a coordination service, including consistency guarantees, for distributed applications. The functions it provides include configuration maintenance, naming service, distributed synchronization, group services, and the like;
ZooKeeper aims to encapsulate complex and error-prone key services and to provide users with a simple, easy-to-use interface and a system with high performance and stable functionality; therefore, in this embodiment of the application, ZooKeeper is used to acquire the configuration information in the configuration file;
optionally, the configuration file may be converted into a data file in JSON format, and the JSON data file is then pushed to the specified path of ZooKeeper;
optionally, the configuration file written in XML may be converted into a JSON data file by the Xml2json2 tool.
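The XML-to-JSON conversion can be sketched with the Python standard library alone. The text does not describe how the Xml2json2 tool maps XML to JSON, so the mapping below (tag, attributes, text and children as separate keys) is an assumption, and the sample fragment with its attribute names `retry` and `timeout` and its URL is purely illustrative.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict:
    """Convert one XML element into a dictionary (assumed mapping)."""
    node = {"tag": element.tag, "attributes": dict(element.attrib)}
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text: str) -> str:
    """Parse an XML configuration and serialize it as a JSON string."""
    root = ET.fromstring(xml_text.strip())
    return json.dumps(element_to_dict(root), ensure_ascii=False, sort_keys=True)


# Hypothetical configuration fragment, for illustration only.
SAMPLE_XML = """
<preprocess id="getMessageCode">
  <input>
    <group>
      <url retry="3" timeout="10">https://example.com/sms</url>
    </group>
  </input>
</preprocess>
"""

if __name__ == "__main__":
    print(xml_to_json(SAMPLE_XML))
```

The resulting string is what would then be pushed to the specified path.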
S202: connecting to ZooKeeper and reading the configuration information;
the crawler system running online can connect to ZooKeeper to read the configuration information dynamically.
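The push-then-read pattern of S201 and S202 can be illustrated without a running ZooKeeper cluster. ZooKeeper lets clients set watches on a path so that they are notified when its data changes; the sketch below imitates that behaviour with an in-memory stand-in. `InMemoryConfigStore`, `CrawlerConfigReader` and the path `/crawler/auth/example-site` are hypothetical names, and no real ZooKeeper client API is used.

```python
import json
from typing import Callable, Dict, List


class InMemoryConfigStore:
    """Stand-in for the coordination service: stores data per path, notifies watchers."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[Callable[[bytes], None]]] = {}

    def push(self, path: str, payload: bytes) -> None:
        # S201: the configuration is pushed to a specified path.
        self._data[path] = payload
        for callback in self._watchers.get(path, []):
            callback(payload)

    def watch(self, path: str, callback: Callable[[bytes], None]) -> None:
        # Register interest in a path; deliver the current value right away.
        self._watchers.setdefault(path, []).append(callback)
        if path in self._data:
            callback(self._data[path])


class CrawlerConfigReader:
    """S202: the running crawler reads the configuration whenever it changes."""

    def __init__(self, store: InMemoryConfigStore, path: str) -> None:
        self.config: dict = {}
        self.reload_count = 0
        store.watch(path, self._on_change)

    def _on_change(self, payload: bytes) -> None:
        self.config = json.loads(payload.decode("utf-8"))
        self.reload_count += 1
```

In a real deployment the store would be replaced by a ZooKeeper client, and the callback would go on to regenerate the operation units as in S203.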
S203: parsing the configuration information according to the configuration rule to generate an operation unit;
optionally, the configuration rule mentioned here may consist of at least one of the objects preprocess, input, andunit, expression, group, url, output, success, error, and the like;
where preprocess encapsulates the functional logic of a large module, such as logic, getMessageCode, getPicCode, etc.; each preprocess has a unique id attribute, so that different preprocess objects can be found by id; a preprocess also contains one or more input objects, each of which is the functional logic of a specific submodule;
input encapsulates submodule logic and contains the andunit logic-judgment configuration and the group sub-logic configuration; andunit judges whether the incoming parameters meet the execution condition, and the input is executed if they do and skipped otherwise;
andunit is a logical AND judgment condition under which there are several expression objects; andunit expresses an AND relation;
expression is a logical OR judgment condition that exists under andunit; multiple expression objects are in an OR relation;
group encapsulates specific operation logic, such as cookie operations and parameter extraction, and contains url objects that handle a particular request;
url encapsulates a complete HTTP request, including parameters, cookies, request headers, retry count, timeout, post-request processing logic, and the like;
output processes the response to the url request;
success returns the result to the front end after the request and response have been processed successfully;
error returns the result to the front end after request or response processing fails.
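The condition logic attached to an input can be sketched as follows. The sketch reads the description as: an input runs only if every AND-unit under it is satisfied, and an AND-unit is satisfied if at least one of its expressions holds. That reading, the dictionary layout, and the expression keys `param` and `equals` are assumptions made for illustration; the text does not specify the expression syntax.

```python
from typing import Any, Dict, List

Params = Dict[str, str]


def expression_holds(expression: Dict[str, Any], params: Params) -> bool:
    """Hypothetical expression: a parameter must exist and, optionally, equal a value."""
    name = expression["param"]
    if name not in params:
        return False
    if "equals" in expression:
        return params[name] == expression["equals"]
    return True


def andunit_holds(andunit: Dict[str, Any], params: Params) -> bool:
    # Expressions under one AND-unit are in an OR relation.
    return any(expression_holds(e, params) for e in andunit["expressions"])


def input_should_run(input_config: Dict[str, Any], params: Params) -> bool:
    # AND-units are in an AND relation; an input without any always runs.
    return all(andunit_holds(u, params) for u in input_config.get("andunits", []))


def select_inputs(preprocess: Dict[str, Any], params: Params) -> List[str]:
    """Return the names of the inputs whose execution conditions are met."""
    return [i["name"] for i in preprocess["inputs"] if input_should_run(i, params)]


# Hypothetical preprocess with two inputs, for illustration only.
PREPROCESS = {
    "id": "getMessageCode",
    "inputs": [
        {
            "name": "send_sms",
            "andunits": [
                {"expressions": [{"param": "phone"}]},
                {"expressions": [
                    {"param": "channel", "equals": "sms"},
                    {"param": "channel", "equals": "voice"},
                ]},
            ],
        },
        {"name": "refresh_cookie", "andunits": []},
    ],
}
```

With this layout, `send_sms` is selected only when a phone number is present and the channel is either `sms` or `voice`, while `refresh_cookie` has no condition and is always selected.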
S204: an operating unit is added to the crawler system.
Referring to fig. 3, fig. 3 is a flowchart of a practical implementation of S102 in the method for authorized crawler configuration simulation login shown in fig. 1.
This embodiment describes a specific implementation of S102 of the previous embodiment. The flowchart shown in fig. 3 specifically includes the following steps:
S301: the front end receives an authorization request;
S302: judging whether the authorization request was sent by a website or by the back end;
if the authorization request was sent by the website, the method proceeds to step S303; if it was sent by the back end, the method proceeds to step S304.
S303: sending the authorization request and the authorization parameters input by the user to the back end;
when the authorization request was sent by the website, it is the first authorization request in the website's authorization process. The website sends the authorization request to the front end; the front end activates the back end and sends the received authorization parameters together with the authorization request to the back end, so that the back end completes the authorization request.
S304: analyzing an authorization request sent by a back end to obtain a parameter requirement, and outputting the parameter requirement;
when the authorization request was sent by the back end, it is not the first authorization request in the website's authorization process. In this case the back end has received the authorization request directly from the website, parsed it to obtain a parameter requirement, and sent the parameter requirement together with the authorization request to the front end, so that the front end parses the authorization request sent by the back end to obtain the parameter requirement and outputs it.
S305: when receiving the parameters input by the user, the front end sends the parameters to the back end.
When receiving the parameters input by the user, the front end sends the parameters to the back end so that the back end completes the authorization request.
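Steps S301 to S305 amount to a small routing decision at the front end, sketched below. The `AuthorizationRequest` fields, the source labels `"website"` and `"backend"`, and the two callables passed to `Frontend` are hypothetical; the sketch only shows the branching, not real network or user-interface code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Params = Dict[str, str]


@dataclass
class AuthorizationRequest:
    source: str                      # assumed labels: "website" or "backend"
    request_type: str
    required_params: List[str] = field(default_factory=list)


class Frontend:
    def __init__(
        self,
        send_to_backend: Callable[[str, Params], str],
        ask_user: Callable[[List[str]], Params],
    ) -> None:
        self._send = send_to_backend
        self._ask = ask_user

    def handle(self, request: AuthorizationRequest, user_params: Optional[Params] = None) -> str:
        # S302: decide where the authorization request came from.
        if request.source == "website":
            # S303: first request of the flow; forward it with the user's parameters.
            return self._send(request.request_type, dict(user_params or {}))
        if request.source == "backend":
            # S304: output the parameter requirement so the user can fill it in.
            params = self._ask(request.required_params)
            # S305: send the parameters the user entered to the back end.
            return self._send(request.request_type, params)
        raise ValueError("unknown request source: " + request.source)
```

Passing the back end and the user prompt in as callables keeps the branching logic testable on its own.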
Referring to fig. 4, fig. 4 is a block diagram of a system for authorized crawler configuration simulation login according to an embodiment of the present application.
The system may include:
the generating and adding module 100 is used for generating an operation unit according to a configuration file input by a user and adding the operation unit to the back end of the crawler system;
a sending module 200, configured to send, when the front end of the crawler system receives an authorization request, the authorization request and authorization parameters input by a user to the back end;
and the calling module 300 is configured to call, at the back end, the corresponding operation unit to complete the authorization request according to the authorization parameter.
Referring to fig. 5, fig. 5 is a block diagram of another authorized crawler configuration simulation login system according to an embodiment of the present application.
The generating and adding module 100 may include:
the pushing submodule is used for pushing the configuration file input by the user to a specified path of ZooKeeper so that ZooKeeper can acquire the configuration information in the configuration file;
the reading submodule is used for connecting to ZooKeeper and reading the configuration information;
the generating submodule is used for parsing the configuration information according to the configuration rule to generate an operation unit;
and the adding submodule is used for adding the operation unit into the crawler system.
The sending module 200 may include:
the receiving submodule is used for receiving the authorization request at the front end;
the judging submodule is used for judging whether the authorization request is an authorization request sent by a website or an authorization request sent by a back end;
the first sending submodule is used for sending the authorization request sent by the website and the authorization parameters input by the user to the back end when the authorization request received by the front end is the authorization request sent by the website;
the analysis and output sub-module is used for analyzing the authorization request sent by the back end to obtain a parameter requirement when the authorization request received by the front end is the authorization request sent by the back end, and outputting the parameter requirement so that a user inputs a corresponding parameter according to the parameter requirement;
and the second sending submodule is used for sending the parameters to the back end by the front end when the parameters input by the user are received.
The components of the above system can be applied to one practical process as follows:
the pushing submodule pushes the configuration file input by the user to a specified path of ZooKeeper so that ZooKeeper can acquire the configuration information in the configuration file; the reading submodule connects to ZooKeeper and reads the configuration information; the generating submodule parses the configuration information according to the configuration rule to generate an operation unit; the adding submodule adds the operation unit to the crawler system;
when the receiving submodule at the front end receives an authorization request, the judging submodule judges whether the authorization request was sent by a website or by the back end; when the authorization request received by the front end was sent by a website, the first sending submodule sends that authorization request and the authorization parameters input by the user to the back end; when the authorization request received by the front end was sent by the back end, the analysis and output submodule parses that authorization request to obtain a parameter requirement, and outputs the parameter requirement so that the user inputs the corresponding parameters according to it; when the parameters input by the user are received, the second sending submodule sends them to the back end; and the calling module calls the corresponding operation unit according to the authorization parameters to complete the authorization request.
Fig. 6 is a structural diagram of an authorized crawler configuration simulation login server according to an embodiment of the present application.
The server may vary significantly depending on configuration or performance, and may include one or more central processing units (CPUs) 422 (e.g., one or more processors), memory 432, and one or more storage media 430 (e.g., one or more mass storage devices) storing applications 442 or data 444. The memory 432 and the storage medium 430 may be transient or persistent storage. The program stored on the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 422 may be configured to communicate with the storage medium 430 and to execute, on the authorized crawler configuration simulation login server 400, the series of instruction operations stored in the storage medium 430.
The authorized crawler configuration simulation login server 400 may also include one or more power supplies 424, one or more wired or wireless network interfaces 450, one or more input-output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps in the method for authorizing crawler configuration simulation login described in fig. 1 to 3 above are implemented by the authorization crawler configuration simulation login server based on the structure shown in fig. 6.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, server and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, server and method may be implemented in other ways. The above-described system embodiments are merely illustrative. For example, the division of modules is merely a logical division, and an actual implementation may use another division: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer server (which may be a personal computer, a function call system, or a network server) to execute all or part of the steps of the method of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The method, system, server and computer readable storage medium for authorized crawler configuration simulation login provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or server that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or server. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or server that comprises the element.

Claims (8)

1. A method for authorized crawler configuration simulation login, characterized by comprising the following steps:
generating an operation unit according to a configuration file input by a user, and adding the operation unit to the back end of the crawler system;
when the front end of the crawler system receives an authorization request, the authorization request and authorization parameters input by a user are sent to the back end;
the back end calls a corresponding operation unit to complete the authorization request according to the authorization parameters;
the authorization request comprises at least one of a login page entering request, a short message verification code input request, a picture verification code input request, a user name input request and a password input request;
the operation unit correspondingly comprises at least one of a login page entering operation unit, a short message verification code input operation unit, a picture verification code input operation unit and a user name and password input operation unit.
2. The method of claim 1, wherein generating an operation unit according to a configuration file input by a user and adding the operation unit to a crawler system comprises:
pushing a configuration file input by a user to a specified path of zookeeper software so that the zookeeper software acquires configuration information in the configuration file;
connecting to the zookeeper software, and reading the configuration information;
analyzing the configuration information according to a configuration rule to generate an operation unit;
adding the operation unit to the crawler system.
3. The method of claim 1 or 2, wherein sending the authorization request and the authorization parameters entered by the user to the back-end when the front-end of the crawler system receives the authorization request comprises:
the front end receives an authorization request;
judging whether the authorization request is an authorization request sent by a website or an authorization request sent by the back end;
when the authorization request received by the front end is an authorization request sent by a website, sending the authorization request sent by the website and authorization parameters input by a user to the back end;
when the authorization request received by the front end is the authorization request sent by the back end, analyzing the authorization request sent by the back end to obtain a parameter requirement, and outputting the parameter requirement so as to enable a user to input a corresponding parameter according to the parameter requirement;
and when receiving the parameters input by the user, the front end sends the parameters to the back end.
4. A system for authorizing crawler configuration to simulate login, comprising:
the generating and adding module is used for generating an operation unit according to a configuration file input by a user and adding the operation unit to the back end of the crawler system;
the sending module is used for sending an authorization request and authorization parameters input by a user to the back end when the front end of the crawler system receives the authorization request;
the calling module is used for calling the corresponding operation unit by the back end to complete the authorization request according to the authorization parameter;
the authorization request comprises at least one of a login page entering request, a short message verification code input request, a picture verification code input request, a user name input request and a password input request;
the operation unit correspondingly comprises at least one of a login page entering operation unit, a short message verification code input operation unit, a picture verification code input operation unit and a user name and password input operation unit.
5. The system of claim 4, wherein the generate and add module comprises:
the pushing submodule is used for pushing the configuration file input by the user to a specified path of the zookeeper software so that the zookeeper software can acquire the configuration information in the configuration file;
the reading sub-module is used for connecting to the zookeeper software and reading the configuration information;
the generating submodule is used for analyzing the configuration information according to the configuration rule to generate an operation unit;
and the adding submodule is used for adding the operation unit into the crawler system.
6. The system of claim 4 or 5, wherein the sending module comprises:
a receiving submodule, configured to receive an authorization request by the front end;
the judging submodule is used for judging whether the authorization request is an authorization request sent by a website or an authorization request sent by the back end;
the first sending sub-module is used for sending the authorization request sent by the website and the authorization parameters input by the user to the back end when the authorization request received by the front end is the authorization request sent by the website;
the analysis and output sub-module is used for analyzing the authorization request sent by the back end to obtain a parameter requirement and outputting the parameter requirement when the authorization request received by the front end is the authorization request sent by the back end, so that a user inputs a corresponding parameter according to the parameter requirement;
and the second sending submodule is used for sending the parameters to the back end by the front end when the parameters input by the user are received.
7. A server for authorizing a crawler to configure a simulated login, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method of authorizing a crawler configuration simulation login according to any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of authorizing a crawler to configure a simulated login according to any one of claims 1 to 3.
CN201711446333.7A 2017-12-27 2017-12-27 Method and system for authorized crawler configuration simulation login Active CN108076067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711446333.7A CN108076067B (en) 2017-12-27 2017-12-27 Method and system for authorized crawler configuration simulation login

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711446333.7A CN108076067B (en) 2017-12-27 2017-12-27 Method and system for authorized crawler configuration simulation login

Publications (2)

Publication Number Publication Date
CN108076067A CN108076067A (en) 2018-05-25
CN108076067B true CN108076067B (en) 2021-05-18

Family

ID=62155328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711446333.7A Active CN108076067B (en) 2017-12-27 2017-12-27 Method and system for authorized crawler configuration simulation login

Country Status (1)

Country Link
CN (1) CN108076067B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815380A (en) * 2018-12-20 2019-05-28 山东中创软件工程股份有限公司 A kind of information crawler method, apparatus, equipment and computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103490896A (en) * 2013-09-16 2014-01-01 北京鹏宇成软件技术有限公司 Multi-user website automatic logger and achieving method thereof
CN103514171A (en) * 2012-06-20 2014-01-15 同程网络科技股份有限公司 Method for implementing self-defined crawler based on optical character recognition and vertical search
CN103984719A (en) * 2014-05-12 2014-08-13 浪潮电子信息产业股份有限公司 Method for acquiring by using crawler to simulate login
CN105631030A (en) * 2015-12-30 2016-06-01 福建亿榕信息技术有限公司 Universal web crawler login simulation method and system
CN106598991A (en) * 2015-10-19 2017-04-26 上海引跑信息科技有限公司 Web crawler system capable of realizing website interaction and automatic form extraction by conversational mode
CN106897357A (en) * 2017-01-04 2017-06-27 北京京拍档科技股份有限公司 A kind of method for crawling the network information for band checking distributed intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8978121B2 (en) * 2013-01-04 2015-03-10 Gary Stephen Shuster Cognitive-based CAPTCHA system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant