CN113254744A

CN113254744A - Method for acquiring data information of security equipment by using web crawler technology

Info

Publication number: CN113254744A
Application number: CN202110444788.5A
Authority: CN
Inventors: 司磊; 李刚; 韩文善
Original assignee: Cyberspace Great Wall System Application Guangdong Co ltd
Current assignee: Cyberspace Great Wall System Application Guangdong Co ltd
Priority date: 2021-04-24
Filing date: 2021-04-24
Publication date: 2021-08-13

Abstract

The invention belongs to the technical field of computer software, and in particular relates to a method for acquiring data information of security equipment by using web crawler technology. The method first sends an acquisition request to the web page to be collected, then uses the exchange method of the RestTemplate tool to set the login-related entity parameters of that page. The RestTemplate tool then links to the page and simulates a login: it connects to the page to be collected, calls the preset entity parameters to perform the login authentication operation, crawls the web page data available after login, and stores the data in a local database. Finally, the data in the local database is counted and displayed on a front-end page. In this way the identity authentication restriction is bypassed: the crawler obtains the desired data information without anyone performing the identity login manually, manual participation is avoided, and no login verification interface has to be obtained through secondary development, which reduces the development workload and the cost.

Description

Method for acquiring data information of security equipment by using web crawler technology

[ technical field ]

The invention belongs to the technical field of computer software, and particularly relates to a method for acquiring data information of security equipment by using a web crawler technology.

[ background of the invention ]

Web crawlers are a common tool for collecting information from the network, but some websites restrict crawlers and release information only after identity authentication. If such a website is crawled directly, the crawler usually receives the user login page to which the site redirects after authentication fails, rather than the page content that actually needs to be collected. Where a user wants to bypass identity login authentication to crawl the required data, the current approach is to provide an interface that does not require login authentication, but this requires secondary development of the docking system and is difficult to implement.

[ summary of the invention ]

The invention provides a method for acquiring data information of security equipment by using web crawler technology, and aims to solve the problem in the prior art that a web crawler can acquire information from such websites only after identity authentication.

The invention is realized by the following technical scheme:

a method for acquiring data information of a security device by using a web crawler technology comprises the following steps:

step S1, sending an acquisition request to a page to be acquired;

step S2, using the exchange method of the RestTemplate tool to set the login-related entity parameters of the web page to be collected, then linking to the web page through the RestTemplate tool and performing a simulated login, which comprises the following operations:

step S21, linking to the web page to be collected;

step S22, calling the preset entity parameters by the RestTemplate tool to perform the login authentication operation, crawling the web page data after login, and storing the web page data in a local database;

step S23, counting the data in the local database and displaying it on the front-end page.
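Step S23 does not specify which statistics are computed. As a minimal sketch, assuming each crawled record is stored as a column-name/value map and that the front-end page shows a count per security-device type (the class name CrawlStatistics and the column name deviceType are hypothetical), the aggregation could look like this:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for step S23: counts stored crawl rows per category.
final class CrawlStatistics {

    private CrawlStatistics() {
    }

    // Groups rows by the value of `column`; rows lacking the column are counted as "unknown".
    // A TreeMap keeps the categories sorted so the front-end page renders them in a stable order.
    static Map<String, Long> countBy(List<Map<String, String>> rows, String column) {
        return rows.stream()
                .collect(Collectors.groupingBy(
                        row -> row.getOrDefault(column, "unknown"),
                        TreeMap::new,
                        Collectors.counting()));
    }
}
```

For rows tagged firewall, ids and firewall, countBy returns {firewall=2, ids=1}, which a front-end page could render as a table or chart.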

In step S22, after the simulated login is completed, the entity parameters are written into the browser's Cookie, and the Cookie is recorded in the http headers variable.
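The text does not show how the cookie reaches the headers variable. One plausible reading, sketched with JDK types only, is to keep the name=value part of each Set-Cookie value returned by the login response and join them into a single Cookie request header. A plain Map stands in for Spring's HttpHeaders (which is itself a MultiValueMap of header names to values); the class name LoginCookies is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns login-response Set-Cookie values into a Cookie request header.
final class LoginCookies {

    private LoginCookies() {
    }

    // Keeps only the leading "name=value" pair of each Set-Cookie value,
    // dropping attributes such as Path, Max-Age or HttpOnly, and joins the pairs with "; ".
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    // Records the cookie in a header map, standing in for the "http headers variable".
    static Map<String, List<String>> headersWithCookie(List<String> setCookieValues) {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        headers.put("Cookie", List.of(toCookieHeader(setCookieValues)));
        return headers;
    }
}
```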

In the above method for acquiring data information of security equipment by using web crawler technology, a login is simulated with the RestTemplate tool to bypass the authentication restriction, the linked page is crawled with crawler technology, and the web page data is cleaned to obtain the required raw data.
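The text does not say how the crawled web page data is cleaned. A minimal sketch, assuming the target data sits in HTML table cells (an assumption about the device pages, not stated in the source): extract each td element, strip nested tags, decode a few common entities, and collapse whitespace. The class name PageCleaner is hypothetical, and regular expressions are used only to keep the example self-contained; production code would normally use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns crawled table markup into plain cell values.
final class PageCleaner {

    // Lazily matches the content of each <td ...>...</td>, across line breaks.
    private static final Pattern CELL =
            Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private PageCleaner() {
    }

    // Removes nested tags, decodes a few entities (&amp; last, so "&amp;lt;" is not double-decoded)
    // and collapses runs of whitespace into single spaces.
    static String cleanText(String fragment) {
        String text = TAG.matcher(fragment).replaceAll(" ")
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the cleaned text of every table cell, in document order.
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        Matcher cell = CELL.matcher(html);
        while (cell.find()) {
            cells.add(cleanText(cell.group(1)));
        }
        return cells;
    }
}
```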

In the above method for acquiring data information of a security device by using web crawler technology, the following steps are executed in step S22:

the web crawler calls the browser's page-access API and passes the login address of the website to be accessed to the browser;

the browser loads the website's login page, and the web crawler calls the browser's get-page API to obtain the HTML content of the page;

the web crawler parses the obtained HTML content to find the login-related entity parameters, calls the browser's form-submission API, and submits the verification information to the website for verification;

and after the submitted verification information is authenticated successfully, the web crawler calls the browser's get-cookie interface, through which the cookie information of the site is obtained and stored.
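The sub-steps above drive a browser through its APIs. A simplified sketch that skips the browser and works directly on the login page's HTML is shown below. It assumes the login form uses double-quoted name and value attributes and that the preset entity parameters are keyed by the form's field names (username and password in the example are hypothetical, as is the class name LoginFormParser).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the login form's fields and builds the body to submit.
final class LoginFormParser {

    // An <input ...> tag; group 1 holds its attribute text.
    private static final Pattern INPUT =
            Pattern.compile("<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Double-quoted name/value attributes only (an assumption about the page markup).
    private static final Pattern NAME =
            Pattern.compile("(?:^|\\s)name\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE =
            Pattern.compile("(?:^|\\s)value\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    private LoginFormParser() {
    }

    // Returns name -> default value for every named input, in document order.
    // Hidden inputs (e.g. anti-forgery tokens) are kept because a site may reject a login without them.
    static Map<String, String> inputFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher input = INPUT.matcher(html);
        while (input.find()) {
            String attributes = input.group(1);
            Matcher name = NAME.matcher(attributes);
            if (!name.find()) {
                continue; // unnamed inputs such as a bare submit button are not sent
            }
            Matcher value = VALUE.matcher(attributes);
            fields.put(name.group(1), value.find() ? value.group(1) : "");
        }
        return fields;
    }

    // Overlays the preset entity parameters on the scraped fields and URL-encodes the result.
    static String formBody(Map<String, String> scraped, Map<String, String> presets) {
        Map<String, String> merged = new LinkedHashMap<>(scraped);
        merged.putAll(presets);
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : merged.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

The resulting body is what the form-submission step would post to the site.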

In step S22 of the above method, an http request is sent to access the site with the obtained cookie information set in the request, so that, until the cookie expires, the site can be accessed and its web page data crawled without logging in again.
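A minimal sketch of reusing the stored cookie until it expires, assuming the expiry time was recorded when the cookie was saved. It uses the JDK's java.net.http request builder (Java 11+) in place of RestTemplate; the class name CookieSession and the example URL are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

// Hypothetical holder for the cookie obtained by the simulated login.
final class CookieSession {

    private final String cookieHeader;
    private final Instant expiresAt;

    CookieSession(String cookieHeader, Instant expiresAt) {
        this.cookieHeader = cookieHeader;
        this.expiresAt = expiresAt;
    }

    // True while the stored cookie can still be sent instead of logging in again.
    boolean isUsableAt(Instant now) {
        return now.isBefore(expiresAt);
    }

    // Builds a GET request for a page on the site with the stored cookie attached.
    HttpRequest pageRequest(URI page) {
        return HttpRequest.newBuilder(page)
                .header("Cookie", cookieHeader)
                .GET()
                .build();
    }
}
```

The request would then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); once isUsableAt returns false, the simulated login of step S22 is repeated to obtain a fresh cookie.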

In the method for acquiring data information of a security device by using web crawler technology described above, the login-related entity parameters comprise a user name and a password.

Compared with the prior art, the invention has the following advantages:

the invention provides a method for acquiring data information of security equipment by using web crawler technology. The method first sends an acquisition request to the web page to be collected, then uses the exchange method of the RestTemplate tool to set the login-related entity parameters of that page. The RestTemplate tool then links to the page and performs a simulated login: it connects to the page to be collected, calls the preset entity parameters to perform the login authentication operation, crawls the web page data after login, and stores the data in a local database; finally, the data in the local database is counted and displayed on a front-end page. In this way the identity authentication restriction is bypassed, the desired data information can be crawled by crawler technology without anyone performing the identity login manually, manual participation is avoided, and no login verification interface has to be obtained through secondary development, which reduces the development workload and the cost.

[ description of the drawings ]

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below.

FIG. 1 is a schematic flow chart of the method of the present invention.

[ detailed description of the embodiments ]

In order to make the technical problems solved by the present invention, as well as its technical solutions and beneficial effects, clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.

When embodiments of the present invention use ordinal numbers such as "first" and "second", it should be understood that, unless the context clearly indicates otherwise, these words are used only to distinguish one item from another.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; and as a direct connection, an indirect connection through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

In a specific embodiment, as shown in fig. 1, a method for obtaining data information of a security device by using a web crawler technology includes the following steps:

step S1, sending an acquisition request to a page to be acquired;

step S2, using the exchange method of the RestTemplate tool to set the login-related entity parameters of the web page to be collected, then linking to the web page through the RestTemplate tool and performing a simulated login, which comprises the following operations:

step S21, linking to the web page to be collected;

step S22, calling the preset entity parameters by the RestTemplate tool to perform the login authentication operation, crawling the web page data after login, and storing the web page data in a local database;

step S23, counting the data in the local database and displaying it on the front-end page.

Accessing third-party REST services in a Spring application involves the Spring RestTemplate class. RestTemplate follows the same design principle as many other Spring template classes, such as JdbcTemplate and JmsTemplate: it provides simplified methods with default behavior for carrying out complex tasks. By default, RestTemplate relies on the JDK's own HTTP connection facility (HttpURLConnection); if necessary, this can be replaced through the setRequestFactory method with another HTTP library, such as Apache HttpComponents, Netty, or OkHttp. RestTemplate is the client provided by Spring for accessing REST services; it offers a variety of convenient methods for accessing remote HTTP services, is simpler to use than the traditional Apache HTTP client, and can greatly improve the efficiency of writing client code. With the RestTemplate provided by the Spring framework, an application can call REST services through a simplified, RESTful-style communication model in which the HTTP connection is encapsulated, so the caller only needs to supply the URL and the return value type.
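With Spring on the classpath, the simulated login would typically be issued as restTemplate.exchange(url, HttpMethod.POST, new HttpEntity<>(form, headers), String.class), where form is a MultiValueMap holding the entity parameters. To keep the example self-contained, the sketch below builds the equivalent form-encoded POST with the JDK's java.net.http API instead; the class name LoginRequests and the login URL are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: the simulated-login POST, built with java.net.http instead of RestTemplate.
final class LoginRequests {

    private LoginRequests() {
    }

    // Sends the preset entity parameters as an application/x-www-form-urlencoded body,
    // which is how an HTML login form is normally submitted.
    static HttpRequest loginPost(URI loginUrl, String formBody) {
        return HttpRequest.newBuilder(loginUrl)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }
}
```

Sending it with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and reading response.headers().allValues("Set-Cookie") yields the values from which the session cookie is taken.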

Specifically, in step S22, after the simulated login is completed, the entity parameters are written into the browser's Cookie, and the Cookie is recorded in the http headers variable;

more specifically, the RestTemplate tool is used to simulate a login so as to bypass the authentication restriction, the linked page is crawled with crawler technology, and the web page data is cleaned to obtain the required raw data;

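further, in step S22, the following steps are performed:

the web crawler calls the browser's page-access API and passes the login address of the website to be accessed to the browser;

the browser loads the website's login page, and the web crawler calls the browser's get-page API to obtain the HTML content of the page;

the web crawler parses the obtained HTML content to find the login-related entity parameters, calls the browser's form-submission API, and submits the verification information to the website for verification;

and after the submitted verification information is authenticated successfully, the web crawler calls the browser's get-cookie interface, through which the cookie information of the site is obtained and stored.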

Further, in step S22, an http request is sent to access the site with the obtained cookie information set in the request, so that, until the cookie expires, the site can be accessed without logging in and its web page data crawled.
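Reusing the cookie depends on noticing when the site stops accepting it. Besides tracking an expiry time, a crawler can inspect the response itself, since, as the background explains, an unauthenticated request typically ends up on the login page. The heuristic below is a sketch under that assumption; the "login" URL fragment and the password-field marker are hypothetical and would need to match the real site, as is the class name SessionCheck.

```java
import java.util.Locale;

// Hypothetical heuristic: decides whether the stored cookie was rejected.
final class SessionCheck {

    private SessionCheck() {
    }

    // Treats a redirect whose Location mentions "login", or a page containing a password field,
    // as a sign that the cookie has expired and the simulated login must be repeated.
    static boolean needsRelogin(int status, String location, String body) {
        boolean redirectedToLogin = status >= 300 && status < 400
                && location != null
                && location.toLowerCase(Locale.ROOT).contains("login");
        boolean loginPageServed = body != null
                && body.toLowerCase(Locale.ROOT).contains("type=\"password\"");
        return redirectedToLogin || loginPageServed;
    }
}
```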

Specifically, the entity parameters related to login include a user name, a password, a key, an identity card, and the like.

In summary, the method of the invention for acquiring data information of security equipment by using web crawler technology first sends an acquisition request to the web page to be collected, uses the exchange method of the RestTemplate tool to set the login-related entity parameters of that page, and then has the RestTemplate tool link to the page and simulate a login with the preset entity parameters. The web page data obtained after login is crawled, stored in a local database, counted, and displayed on a front-end page. The identity authentication restriction is thereby bypassed: the desired data information is crawled without anyone performing the identity login manually, manual participation is avoided, and no login verification interface has to be obtained through secondary development, which reduces the development workload and the cost.

The above is one embodiment of the present invention, and the embodiments of the present invention are not limited to this description. Because trade names may differ, the present invention is not limited to the names used above, including their English names. Methods, structures and the like that are similar or identical to those of the present invention, as well as technical deductions or substitutions made on the basis of the concept of the present invention, shall be considered to fall within the protection scope of the present invention.

Claims

1. A method for acquiring data information of a security device by using web crawler technology, characterized in that the method comprises the following steps:

step S1, sending an acquisition request to a page to be acquired;

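step S2, using the exchange method of the RestTemplate tool to set the login-related entity parameters of the web page to be collected, then linking to the web page through the RestTemplate tool and performing a simulated login, which comprises the following operations:

step S21, linking to the web page to be collected;

step S22, calling the preset entity parameters by the RestTemplate tool to perform the login authentication operation, crawling the web page data after login, and storing the web page data in a local database;

step S23, counting the data in the local database and displaying it on the front-end page.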

2. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 1, wherein: in step S22, after the simulated login is completed, the entity parameters are written in the Cookie of the browser, and the Cookie is recorded in the http headers variable.

3. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 2, wherein: a login is simulated with the RestTemplate tool to bypass the authentication restriction, the linked page is crawled with crawler technology, and the web page data is cleaned to obtain the required raw data.

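4. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 3, wherein in step S22 the following steps are performed:

the web crawler calls the browser's page-access API and passes the login address of the website to be accessed to the browser;

the browser loads the website's login page, and the web crawler calls the browser's get-page API to obtain the HTML content of the page;

the web crawler parses the obtained HTML content to find the login-related entity parameters, calls the browser's form-submission API, and submits the verification information to the website for verification;

and after the submitted verification information is authenticated successfully, the web crawler calls the browser's get-cookie interface, through which the cookie information of the site is obtained and stored.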

5. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 4, wherein: in step S22, an http request is sent to access the site with the obtained cookie information set in the request, and, until the cookie expires, the site is accessed without logging in and its web page data is crawled.

6. A method for acquiring data information of a security device using web crawler technology according to any of claims 1-5, wherein: the login related entity parameters include a user name and a password.