CN113254744A - Method for acquiring data information of security equipment by using web crawler technology - Google Patents

Info

Publication number
CN113254744A
Authority
CN
China
Prior art keywords
login
webpage
web crawler
resttemplate
browser
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110444788.5A
Other languages
Chinese (zh)
Inventor
司磊
李刚
韩文善
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cyberspace Great Wall System Application Guangdong Co ltd
Original Assignee
Cyberspace Great Wall System Application Guangdong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-04-24
Filing date
2021-04-24
Publication date
2021-08-13
Application filed by Cyberspace Great Wall System Application Guangdong Co ltd
Priority to CN202110444788.5A
Publication of CN113254744A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention belongs to the technical field of computer software, and in particular relates to a method for acquiring data information of security equipment by using web crawler technology. The method first sends an acquisition request to the webpage to be collected, then uses the exchange method of the RestTemplate tool to set the login-related entity parameters of that webpage, and then connects to the webpage through the RestTemplate tool and simulates a login, in which the RestTemplate tool calls the preset entity parameters to perform login authentication. The webpage data is crawled after login and stored in a local database, and finally statistics on the data in the local database are displayed on a front-end page. In this way the identity authentication restriction is bypassed: crawler technology retrieves the desired data without a person performing the identity login, manual participation is avoided, and no secondary development is needed to obtain a login verification interface, which reduces development workload and cost.

Description

Method for acquiring data information of security equipment by using web crawler technology
[ technical field ]
The invention belongs to the technical field of computer software, and particularly relates to a method for acquiring data information of security equipment by using a web crawler technology.
[ background of the invention ]
Web crawlers are a common tool for collecting information from the network. However, some websites restrict crawlers and release information only after identity authentication. If a crawler requests such a website directly, it usually receives the user login page that is returned after authentication fails, rather than the page content it actually needs. To let users bypass identity login authentication when crawling the required data, some systems currently provide an interface that does not require login authentication, but the connecting system then has to undergo secondary development, which is difficult to implement.
[ summary of the invention ]
The invention provides a method for acquiring data information of security equipment by using web crawler technology, and aims to solve the prior-art problem that a web crawler can acquire information only after identity authentication.
The invention is realized by the following technical scheme:
A method for acquiring data information of a security device by using web crawler technology comprises the following steps:
step S1, sending an acquisition request to the page to be collected;
step S2, using the exchange method of the RestTemplate tool to set the login-related entity parameters of the webpage to be collected, then connecting to the webpage through the RestTemplate tool and performing a simulated login, which comprises the following operations:
step S21, connecting to the webpage to be collected;
step S22, having the RestTemplate tool call the preset entity parameters to perform the login authentication operation, crawling the webpage data after login, and storing the webpage data in a local database;
step S23, compiling statistics on the data in the local database and displaying them on the front-end page.
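The storage part of step S22 and the statistics of step S23 can be pictured with a short Java sketch. It is not the patent's implementation: the "local database" is replaced by a hypothetical in-memory class `LocalStore`, and each crawled row is assumed to be a simple (device, event type) pair. A real system would write to an actual database and run the count as a query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory stand-in for the local database used in steps S22 and S23. */
final class LocalStore {

    /** One crawled row; both fields are illustrative assumptions, not taken from the patent. */
    static final class Row {
        final String device;
        final String eventType;

        Row(String device, String eventType) {
            this.device = device;
            this.eventType = eventType;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    /** Step S22 (storage part): keep one row of crawled data. */
    void save(String device, String eventType) {
        rows.add(new Row(device, eventType));
    }

    /** Step S23 (statistics part): number of stored rows per device, sorted by device name. */
    Map<String, Long> countByDevice() {
        return rows.stream()
                .collect(Collectors.groupingBy(r -> r.device, TreeMap::new, Collectors.counting()));
    }
}
```

The sorted map is the kind of summary a front-end page would render as a per-device table or chart.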
In step S22, after the simulated login is completed, the entity parameters are written into the browser's Cookie, and the Cookie is recorded in the http headers variable.
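Recording the cookie after login amounts to turning the `Set-Cookie` headers of the login response into a single `Cookie` header for later requests. The sketch below uses only the JDK (`java.net.HttpCookie`) rather than Spring, where the result would typically be stored in an `HttpHeaders` object under `HttpHeaders.COOKIE`. The class name `CookieCapture` is hypothetical.

```java
import java.net.HttpCookie;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: converts login-response Set-Cookie values into one Cookie request header. */
final class CookieCapture {

    /**
     * Keeps only the name=value pair of each cookie; attributes such as Path,
     * Expires or HttpOnly guide the client but are not sent back to the server.
     */
    static String toCookieHeader(List<String> setCookieValues) {
        return setCookieValues.stream()
                .flatMap(value -> HttpCookie.parse(value).stream())
                .map(cookie -> cookie.getName() + "=" + cookie.getValue())
                .collect(Collectors.joining("; "));
    }
}
```

With a `java.net.http` response, the input list would come from `response.headers().allValues("Set-Cookie")`.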
In this method for acquiring data information of security equipment by using web crawler technology, the RestTemplate tool simulates the login to bypass the authentication restriction, crawler technology crawls the information on the connected page, and the webpage data is cleaned to obtain the required raw data.
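The cleaning step is not specified further. As one possible reading, the sketch below reduces a crawled page to plain text by dropping `<script>` and `<style>` blocks and tags, decoding a handful of common entities and collapsing whitespace. It is deliberately naive (regular expressions, not an HTML parser), and `PageCleaner` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical, naive page cleaner: HTML in, whitespace-normalized text out. */
final class PageCleaner {
    // (?is): case-insensitive, and '.' also matches line breaks inside script/style bodies.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile("(?is)<(script|style)\\b.*?</\\1>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String clean(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        text = text.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // decoded last so "&amp;lt;" is not decoded twice
        return SPACES.matcher(text).replaceAll(" ").trim();
    }
}
```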
In the method described above, the following steps are executed in step S22:
the web crawler calls the browser's page-access API and passes the login address of the website to be accessed to the browser;
the browser loads the website's login page, and the web crawler calls the browser's page-retrieval API to obtain the HTML content of the page;
the web crawler parses the obtained HTML content to locate the login-related entity parameters, then calls the browser's form-submission API to submit the verification information to the website for verification;
after the submitted verification information is successfully authenticated, the web crawler calls the browser's cookie-retrieval interface, through which the site's cookie information is obtained and stored.
In step S22 of the method described above, an HTTP request is sent to access the site with the obtained cookie information set in the request, so that the site's webpage data can be crawled without logging in again until the cookie expires.
In the method described above, the login-related entity parameters comprise a user name and a password.
Compared with the prior art, the invention has the following advantages:
The invention provides a method for acquiring data information of security equipment by using web crawler technology. The method first sends an acquisition request to the webpage to be collected, then uses the exchange method of the RestTemplate tool to set the login-related entity parameters of that webpage, and then connects to the webpage through the RestTemplate tool and simulates a login, in which the RestTemplate tool calls the preset entity parameters to perform login authentication. The webpage data is crawled after login and stored in a local database, and statistics on the data in the local database are displayed on a front-end page. In this way the identity authentication restriction is bypassed: crawler technology retrieves the desired data without a person performing the identity login, manual participation is avoided, and no secondary development is needed to obtain a login verification interface, which reduces development workload and cost.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below.
FIG. 1 is a schematic flow chart of the method of the present invention.
[ detailed description of the embodiments ]
To make the technical problems solved by the present invention, its technical solutions and its advantageous effects clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
Where embodiments of the present invention use ordinal words such as "first" and "second", these words serve only to distinguish between items unless the context clearly indicates otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "linked," and "connected" are to be construed broadly: for example, a connection may be fixed, removable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or it may denote internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In a specific embodiment, as shown in FIG. 1, a method for acquiring data information of a security device by using web crawler technology includes the following steps:
step S1, sending an acquisition request to the page to be collected;
step S2, using the exchange method of the RestTemplate tool to set the login-related entity parameters of the webpage to be collected, then connecting to the webpage through the RestTemplate tool and performing a simulated login, which comprises the following operations:
step S21, connecting to the webpage to be collected;
step S22, having the RestTemplate tool call the preset entity parameters to perform the login authentication operation, crawling the webpage data after login, and storing the webpage data in a local database;
step S23, compiling statistics on the data in the local database and displaying them on the front-end page.
In a Spring application, third-party REST services are accessed with Spring's RestTemplate class. RestTemplate follows the same design principle as many other Spring template classes, such as JdbcTemplate and JmsTemplate: it provides simplified methods with default behavior for carrying out complex tasks. By default, RestTemplate relies on the JDK's built-in HTTP connection support (HttpURLConnection); if necessary, it can be switched to another HTTP library, such as Apache HttpClient, Netty or OkHttp, through the setRequestFactory method. RestTemplate is the client Spring provides for accessing REST services. It offers a range of methods for conveniently calling remote HTTP services, is simpler to use than the traditional Apache HTTP client, and can greatly improve the efficiency of writing client code. Within an application, the RestTemplate provided by the Spring framework simplifies communication with HTTP services, applies RESTful conventions uniformly and encapsulates the HTTP connection, so the caller only needs to supply the URL and the return type.
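At the HTTP level, the login that RestTemplate performs through its exchange method is typically a form POST carrying the entity parameters. Because Spring is not assumed to be on the classpath, the sketch below builds the equivalent request with the JDK's `java.net.http` client instead of RestTemplate. The class name `LoginRequestFactory`, the field names `username` and `password`, and the URL in the example are assumptions: each device's login form defines its own.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that prepares the simulated-login request (built, not sent). */
final class LoginRequestFactory {

    /** URL-encodes the entity parameters as an application/x-www-form-urlencoded body. */
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds the POST that carries the preset entity parameters to the login address. */
    static HttpRequest buildLoginRequest(URI loginUri, Map<String, String> params) {
        return HttpRequest.newBuilder(loginUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(params)))
                .build();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. With Spring, the counterpart is `restTemplate.exchange(url, HttpMethod.POST, new HttpEntity<>(form, headers), String.class)`, where `form` is a `MultiValueMap` such as `LinkedMultiValueMap`.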
Specifically, in step S22, after the simulated login is completed, the entity parameters are written into the browser's Cookie, and the Cookie is recorded in the http headers variable.
More specifically, the RestTemplate tool simulates the login to bypass the authentication restriction, crawler technology crawls the information on the connected page, and the webpage data is cleaned to obtain the required raw data.
Further, in step S22, the following steps are performed:
the web crawler calls the browser's page-access API and passes the login address of the website to be accessed to the browser;
the browser loads the website's login page, and the web crawler calls the browser's page-retrieval API to obtain the HTML content of the page;
the web crawler parses the obtained HTML content to locate the login-related entity parameters, then calls the browser's form-submission API to submit the verification information to the website for verification;
after the submitted verification information is successfully authenticated, the web crawler calls the browser's cookie-retrieval interface, through which the site's cookie information is obtained and stored.
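Locating the login-related entity parameters in the login page's HTML can also be sketched without a browser: collect every named `<input>` (so hidden fields such as an anti-forgery token are kept) and then overlay the preset credentials. This swaps the browser APIs described above for plain string parsing; the regular expressions handle only simple, quoted attributes, and a real crawler would more likely use an HTML parser such as jsoup. `LoginFormParser` and the field names in the example are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, regex-based reader for the input fields of a simple login form. */
final class LoginFormParser {
    private static final Pattern INPUT = Pattern.compile("(?i)<input\\b[^>]*>");
    private static final Pattern NAME = Pattern.compile("(?i)\\bname\\s*=\\s*[\"']([^\"']*)[\"']");
    private static final Pattern VALUE = Pattern.compile("(?i)\\bvalue\\s*=\\s*[\"']([^\"']*)[\"']");

    /** Returns name -> value for every named <input>, in document order. */
    static Map<String, String> inputFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher input = INPUT.matcher(html);
        while (input.find()) {
            String tag = input.group();
            Matcher name = NAME.matcher(tag);
            if (!name.find()) {
                continue; // unnamed inputs (e.g. a submit button) are not submitted
            }
            Matcher value = VALUE.matcher(tag);
            fields.put(name.group(1), value.find() ? value.group(1) : "");
        }
        return fields;
    }

    /** Overlays the preset entity parameters; page fields such as hidden tokens are kept. */
    static Map<String, String> withCredentials(Map<String, String> pageFields, Map<String, String> preset) {
        Map<String, String> merged = new LinkedHashMap<>(pageFields);
        merged.putAll(preset);
        return merged;
    }
}
```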
Further, in step S22, an HTTP request is sent to access the site with the obtained cookie information set in the request, so that the site's webpage data can be crawled without logging in again until the cookie expires.
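Reusing the stored cookie, and noticing when it has expired, could look like the JDK-only sketch below. The expiry rule (HTTP 401 or 403, or a redirect to a URL containing "login") is an assumption, since devices signal an invalid session in different ways; `SessionReuse` and the URL in the example are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for crawling with a previously obtained cookie. */
final class SessionReuse {

    /** Builds a GET for a data page that carries the stored cookie instead of logging in again. */
    static HttpRequest authenticatedGet(URI pageUri, String cookieHeader) {
        return HttpRequest.newBuilder(pageUri)
                .header("Cookie", cookieHeader)
                .GET()
                .build();
    }

    /** Heuristic (an assumption): treat 401/403, or a redirect to a login URL, as an expired cookie. */
    static boolean cookieExpired(int status, Optional<String> location) {
        if (status == 401 || status == 403) {
            return true;
        }
        boolean redirect = status == 301 || status == 302 || status == 303 || status == 307;
        return redirect && location.map(l -> l.toLowerCase(Locale.ROOT).contains("login")).orElse(false);
    }
}
```

When `cookieExpired` returns true, the crawler would repeat the simulated login and store the new cookie. Alternatively, `HttpClient.newBuilder().cookieHandler(new CookieManager()).build()` lets the JDK client store and resend cookies automatically for the lifetime of that client.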
Specifically, the entity parameters related to login include a user name, a password, a key, an identity card, and the like.
In summary, in the method of the invention for acquiring data information of security equipment by using web crawler technology, an acquisition request is first sent to the webpage to be collected; the exchange method of the RestTemplate tool is then used to set the login-related entity parameters of that webpage; the RestTemplate tool connects to the webpage and simulates a login, calling the preset entity parameters to perform login authentication; the webpage data is crawled after login and stored in a local database; and statistics on the data in the local database are displayed on a front-end page. In this way the identity authentication restriction is bypassed, the desired data is crawled without a person performing the identity login, manual participation is avoided, and no secondary development is needed to obtain a login verification interface, which reduces development workload and cost.
The above description is one embodiment of the present invention. Embodiments of the invention are not limited to this description, and the invention is not limited by the names used above, including English names, since trade names may differ. Methods, structures and the like that are similar or identical to those of the present invention, as well as technical deductions or substitutions made on the basis of its concept, shall be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for acquiring data information of a security device by using web crawler technology, characterized in that the method comprises the following steps:
step S1, sending an acquisition request to the page to be collected;
step S2, using the exchange method of the RestTemplate tool to set the login-related entity parameters of the webpage to be collected, then connecting to the webpage through the RestTemplate tool and performing a simulated login, which comprises the following operations:
step S21, connecting to the webpage to be collected;
step S22, having the RestTemplate tool call the preset entity parameters to perform the login authentication operation, crawling the webpage data after login, and storing the webpage data in a local database;
step S23, compiling statistics on the data in the local database and displaying them on the front-end page.
2. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 1, wherein in step S22, after the simulated login is completed, the entity parameters are written into the browser's Cookie, and the Cookie is recorded in the http headers variable.
3. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 2, wherein the RestTemplate tool is used to simulate the login to bypass the authentication restriction, crawler technology is used to crawl the information on the connected page, and the webpage data is cleaned to obtain the required raw data.
4. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 3, wherein in step S22 the following steps are performed:
the web crawler calls the browser's page-access API and passes the login address of the website to be accessed to the browser;
the browser loads the website's login page, and the web crawler calls the browser's page-retrieval API to obtain the HTML content of the page;
the web crawler parses the obtained HTML content to locate the login-related entity parameters, then calls the browser's form-submission API to submit the verification information to the website for verification;
after the submitted verification information is successfully authenticated, the web crawler calls the browser's cookie-retrieval interface, through which the site's cookie information is obtained and stored.
5. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 4, wherein in step S22 an HTTP request is sent to access the site with the obtained cookie information set in the request, so that the site is accessed without logging in until the cookie expires and the site's webpage data is crawled.
6. The method for acquiring data information of a security device by using web crawler technology according to any one of claims 1-5, wherein the login-related entity parameters include a user name and a password.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110444788.5A CN113254744A (en) 2021-04-24 2021-04-24 Method for acquiring data information of security equipment by using web crawler technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110444788.5A CN113254744A (en) 2021-04-24 2021-04-24 Method for acquiring data information of security equipment by using web crawler technology

Publications (1)

Publication Number Publication Date
CN113254744A (en) 2021-08-13

Family

ID=77222344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110444788.5A Pending CN113254744A (en) 2021-04-24 2021-04-24 Method for acquiring data information of security equipment by using web crawler technology

Country Status (1)

Country Link
CN (1) CN113254744A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070208703A1 (en) * 2006-03-03 2007-09-06 Microsoft Corporation Web forum crawler
CN105631030A (en) * 2015-12-30 2016-06-01 福建亿榕信息技术有限公司 Universal web crawler login simulation method and system
CN108287813A (en) * 2018-02-26 2018-07-17 北京奇艺世纪科技有限公司 A kind of information submits method, apparatus and electronic equipment
WO2019019344A1 (en) * 2017-07-26 2019-01-31 上海壹账通金融科技有限公司 Webpage data crawling method and device, user terminal, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813