CN113254744A - Method for acquiring data information of security equipment by using web crawler technology - Google Patents
- Publication number
- CN113254744A (application CN202110444788.5A)
- Authority
- CN
- China
- Prior art keywords
- login
- webpage
- web crawler
- resttemplate
- browser
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention belongs to the technical field of computer software and in particular relates to a method for acquiring data information of security equipment by using web crawler technology. An acquisition request is first sent to the webpage to be acquired. The login-related entity parameters of that webpage are then set by using the exchange method of the RestTemplate tool, and the RestTemplate tool links to the webpage and performs a simulated login: it links to the webpage to be acquired, calls the preset entity parameters to perform the login authentication operation, crawls the webpage data after login, and stores the data in a local database. Finally, the data in the local database is counted and displayed on a front-end page. In this way the identity authentication restriction is bypassed and the desired data information can be crawled without manual identity login authentication, which avoids the drawback of manual participation. No secondary development is needed to obtain a login verification interface, so the development workload and the cost are reduced.
Description
[ technical field ]
The invention belongs to the technical field of computer software, and particularly relates to a method for acquiring data information of security equipment by using a web crawler technology.
[ background of the invention ]
Web crawlers are a common tool for crawling information from the network. Some websites, however, restrict crawlers and release information only after identity authentication. If a crawler accesses such a website directly, what it usually obtains is the user login page to which the site jumps after authentication of the user information fails, not the page content that actually needs to be acquired. To crawl the required data without identity login authentication, the current approach is to provide an interface that does not require login authentication, but this requires secondary development of the docking system and is difficult to implement.
[ summary of the invention ]
The invention provides a method for acquiring data information of security equipment by using a web crawler technology, and aims to solve the problem that information acquisition can be performed only by identity authentication when the web crawler is used in the prior art.
The invention is realized by the following technical scheme:
A method for acquiring data information of a security device by using web crawler technology comprises the following steps:
step S1, sending an acquisition request to a page to be acquired;
step S2, using the exchange method in the RestTemplate tool to set the entity parameters related to login of the webpage to be collected, then linking the webpage through the RestTemplate tool and performing simulated login, wherein the method comprises the following operations:
step S21, linking to the web page to be collected;
step S22, calling preset entity parameters by a RestTemplate tool to perform login authentication operation, crawling webpage data after login and storing the webpage data in a local database;
step S23, counting the data in the local database and displaying it on the front-end page.
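The login-then-crawl flow of the steps above can be illustrated with a minimal, self-contained sketch. It is written in Python with only the standard library rather than with Spring's RestTemplate, and it runs against a stand-in server on localhost, not a real security device. The `/login` and `/data` paths, the `username` and `password` form fields, the `SESSIONID` cookie, and the JSON payload are all hypothetical.

```python
import json
import socketserver
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler

# Hypothetical credentials and session cookie for the stand-in device page.
USERNAME, PASSWORD, SESSION = "admin", "secret", "SESSIONID=abc123"


class DeviceHandler(BaseHTTPRequestHandler):
    """Stand-in device web UI: any POST is a login attempt, any GET needs the cookie."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        ok = form.get("username") == [USERNAME] and form.get("password") == [PASSWORD]
        self.send_response(200 if ok else 401)
        if ok:
            self.send_header("Set-Cookie", SESSION + "; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        authed = self.headers.get("Cookie") == SESSION
        body = json.dumps({"alerts": 3}).encode() if authed else b"login required"
        self.send_response(200 if authed else 401)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def login_and_fetch(base_url, username, password):
    """Simulated login (form POST), then crawl the protected page with the cookie."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies
    form = urllib.parse.urlencode({"username": username, "password": password}).encode()
    with opener.open(urllib.request.Request(base_url + "/login", data=form)) as resp:
        cookie = resp.headers["Set-Cookie"].split(";", 1)[0]  # keep only name=value
    request = urllib.request.Request(base_url + "/data", headers={"Cookie": cookie})
    with opener.open(request) as resp:
        return json.loads(resp.read())


def run_demo():
    """Start the stand-in server on a free local port, run the flow, shut down."""
    server = socketserver.TCPServer(("127.0.0.1", 0), DeviceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return login_and_fetch("http://127.0.0.1:%d" % port, USERNAME, PASSWORD)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` performs the login POST, reuses the returned cookie on the data request, and returns the parsed payload. In a Spring application the same two requests would presumably be issued through RestTemplate, with the form fields and headers carried in the request entity.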
In step S22, after the simulated login is completed, the entity parameters are written into the Cookie of the browser, and the Cookie is recorded in the HttpHeaders variable.
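Recording the cookie in a headers variable amounts to turning the `Set-Cookie` values of the login response into a single `Cookie` request header. The sketch below shows that conversion with Python's standard `http.cookies` module as a stand-in for Spring's header object; the function name and the sample cookie names are hypothetical.

```python
from http.cookies import SimpleCookie


def build_request_headers(set_cookie_values):
    """Collapse Set-Cookie response values into one Cookie request header."""
    jar = SimpleCookie()
    for raw in set_cookie_values:
        jar.load(raw)  # attributes such as Path or HttpOnly are parsed, not kept
    cookie_line = "; ".join(f"{m.key}={m.coded_value}" for m in jar.values())
    return {"Cookie": cookie_line}


headers = build_request_headers([
    "JSESSIONID=abc123; Path=/; HttpOnly",
    "token=xyz; Path=/",
])
```

Only the name and value of each cookie are sent back to the server; attributes such as `Path` and `HttpOnly` are instructions for the client and are dropped from the request header.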
According to the method for acquiring the data information of the safety equipment by using the web crawler technology, login is simulated by using a RestTemplate tool to bypass authentication limitation, information crawling is performed on the linked page by using the crawler technology, and the webpage data is cleaned to acquire the required original data.
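The text does not specify how the webpage data is cleaned. One plausible minimal form, sketched here in Python's standard library as an assumption and not as the patented implementation, strips the markup, skips script and style content, and collapses whitespace. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_html(html):
    """Return the page text with tags removed and whitespace collapsed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered text
    return re.sub(r"\s+", " ", " ".join(extractor.parts)).strip()
```

A real device page would likely need field-specific extraction on top of this, for example selecting particular table cells, which depends on the page layout and is not described in the source.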
In the method for acquiring data information of a security device by using web crawler technology as described above, the following steps are executed in step S22:
the web crawler calls the browser's access-webpage API and passes the login address of the website to be accessed to the browser;
the browser loads the website's login webpage, and the web crawler calls the browser's acquire-webpage API to obtain the html content of the webpage;
the web crawler parses the obtained html content to find the login-related entity parameters, calls the browser's submit-form API, and submits the verification information to the website for verification;
after the submitted verification information is successfully authenticated, the web crawler calls the browser's cookie-acquisition interface, through which the cookie information of the site is obtained and stored.
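The parsing and form-submission steps above can be sketched without a browser. The example below, in Python's standard library, finds the form's action and named input fields in the login page's html and builds the URL-encoded body to submit. It assumes a page with a single login form; the field names and the `csrf` hidden field are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LoginFormParser(HTMLParser):
    """Collects the form action and every named <input> field on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
        elif tag == "input" and attrs.get("name"):
            # Keep preset values (e.g. hidden tokens); empty fields default to "".
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_submission(html, credentials):
    """Return (action, body): where to post and the URL-encoded form body."""
    parser = LoginFormParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(credentials)  # fill in the login-related entity parameters
    return parser.action, urlencode(fields)
```

Hidden fields found in the page are submitted unchanged alongside the credentials, which is what a browser's form submission would do.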
In the method for acquiring data information of a security device by using web crawler technology as described above, in step S22 an http request is sent to access the site with the obtained cookie information set in the request, so that, until the cookie expires, the site can be accessed without logging in again and its webpage data can be crawled.
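Reusing the cookie until it expires implies tracking its lifetime. The following Python sketch, using only the standard library, stores a cookie with an expiry time derived from its `Max-Age` attribute and decides whether a new simulated login is needed. Relying on `Max-Age` is an assumption, since a site may instead use `Expires` or give no lifetime at all; the names are hypothetical.

```python
from dataclasses import dataclass
from http.cookies import SimpleCookie
from typing import Optional


@dataclass
class StoredCookie:
    name: str
    value: str
    expires_at: Optional[float]  # epoch seconds; None means lifetime unknown


def store_cookie(set_cookie, now):
    """Parse one Set-Cookie value and compute its expiry from Max-Age, if present."""
    morsel = next(iter(SimpleCookie(set_cookie).values()))
    max_age = morsel["max-age"]
    expires_at = now + int(max_age) if max_age else None
    return StoredCookie(morsel.key, morsel.value, expires_at)


def needs_login(cookie, now):
    """True when there is no cookie yet or its known lifetime has run out."""
    return cookie is None or (cookie.expires_at is not None and now >= cookie.expires_at)


def request_headers(cookie):
    """Headers to attach to each crawl request while the cookie is valid."""
    return {"Cookie": f"{cookie.name}={cookie.value}"}
```

A crawler loop would call `needs_login` before each request and repeat the simulated login only when it returns true. A cookie with unknown lifetime is treated as valid here, so a real crawler would also need to detect a redirect to the login page.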
The method for acquiring the data information of the security device by using the web crawler technology is described above, wherein the login-related entity parameters comprise a user name and a password.
Compared with the prior art, the invention has the following advantages:
The invention provides a method for acquiring data information of security equipment by using web crawler technology. An acquisition request is first sent to the webpage to be acquired. The login-related entity parameters of that webpage are then set by using the exchange method of the RestTemplate tool, and the RestTemplate tool links to the webpage and performs a simulated login: it links to the webpage to be acquired, calls the preset entity parameters to perform the login authentication operation, crawls the webpage data after login, and stores the data in a local database. The data in the local database is then counted and displayed on a front-end page. In this way the identity authentication restriction is bypassed and the desired data information can be crawled without manual identity login authentication, which avoids the drawback of manual participation. No secondary development is needed to obtain a login verification interface, so the development workload and the cost are reduced.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below.
FIG. 1 is a schematic flow chart of the method of the present invention.
[ detailed description ]
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
When embodiments of the present invention refer to the ordinal numbers "first", "second", etc., it should be understood that the words are used for distinguishing between them unless the context clearly dictates otherwise.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "coupled", and "connected" are to be construed broadly: a connection may be fixed, removable, or integral; mechanical or electrical; direct or indirect through an intervening medium; or an internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific case.
In a specific embodiment, as shown in fig. 1, a method for obtaining data information of a security device by using a web crawler technology includes the following steps:
step S1, sending an acquisition request to a page to be acquired;
step S2, using the exchange method in the RestTemplate tool to set the entity parameters related to login of the webpage to be collected, then linking the webpage through the RestTemplate tool and performing simulated login, wherein the method comprises the following operations:
step S21, linking to the web page to be collected;
step S22, calling preset entity parameters by a RestTemplate tool to perform login authentication operation, crawling webpage data after login and storing the webpage data in a local database;
step S23, counting the data in the local database and displaying it on the front-end page.
Accessing third-party REST services in a Spring application revolves around the Spring RestTemplate class. RestTemplate follows the same design principle as many other Spring template classes, such as JdbcTemplate and JmsTemplate: it provides a simplified way, with default behavior, to perform complex tasks. By default RestTemplate relies on the JDK's HTTP connection facilities, such as HttpURLConnection; if necessary, the setRequestFactory method can switch it to another HTTP library, such as Apache HttpComponents, Netty, or OkHttp. RestTemplate is the client provided by Spring for accessing REST services. It offers a variety of convenient methods for accessing remote HTTP services, is simpler to use than the traditional Apache HttpClient, and can greatly improve the efficiency of writing client code. Within an application it can be used to call REST services: it simplifies communication with HTTP services, follows a uniform RESTful style, and encapsulates the HTTP connection, so that the caller only needs to supply the URL and the return value type.
Specifically, in step S22, after the simulated login is completed, the entity parameters are written into the Cookie of the browser, and the Cookie is recorded in the HttpHeaders variable;
more specifically, a RestTemplate tool is used for simulating login to bypass authentication limitation, information crawling is carried out on a linked page through a crawler technology, and webpage data are cleaned to obtain required original data;
further, in step S22, the following steps are performed:
the web crawler calls the browser's access-webpage API and passes the login address of the website to be accessed to the browser;
the browser loads the website's login webpage, and the web crawler calls the browser's acquire-webpage API to obtain the html content of the webpage;
the web crawler parses the obtained html content to find the login-related entity parameters, calls the browser's submit-form API, and submits the verification information to the website for verification;
after the submitted verification information is successfully authenticated, the web crawler calls the browser's cookie-acquisition interface, through which the cookie information of the site is obtained and stored.
Further, in step S22, an http request is sent to access the site, the obtained cookie information is set in the http request, and the site is exempted from login before the cookie expires, so as to crawl the web page data of the site.
Specifically, the entity parameters related to login include a user name, a password, a key, an identity card, and the like.
The invention relates to a method for acquiring data information of security equipment by using web crawler technology. An acquisition request is first sent to the webpage to be acquired. The login-related entity parameters of that webpage are then set by using the exchange method of the RestTemplate tool, and the RestTemplate tool links to the webpage and performs a simulated login: it links to the webpage to be acquired, calls the preset entity parameters to perform the login authentication operation, crawls the webpage data after login, and stores the data in a local database. The data in the local database is then counted and displayed on a front-end page. In this way the identity authentication restriction is bypassed and the desired data information can be crawled without manual identity login authentication, which avoids the drawback of manual participation. No secondary development is needed to obtain a login verification interface, so the development workload and the cost are reduced.
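The storing and counting described above can be sketched with an embedded database. The example below uses Python's standard `sqlite3` module as the local database; the table name `device_data`, its columns, and the sample records are hypothetical, since the source does not describe the schema or what is counted.

```python
import sqlite3


def store_records(conn, records):
    """Persist crawled (device, level, message) rows in the local database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS device_data ("
        "device TEXT NOT NULL, level TEXT NOT NULL, message TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO device_data (device, level, message) VALUES (?, ?, ?)", records
    )
    conn.commit()


def count_by_level(conn):
    """Aggregate the stored rows per level, ready to hand to a front-end page."""
    rows = conn.execute(
        "SELECT level, COUNT(*) FROM device_data GROUP BY level ORDER BY level"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
store_records(conn, [
    ("fw-01", "high", "port scan"),
    ("fw-01", "low", "login ok"),
    ("ids-02", "high", "sql injection"),
])
summary = count_by_level(conn)
```

The resulting dictionary is the kind of summary a front-end page could render as a chart or table; how the front end receives it is outside what the source describes.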
The above is a description of embodiments of the present invention, and the implementation of the invention is not limited to this description. Because trade names differ, the invention is also not limited by the names used above or by their English renderings. Methods and structures similar or identical to those of the present invention, and technical deductions or substitutions made on the basis of its concept, shall all be considered to fall within the protection scope of the present invention.
Claims (6)
1. A method for acquiring data information of a security device by using web crawler technology, characterized in that the method comprises the following steps:
step S1, sending an acquisition request to a page to be acquired;
step S2, using the exchange method in the RestTemplate tool to set the entity parameters related to login of the webpage to be collected, then linking the webpage through the RestTemplate tool and performing simulated login, wherein the method comprises the following operations:
step S21, linking to the web page to be collected;
step S22, calling preset entity parameters by a RestTemplate tool to perform login authentication operation, crawling webpage data after login and storing the webpage data in a local database;
step S23, counting the data in the local database and displaying it on the front-end page.
2. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 1, wherein: in step S22, after the simulated login is completed, the entity parameters are written into the Cookie of the browser, and the Cookie is recorded in the HttpHeaders variable.
3. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 2, wherein: and simulating login by using a RestTemplate tool to bypass authentication limitation, performing information crawling on the linked page by using a crawler technology, and cleaning the webpage data to acquire the required original data.
4. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 3, wherein: in step S22, the following steps are performed:
the web crawler calls the browser's access-webpage API and passes the login address of the website to be accessed to the browser;
the browser loads the website's login webpage, and the web crawler calls the browser's acquire-webpage API to obtain the html content of the webpage;
the web crawler parses the obtained html content to find the login-related entity parameters, calls the browser's submit-form API, and submits the verification information to the website for verification;
after the submitted verification information is successfully authenticated, the web crawler calls the browser's cookie-acquisition interface, through which the cookie information of the site is obtained and stored.
5. The method for acquiring data information of a security device by using web crawler technology as claimed in claim 4, wherein: in step S22, an http request is sent to access the site, the obtained cookie information is set in the http request, and the site is accessed without logging in before the cookie expires, and web page data of the site is crawled.
6. A method for acquiring data information of a security device using web crawler technology according to any of claims 1-5, wherein: the login related entity parameters include a user name and a password.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110444788.5A CN113254744A (en) | 2021-04-24 | 2021-04-24 | Method for acquiring data information of security equipment by using web crawler technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110444788.5A CN113254744A (en) | 2021-04-24 | 2021-04-24 | Method for acquiring data information of security equipment by using web crawler technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113254744A (en) | 2021-08-13 |
Family
ID=77222344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110444788.5A Pending CN113254744A (en) | 2021-04-24 | 2021-04-24 | Method for acquiring data information of security equipment by using web crawler technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254744A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070208703A1 (en) * | 2006-03-03 | 2007-09-06 | Microsoft Corporation | Web forum crawler |
CN105631030A (en) * | 2015-12-30 | 2016-06-01 | 福建亿榕信息技术有限公司 | Universal web crawler login simulation method and system |
CN108287813A (en) * | 2018-02-26 | 2018-07-17 | 北京奇艺世纪科技有限公司 | A kind of information submits method, apparatus and electronic equipment |
WO2019019344A1 (en) * | 2017-07-26 | 2019-01-31 | 上海壹账通金融科技有限公司 | Webpage data crawling method and device, user terminal, and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 2021-08-13 |