CN110781367B - Internet data acquisition method and system based on middleman - Google Patents
- Publication number
- CN110781367B (application CN201910909270.7A; also published as CN110781367A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- acquisition
- task
- page
- analyzed
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a man-in-the-middle-based internet data acquisition method and system. A man-in-the-middle proxy certificate is installed on a webpage information acquisition device to establish a man-in-the-middle for that device; when the device accesses webpage information on the Internet, the man-in-the-middle proxies all of the device's network traffic. The man-in-the-middle obtains an acquisition task containing a URL regular expression for the webpages to be acquired, captures the traffic that matches the URL regular expression from all network traffic as intermediate traffic, injects the acquisition task into the HTML page of the intermediate traffic, and stores the resulting page to be analyzed in a first database. An analysis module dispatches each page to be analyzed to an analyzer instance according to the page's URL information in the first database, obtains from the analysis a webpage acquisition result containing structured data, and stores the result in a second database. The invention can support data acquisition from any application that relies on an integrated browser kernel to present information.
Description
Technical Field
The invention relates to the field of web crawlers, and in particular to a data acquisition method and system based on man-in-the-middle attack, in which a man-in-the-middle proxy modifies traffic data in order to continuously inject different task code into different applications, so that requests for different pages are completed and the related data are acquired.
Background
A web crawler, sometimes called a web spider, can make effective use of existing resources to automatically capture large amounts of webpage information from the Internet. However, with the spread of the mobile Internet, more and more traffic is delivered directly through terminal applications that offer no web access or only limited web access, which makes data acquisition much harder.
The crawling process of a web crawler comprises five stages: obtaining the request URL, sending a web request to download the page, parsing structured data from the page, filtering duplicate data, and processing seed tasks. Each stage consumes resources differently, and a problem in any stage can affect the efficiency and stability of the whole crawler system. In addition, as internet technology develops, more and more information cannot be obtained through traditional web channels: a large amount of information is distributed through specific applications, such as mobile information applications, and much of the data is loaded through asynchronous requests and encrypted with HTTPS. A general data acquisition system compatible with varied applications and varied data is therefore lacking.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a man-in-the-middle-based internet data acquisition method, comprising the following steps:
step 1, installing a man-in-the-middle proxy certificate on the webpage information acquisition device to establish a man-in-the-middle for the device, wherein the man-in-the-middle proxies all network traffic of the device when the device accesses webpage information on the Internet;
step 2, the man-in-the-middle obtains an acquisition task containing a URL regular expression for the webpages to be acquired, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the acquisition task into the HTML page of the intermediate traffic, and stores the resulting page to be analyzed in a first database;
step 3, the analysis module dispatches the page to be analyzed to an analyzer instance according to the URL information of the page in the first database, obtains from the analysis a webpage acquisition result containing structured data, and stores the result in a second database.
In the man-in-the-middle-based internet data acquisition method, step 2 comprises: the man-in-the-middle decrypts the encrypted content in the network traffic according to the HTTPS security certificate configured on the webpage information acquisition device.
In the man-in-the-middle-based internet data acquisition method, the acquisition task in step 2 is generated as follows: the acquisition task is generated from pre-configured seed information, or a new acquisition task is generated from an already obtained webpage acquisition result.
In the man-in-the-middle-based internet data acquisition method, step 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve acquisition efficiency.
In the man-in-the-middle-based internet data acquisition method, the acquisition task in step 2 comprises an HTML page acquisition task and a dynamic content acquisition task. The HTML page acquisition task contains jump code that jumps to the next URL to be acquired. The dynamic content acquisition task contains, in addition to the jump code, JavaScript code that obtains the corresponding interface parameters and JavaScript code that acquires the page.
The invention also provides a man-in-the-middle-based internet data acquisition system, comprising:
module 1, which establishes a man-in-the-middle for the webpage information acquisition device by installing a man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxies all network traffic of the device when the device accesses webpage information on the Internet;
module 2, in which the man-in-the-middle obtains an acquisition task containing a URL regular expression for the webpages to be acquired, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the acquisition task into the HTML page of the intermediate traffic, and stores the resulting page to be analyzed in a first database;
module 3, in which the analysis module dispatches the page to be analyzed to an analyzer instance according to the URL information of the page in the first database, obtains from the analysis a webpage acquisition result containing structured data, and stores the result in a second database.
In the man-in-the-middle-based internet data acquisition system, module 2 comprises: the man-in-the-middle decrypts the encrypted content in the network traffic according to the HTTPS security certificate configured on the webpage information acquisition device.
In the man-in-the-middle-based internet data acquisition system, the acquisition task in module 2 is generated as follows: the acquisition task is generated from pre-configured seed information, or a new acquisition task is generated from an already obtained webpage acquisition result.
In the man-in-the-middle-based internet data acquisition system, module 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve acquisition efficiency.
In the man-in-the-middle-based internet data acquisition system, the acquisition task in module 2 comprises an HTML page acquisition task and a dynamic content acquisition task. The HTML page acquisition task contains jump code that jumps to the next URL to be acquired. The dynamic content acquisition task contains, in addition to the jump code, JavaScript code that obtains the corresponding interface parameters and JavaScript code that acquires the page.
The advantages of the invention are as follows:
The invention provides a data acquisition method and system based on man-in-the-middle attack that can support data acquisition from any application that presents information through an integrated browser kernel, covers the various webpage request modes, and allows flexible configuration of structured-data parsing. The acquisition process is modular, with each module serving a single function, which greatly improves data capture efficiency and greatly reduces the difficulty of acquiring data from varied applications.
The invention applies the man-in-the-middle attack technique to a data acquisition system and modularizes each processing stage so that each module has a single function, which improves the efficiency of the whole system and makes horizontal scaling simpler and more convenient. In addition, the invention introduces a Redis message queue to decouple the modules; because Redis offers high throughput, high availability, and easy scaling, using Redis as the storage medium greatly improves the efficiency and stability of the invention.
Drawings
FIG. 1 is a block diagram of a crawler system according to an embodiment of the present invention.
Detailed Description
In order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The invention uses the AnyProxy proxy tool to proxy all HTTP/HTTPS traffic of the client application, and makes decryption of HTTPS-encrypted data possible by installing a security certificate on the corresponding acquisition device in advance.
The technical scheme of the invention is as follows:
A data acquisition method based on man-in-the-middle attack comprises the following steps:
1) The application and device to be collected from form the acquisition subject. They only need to be configured with the man-in-the-middle proxy and have the man-in-the-middle proxy certificate installed; accessing any page in the application to be collected from then initializes acquisition.
2) The man-in-the-middle proxy module is mainly responsible for carrying out the man-in-the-middle attack, and its main functions are:
a) Intercepting requests to filter out invalid traffic, including but not limited to CSS files, JavaScript files, image files, and other resource files.
b) Decrypting the encrypted content using a pre-configured HTTPS certificate.
c) Capturing the traffic to be acquired according to the URL regular expression and storing it in the Redis database cache.
d) Requesting a new task from the task scheduling module and injecting it into HTML pages that match the URL regular expression. The task contains the webpage to be crawled next. Because the client is closed, it is difficult to deliver a new task to it directly, so the invention delivers the task by injecting it into a webpage that is returned to the client.
3) The analysis module dispatches pages to different analyzer instances according to the URL information of the Response data, obtains structured data from the analysis and stores it in the MongoDB database, and stores data related to new tasks in the Redis database for the task scheduling module to use when generating new tasks. Response data here means the data that the application server returns to the client in response to a request.
4) The task scheduling module generates task content from pre-configured basic information such as seed tasks, and can also generate new tasks from the data produced by the analysis module; it deduplicates tasks by URL and recovers failed tasks.
5) The data storage module is mainly responsible for storing the related data and for decoupling the modules. It records the data that the other functional modules need to write and lets them read the data they need, so that no functional module has to interact with another directly; the modules only need to share the data storage module.
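The decoupling described in item 5) can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the shared store (the source uses Redis and MongoDB); the class `SharedStore`, the queue names `pages` and `results`, and both module functions are hypothetical names introduced only for illustration.

```python
from collections import deque


class SharedStore:
    """In-memory stand-in for the shared data storage module (assumption:
    the source uses Redis as the cache and MongoDB for results)."""

    def __init__(self):
        self._queues = {}

    def push(self, name, item):
        # Append an item to the named queue, creating the queue on first use.
        self._queues.setdefault(name, deque()).append(item)

    def pop(self, name):
        # Return the oldest item of the named queue, or None if it is empty.
        queue = self._queues.get(name)
        return queue.popleft() if queue else None


def proxy_module(store, url, body):
    # The proxy only writes captured pages; it never calls the analysis module.
    store.push("pages", {"url": url, "body": body})


def analysis_module(store):
    # The analysis module only reads from the store and writes results back.
    page = store.pop("pages")
    if page is None:
        return None
    record = {"url": page["url"], "length": len(page["body"])}
    store.push("results", record)
    return record
```

Neither function refers to the other; each depends only on the store, which is the property the text attributes to the data storage module.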
Further, step 1) mainly consists of installing and configuring the network address, network port, and security certificate of the man-in-the-middle proxy in the acquisition environment; the whole acquisition process can then be started by accessing any initial page in the application to be collected from.
Further, the man-in-the-middle proxy module in step 2) has four main functions:
a) Because the security certificate issued by the man-in-the-middle proxy module is installed on the client application device, the proxy module, acting as the man-in-the-middle, can intercept the client's HTTPS-encrypted traffic and view its plaintext content.
b) When a client Request reaches the man-in-the-middle proxy, the proxy module checks whether the Request's URL meets the filtering condition. If it does, the proxy intercepts the Request and directly returns empty content; otherwise it forwards the Request to the target server. For example, according to the configured URL regular expression, some HTTP/HTTPS requests, including those for CSS files, JavaScript files, and image files, are intercepted and answered with empty content. CSS files drive rendering and take up considerable computing resources, JavaScript files consume considerable resources when the browser executes them, and resource files such as images and audio take up considerable network bandwidth, so filtering and interception greatly improve acquisition efficiency and stability.
c) When the target server returns the corresponding Response, the man-in-the-middle proxy checks whether its URL identifies content to be collected, and if so, stores the entire Response in the Redis database.
d) When the target server returns the corresponding Response and the content is an HTML page, the man-in-the-middle proxy checks whether its URL matches a specific regular expression; if so, it requests a corresponding task from the task scheduling module and injects it into a <script> tag in the HTML page. The Response is then passed on to the client application.
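Functions b) to d) can be sketched as two proxy hooks. This is a simplified illustration, not AnyProxy's rule interface: the hook names `on_request` and `on_response`, the three regular expressions, the `example.com` URLs, and the `captured` list (a stand-in for the Redis cache) are all assumptions made for the example.

```python
import re

# Hypothetical configuration; the patent does not give concrete patterns.
BLOCK_PATTERN = re.compile(r"\.(css|js|png|jpe?g|gif|mp3|mp4)(\?.*)?$", re.IGNORECASE)
CAPTURE_PATTERN = re.compile(r"^https://example\.com/article/\d+$")
INJECT_PATTERN = re.compile(r"^https://example\.com/")

captured = []  # stand-in for the Redis cache of pages to be analyzed


def on_request(url):
    """Function b): answer filtered URLs with empty content.

    Returns a response dict to short-circuit the request, or None to
    forward the request to the target server.
    """
    if BLOCK_PATTERN.search(url):
        return {"status": 200, "body": ""}
    return None


def on_response(url, content_type, body, next_task_script):
    """Functions c) and d): store matching responses, then inject the task."""
    if CAPTURE_PATTERN.match(url):
        # Store the Response as received, before any injection.
        captured.append({"url": url, "body": body})
    if content_type.startswith("text/html") and INJECT_PATTERN.match(url):
        script = "<script>" + next_task_script + "</script>"
        if "</body>" in body:
            body = body.replace("</body>", script + "</body>", 1)
        else:
            body = body + script
    return body  # passed on to the client application
```

In this sketch the page is stored before the task is injected, so the analysis module sees the server's original content while the client receives the modified page.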
Further, the analysis module in step 3) takes the content to be analyzed out of the Redis data cache, assigns it to different analyzer instances according to its URL, stores the structured data in MongoDB once analysis is complete, and stores information needed for the next acquisition, such as URLs and cookies, in the Redis database cache.
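Dispatch by URL can be sketched as a table of (pattern, analyzer) pairs. The two analyzer functions, their output fields, and the URL patterns below are hypothetical; the source states only that pages are assigned to analyzer instances according to their URL.

```python
import json
import re


def parse_article(page):
    # Hypothetical analyzer for HTML article pages: extract the <title> text.
    match = re.search(r"<title>(.*?)</title>", page["body"], re.S)
    title = match.group(1).strip() if match else None
    return {"type": "article", "url": page["url"], "title": title}


def parse_comments(page):
    # Hypothetical analyzer for a JSON interface: count the returned comments.
    data = json.loads(page["body"])
    return {"type": "comments", "url": page["url"], "count": len(data.get("comments", []))}


# Ordered dispatch table; the first matching pattern wins.
ANALYZERS = [
    (re.compile(r"/article/\d+$"), parse_article),
    (re.compile(r"/api/comments\?"), parse_comments),
]


def dispatch(page):
    """Route a stored page to the analyzer registered for its URL."""
    for pattern, analyzer in ANALYZERS:
        if pattern.search(page["url"]):
            return analyzer(page)
    return None  # no analyzer configured for this URL
```

Adding support for a new page type then means adding one pattern and one analyzer function, without touching the proxy or the scheduler.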
Further, the task scheduling module in step 4) mainly covers task generation, task scheduling, task deduplication, and task recovery.
Further, tasks are generated in two ways: from pre-configured seed information, or from information that has already been acquired. Tasks are also of two main types: simple HTML page acquisition tasks, and tasks for dynamic information such as JSON data, which require the related task parameters, cookies, and so on, and which complete acquisition by executing the related JavaScript code.
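The two task types can be sketched as functions that build the JavaScript to be injected. The function names, the use of `fetch`, and the example URLs are assumptions; the source says only that an HTML page task carries jump code and that a dynamic content task additionally carries JavaScript that requests the interface with its parameters.

```python
import json
from urllib.parse import urlencode


def html_page_task(next_url):
    """Jump code only: navigate the client to the next URL to be acquired."""
    # json.dumps produces a correctly quoted JavaScript string literal.
    return "window.location.href = " + json.dumps(next_url) + ";"


def dynamic_content_task(api_url, params, next_url):
    """Request an interface from inside the page, then jump to the next URL.

    Assumes the client's embedded browser supports fetch(). The response
    is not read in the page; it passes through the proxy, which captures it.
    """
    target = api_url + "?" + urlencode(params)
    jump = html_page_task(next_url)
    return (
        "var go = function () { " + jump + " }; "
        "fetch(" + json.dumps(target) + ", {credentials: 'include'})"
        ".then(go, go);"
    )
```

For example, `html_page_task("https://example.com/article/2")` returns `window.location.href = "https://example.com/article/2";`, which the proxy would place inside the injected `<script>` tag.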
Further, task scheduling mainly assigns different acquisition tasks to different applications and controls the acquisition rate to avoid being blocked.
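Per-application scheduling with rate control can be sketched as one queue per application plus a minimum interval between issued tasks. The class name `AppScheduler`, the fixed-interval policy, and the injectable clock are assumptions; the source does not specify how the rate is controlled.

```python
import time
from collections import deque


class AppScheduler:
    """Hands out tasks per application, at most one per min_interval_s."""

    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock        # injectable so the behaviour can be tested
        self.queues = {}          # application id -> deque of pending tasks
        self.last_issued = {}     # application id -> time of last issued task

    def add_task(self, app_id, task):
        self.queues.setdefault(app_id, deque()).append(task)

    def next_task(self, app_id):
        """Return the next task for app_id, or None if it must wait."""
        now = self.clock()
        last = self.last_issued.get(app_id)
        if last is not None and now - last < self.min_interval_s:
            return None  # too soon: throttle to avoid being blocked
        queue = self.queues.get(app_id)
        if not queue:
            return None  # nothing pending for this application
        self.last_issued[app_id] = now
        return queue.popleft()
```

When `next_task` returns None, the proxy in this sketch would simply pass the page through without injecting a task and ask again on a later page load.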
Further, task deduplication is mainly performed by URL: a single unified deduplication queue in the Redis database suffices, and each time a task is generated the queue is queried to check whether the URL has already been accessed.
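URL-based deduplication can be sketched with a set of URL fingerprints. The in-memory set below stands in for the unified deduplication structure that the source keeps in Redis; the class name and the choice of SHA-1 fingerprints are assumptions made for the example.

```python
import hashlib


class UrlDeduplicator:
    """Remembers which task URLs have already been generated."""

    def __init__(self):
        self._seen = set()  # stand-in for the shared structure in Redis

    def is_new(self, url):
        """Return True the first time a URL is offered, False afterwards."""
        # A fixed-size fingerprint keeps memory use independent of URL length.
        key = hashlib.sha1(url.encode("utf-8")).hexdigest()
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

The scheduler would call `is_new` whenever it generates a task and discard the task when the result is False.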
Further, the task recovery function identifies tasks whose acquisition failed and reschedules them at an appropriate time.
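One possible reading of task recovery is timeout-based: a task that was issued but never reported as completed within a time limit is treated as failed and offered for retry. The source does not say how failures are detected, so the timeout rule, the retry limit, and all names in this sketch are assumptions.

```python
import time


class RecoveryTracker:
    """Tracks issued tasks and reports those that appear to have failed."""

    def __init__(self, timeout_s, max_attempts, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.max_attempts = max_attempts
        self.clock = clock
        self.in_flight = {}  # url -> (time issued, attempts so far)

    def issued(self, url):
        # Record (or re-record) that the task was handed to a client.
        attempts = self.in_flight.get(url, (0.0, 0))[1]
        self.in_flight[url] = (self.clock(), attempts + 1)

    def completed(self, url):
        # Called when the page for this task has been captured.
        self.in_flight.pop(url, None)

    def collect_failed(self):
        """Return (retry, abandoned) lists of URLs whose timeout expired.

        URLs in retry stay tracked until the caller re-issues them with
        issued(); URLs in abandoned are dropped from tracking.
        """
        now = self.clock()
        retry, abandoned = [], []
        for url, (issued_at, attempts) in list(self.in_flight.items()):
            if now - issued_at < self.timeout_s:
                continue  # still within its time limit
            if attempts < self.max_attempts:
                retry.append(url)
            else:
                abandoned.append(url)
                del self.in_flight[url]
        return retry, abandoned
```

A scheduler built on this sketch would call `collect_failed` periodically and put the returned retry URLs back into its task queue.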
The beneficial effects of the invention are as follows:
The data acquisition method and system based on man-in-the-middle attack can support data acquisition tasks for applications built around a browser kernel and can be flexibly configured with URL regular expressions. The crawling process is modular, with each module serving a single function, which greatly improves the efficiency of acquiring data from applications and gives the invention good applicability and generality.
In FIG. 1, the client application device is the device on which the application to be collected from is installed, and the man-in-the-middle proxy must be configured on it. Once configured, the man-in-the-middle proxy module proxies all traffic between the client device and the application server and carries out the "man-in-the-middle attack" when appropriate; it also interacts with the task scheduling module and the Redis database. Finally, the analyzer parses the acquired data and stores the results in the MongoDB database.
The data acquisition system based on man-in-the-middle attack comprises five modules. The processing is modular, each module has a single function, and the modules are decoupled through the Redis data cache, which greatly improves the efficiency of the crawler system, ensures stable data capture, and greatly reduces the operation and maintenance cost of the system.
The following is a system embodiment corresponding to the above method embodiment; the two can be implemented in cooperation. Technical details described in the above embodiment remain valid in this embodiment and are not repeated here; likewise, technical details described in this embodiment also apply to the above embodiment.
The invention also provides a man-in-the-middle-based internet data acquisition system, comprising:
module 1, which establishes a man-in-the-middle for the webpage information acquisition device by installing a man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxies all network traffic of the device when the device accesses webpage information on the Internet;
module 2, in which the man-in-the-middle obtains an acquisition task containing a URL regular expression for the webpages to be acquired, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the acquisition task into the HTML page of the intermediate traffic, and stores the resulting page to be analyzed in a first database;
module 3, in which the analysis module dispatches the page to be analyzed to an analyzer instance according to the URL information of the page in the first database, obtains from the analysis a webpage acquisition result containing structured data, and stores the result in a second database.
In the man-in-the-middle-based internet data acquisition system, module 2 comprises: the man-in-the-middle decrypts the encrypted content in the network traffic according to the HTTPS security certificate configured on the webpage information acquisition device.
In the man-in-the-middle-based internet data acquisition system, the acquisition task in module 2 is generated as follows: the acquisition task is generated from pre-configured seed information, or a new acquisition task is generated from an already obtained webpage acquisition result.
In the man-in-the-middle-based internet data acquisition system, module 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve acquisition efficiency.
In the man-in-the-middle-based internet data acquisition system, the acquisition task in module 2 comprises an HTML page acquisition task and a dynamic content acquisition task. The HTML page acquisition task contains jump code that jumps to the next URL to be acquired. The dynamic content acquisition task contains, in addition to the jump code, JavaScript code that obtains the corresponding interface parameters and JavaScript code that acquires the page.
Claims (2)
1. A man-in-the-middle-based internet data acquisition method, characterized by comprising the following steps:
step 1, installing a man-in-the-middle proxy certificate on the webpage information acquisition device to establish a man-in-the-middle for the device, wherein the man-in-the-middle proxies all network traffic of the device when the device accesses webpage information on the Internet;
step 2, the man-in-the-middle obtains an acquisition task containing a URL regular expression for the webpages to be acquired, captures the HTML pages matching the URL regular expression from all network traffic, stores each such HTML page in a first database as a page to be analyzed, injects the webpage to be crawled next into the page to be analyzed, and returns the page to the webpage information acquisition device;
step 3, the analysis module dispatches the page to be analyzed to an analyzer instance according to the URL information of the page in the first database, obtains from the analysis a webpage acquisition result containing structured data, and stores the result in a second database;
wherein the man-in-the-middle decrypts the encrypted content in the network traffic according to the HTTPS security certificate configured on the webpage information acquisition device;
the acquisition task in step 2 is generated from pre-configured seed information, or a new acquisition task is generated from an already obtained webpage acquisition result; step 2 comprises intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve acquisition efficiency; and the acquisition task in step 2 comprises an HTML page acquisition task and a dynamic content acquisition task, wherein the HTML page acquisition task contains jump code that jumps to the next URL to be acquired, and the dynamic content acquisition task contains, in addition to the jump code, JavaScript code that obtains the corresponding interface parameters and JavaScript code that acquires the page.
2. A man-in-the-middle-based internet data acquisition system, comprising:
module 1, which establishes a man-in-the-middle for the webpage information acquisition device by installing a man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxies all network traffic of the device when the device accesses webpage information on the Internet;
module 2, in which the man-in-the-middle obtains an acquisition task containing a URL regular expression for the webpages to be acquired, captures the HTML pages matching the URL regular expression from all network traffic, stores each such HTML page in a first database as a page to be analyzed, injects the webpage to be crawled next into the page to be analyzed, and returns the page to the webpage information acquisition device;
module 3, in which the analysis module dispatches the page to be analyzed to an analyzer instance according to the URL information of the page in the first database, obtains from the analysis a webpage acquisition result containing structured data, and stores the result in a second database;
wherein the man-in-the-middle decrypts the encrypted content in the network traffic according to the HTTPS security certificate configured on the webpage information acquisition device;
the acquisition task in module 2 is generated from pre-configured seed information, or a new acquisition task is generated from an already obtained webpage acquisition result; module 2 comprises intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve acquisition efficiency; and the acquisition task in module 2 comprises an HTML page acquisition task and a dynamic content acquisition task, wherein the HTML page acquisition task contains jump code that jumps to the next URL to be acquired, and the dynamic content acquisition task contains, in addition to the jump code, JavaScript code that obtains the corresponding interface parameters and JavaScript code that acquires the page.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910909270.7A CN110781367B (en) | 2019-09-25 | 2019-09-25 | Internet data acquisition method and system based on middleman |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910909270.7A CN110781367B (en) | 2019-09-25 | 2019-09-25 | Internet data acquisition method and system based on middleman |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110781367A (en) | 2020-02-11 |
CN110781367B (en) | 2023-10-20 |
Family
ID=69384426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910909270.7A Active CN110781367B (en) | 2019-09-25 | 2019-09-25 | Internet data acquisition method and system based on middleman |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110781367B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113194044B (en) * | 2021-05-20 | 2023-01-03 | 深圳市联软科技股份有限公司 | Intelligent flow distribution method and system based on enterprise security |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105657046A (en) * | 2016-02-24 | 2016-06-08 | 中国科学技术大学 | Method for injecting advertisements based on Openwrt router |
CN105787750A (en) * | 2014-12-25 | 2016-07-20 | 杭州迪普科技有限公司 | Information pushing method and information pushing device |
CN107894888A (en) * | 2017-10-18 | 2018-04-10 | 南京邮数通信息科技有限公司 | A kind of web page contents replacement method and system based on js injections |
CN107948052A (en) * | 2017-11-14 | 2018-04-20 | 福建中金在线信息科技有限公司 | Information crawler method, apparatus, electronic equipment and system |
CN108287874A (en) * | 2017-12-19 | 2018-07-17 | 中国科学院声学研究所 | A kind of DB2 database management method and device |
CN108737332A (en) * | 2017-04-17 | 2018-11-02 | 南京邮电大学 | A kind of man-in-the-middle attack prediction technique based on machine learning |
CN109033838A (en) * | 2018-07-27 | 2018-12-18 | 平安科技(深圳)有限公司 | Website security detection method and device |
CN109543086A (en) * | 2018-11-23 | 2019-03-29 | 北京信息科技大学 | A kind of network data acquisition and methods of exhibiting towards multi-data source |
CN109710830A (en) * | 2018-12-28 | 2019-05-03 | 四川新网银行股份有限公司 | A kind of distributed network crawler method and system based on browser plug-in |
CN109831491A (en) * | 2019-01-15 | 2019-05-31 | 科大国创软件股份有限公司 | Intrusive social data acquisition method based on agency |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100205297A1 (en) * | 2009-02-11 | 2010-08-12 | Gurusamy Sarathy | Systems and methods for dynamic detection of anonymizing proxies |
JP2015523632A (en) * | 2012-05-17 | 2015-08-13 | アド−バンテージ ネットワークス,インコーポレイテッド | Content usage management system for Internet access providers and local operators |
Non-Patent Citations (7)
Title |
---|
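Eric D. Knapp et al., translated by Ning Wenyuan et al. Chapter 3, Hacking the Smart Grid: Man-in-the-Middle Attacks. In: Applied Cyber Security and the Smart Grid: Implementing Security Controls into the Modern Power Infrastructure. National Defense Industry Press, 2015, p. 82. * |
A lightweight defense framework for IoT devices based on wireless routers; Yan Zhitao; Fang Binxing; Liu Qixu; Cui Xiang; Journal of University of Chinese Academy of Sciences (No. 06); full text * |
Automatically crawling WeChat official account data with AnyProxy, including read counts and like counts; zsyoung's blog; CSDN Blog; 20171220; full text * |
Research and application of a distributed crawler system for the WeChat public platform; Wu Lin; China Master's Theses Full-text Database; 20160430; pp. 7-54 * |
Wu Hua et al. 8.2.1 Obtaining plaintext by the man-in-the-middle attack method. In: Key Technologies of Streaming Media Services and Routing for the Next-Generation Internet. 2017, pp. 170-171. * |
Dayu Programming's blog. The best man-in-the-middle ___ tool: mitmproxy. CSDN Blog. 2018, * |
An efficient plaintext acquisition method for SSL/TLS-encrypted data suitable for network content auditing; Dong Haitao; Tian Jing; Yang Jun; Ye Xiaozhou; Song Lei; Journal of Computer Applications (No. 10); full text * |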
Also Published As
Publication number | Publication date |
---|---|
CN110781367A (en) | 2020-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180041530A1 (en) | | Method and system for detecting malicious web addresses |
US9262300B1 (en) | | Debugging computer programming code in a cloud debugger environment |
CN107341160B (en) | | Crawler intercepting method and device |
US9519561B2 (en) | | Method and system for configuration-controlled instrumentation of application programs |
US11070648B2 (en) | | Offline client replay and sync |
US7343523B2 (en) | | Web-based analysis of defective computer programs |
US20130160130A1 (en) | | Application security testing |
US20030053420A1 (en) | | Monitoring operation of and interaction with services provided over a network |
KR20230054474A (en) | | Micro-front-end system, sub-application loading method, electronic device, computer program product and computer-readable storage medium |
WO2013091709A1 (en) | | Method and apparatus for real-time dynamic transformation of the code of a web document |
CN104750471A (en) | | Web page performance detection and analysis plug-in and method based on browser |
US20210081263A1 (en) | | System for offline object based storage and mocking of rest responses |
CN110138818B (en) | | Method, website application, system, device and service back-end for transmitting parameters |
CN111177519B (en) | | Webpage content acquisition method, device, storage medium and equipment |
CN101562618A (en) | | Method and device for detecting web Trojan |
CN109359231A (en) | | Information crawling method, server and storage medium for a distributed web crawler |
CN103838558A (en) | | Website building system and method, website access method and webpage adaption system |
CN103716319B (en) | | Apparatus and method for web access optimization |
CN110781367B (en) | | Internet data acquisition method and system based on middleman |
CN105159992A (en) | | Method and device for detecting page contents and network behaviors of application program |
CN103593396A (en) | | Network resource extracting method and device based on browser |
CN104615597A (en) | | Method, device and system for clearing cache file in browser |
CN104954363A (en) | | Method and device for generating interface document |
CN110532455A (en) | | Web page image acquisition method and system based on Chrome browser |
CN110493250A (en) | | Web front-end ArcGIS resource request processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||