CN110781367A - Internet data acquisition method and system based on man-in-the-middle

Publication number
CN110781367A
Authority
CN
China
Prior art keywords
webpage
task
man
page
acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910909270.7A
Other languages
Chinese (zh)
Other versions
CN110781367B (en)
Inventor
程学旗
史存会
胡耀康
朱运昌
俞晓明
刘悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201910909270.7A
Publication of CN110781367A
Application granted
Publication of CN110781367B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a man-in-the-middle based internet data acquisition method and system. A man-in-the-middle proxy is established for a web page information acquisition device by installing the man-in-the-middle proxy certificate on that device, so that the proxy carries all network traffic of the device whenever the device accesses web page information on the internet. The man-in-the-middle proxy obtains a collection task containing a URL regular expression for the web pages to be collected, captures the traffic that matches the URL regular expression from all network traffic as intermediate traffic, injects the collection task into the HTML page of the intermediate traffic, and stores the resulting page to be parsed in a first database. A parsing module then assigns each page to be parsed to a parser instance according to the page's URL information in the first database, obtains a web page acquisition result containing structured data, and stores the result in a second database. The invention can support data acquisition from all applications that rely on an integrated browser kernel to present information.

Description

Internet data acquisition method and system based on man-in-the-middle
Technical Field
The invention relates to the field of web crawlers, and in particular to a data acquisition method and system based on a man-in-the-middle attack. By having a man-in-the-middle proxy modify traffic in transit, the invention can continuously inject different task code into different applications, so that the applications request different pages and the related data can be collected.
Background
A web crawler, sometimes called a "web spider", uses existing resources to automatically capture large amounts of web page information on the internet. With the spread of the mobile internet, however, more and more traffic is distributed directly through terminal applications that either provide no WEB access or expose only part of their data through it, which makes data acquisition considerably harder.
The crawling process of a WEB crawler comprises five stages: obtaining a request URL, sending a WEB request to download the page, parsing structured data from the page, filtering duplicate data, and processing seed tasks. Each stage consumes different resources, and a fault in any stage affects the efficiency and stability of the whole crawler system. In addition, as internet technology changes, a growing share of information can no longer be obtained through the traditional WEB channel. Much of it is distributed through specific applications, typically mobile information applications, and much of the data is fetched by asynchronous requests or encrypted with HTTPS. A general-purpose data acquisition system compatible with many applications and many types of data is currently lacking.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a man-in-the-middle based internet data acquisition method comprising the following steps:
step 1, establishing a man-in-the-middle proxy for the web page information acquisition device by installing the man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxy carries all network traffic of the device when the device accesses web page information on the internet;
step 2, the man-in-the-middle proxy obtains a collection task containing a URL regular expression for the web pages to be collected, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the collection task into the HTML page of the intermediate traffic to obtain a page to be parsed, and stores the page in a first database;
step 3, a parsing module assigns the page to be parsed to a parser instance according to the URL information of the page in the first database, obtains a web page acquisition result containing structured data, and stores the result in a second database.
In the man-in-the-middle based internet data acquisition method, step 2 comprises: the man-in-the-middle proxy decrypts the encrypted content in the network traffic using the HTTPS security certificate configured on the web page information acquisition device.
In the man-in-the-middle based internet data acquisition method, the collection task in step 2 is generated as follows: the collection task is generated from preconfigured seed information, or a new collection task is generated from web page acquisition results already obtained.
In the man-in-the-middle based internet data acquisition method, step 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve collection efficiency.
In the man-in-the-middle based internet data acquisition method, the collection tasks in step 2 comprise HTML page collection tasks and dynamic content collection tasks; an HTML page collection task contains jump code that redirects to the next URL to be collected; a dynamic content collection task contains, in addition to the jump code, JavaScript code used to obtain the corresponding interface parameters and the collected page.
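The two task types can be illustrated with a minimal Python sketch that builds the JavaScript payload to be injected. The patent does not give the payload itself, so the function name `build_task_script`, its parameters, and the use of the browser `fetch` call for the dynamic case are assumptions made for illustration only.

```python
import json


def build_task_script(next_url, api_url=None):
    """Return JavaScript source for one collection task (illustrative only).

    next_url: the page the client should jump to next.
    api_url:  optional interface URL; when given, the task is treated as a
              dynamic content task and requests that interface before jumping.
    """
    # json.dumps yields a quoted, escaped JavaScript string literal.
    jump = "window.location.href = " + json.dumps(next_url) + ";"
    if api_url is None:
        # HTML page collection task: jump code only.
        return jump
    # Dynamic content task: request the interface first, then jump.
    # The request passes through the proxy, which can capture the response.
    return (
        "fetch(" + json.dumps(api_url) + ", {credentials: 'include'})"
        ".finally(function () { " + jump + " });"
    )
```

For an HTML page task the result is a single assignment to `window.location.href`; for a dynamic content task the jump runs only after the interface request settles, so the proxy has a chance to see the response first.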
The invention also provides a man-in-the-middle based internet data acquisition system, comprising:
module 1, which establishes a man-in-the-middle proxy for the web page information acquisition device by installing the man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxy carries all network traffic of the device when the device accesses web page information on the internet;
module 2, in which the man-in-the-middle proxy obtains a collection task containing a URL regular expression for the web pages to be collected, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the collection task into the HTML page of the intermediate traffic to obtain a page to be parsed, and stores the page in a first database;
module 3, in which a parsing module assigns the page to be parsed to a parser instance according to the URL information of the page in the first database, obtains a web page acquisition result containing structured data, and stores the result in a second database.
In the man-in-the-middle based internet data acquisition system, module 2 comprises: the man-in-the-middle proxy decrypts the encrypted content in the network traffic using the HTTPS security certificate configured on the web page information acquisition device.
In the man-in-the-middle based internet data acquisition system, the collection task in module 2 is generated as follows: the collection task is generated from preconfigured seed information, or a new collection task is generated from web page acquisition results already obtained.
In the man-in-the-middle based internet data acquisition system, module 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve collection efficiency.
In the man-in-the-middle based internet data acquisition system, the collection tasks in module 2 comprise HTML page collection tasks and dynamic content collection tasks; an HTML page collection task contains jump code that redirects to the next URL to be collected; a dynamic content collection task contains, in addition to the jump code, JavaScript code used to obtain the corresponding interface parameters and the collected page.
As the above scheme shows, the invention has the following advantages:
The invention provides a data acquisition method and system based on a man-in-the-middle attack that can support data acquisition from all applications that rely on an integrated browser kernel to present information, covers the various types of web page request, and allows flexible configuration of structured data parsing. The acquisition process is modular, with each module serving one function, which greatly improves data capture efficiency and greatly reduces the difficulty of acquiring data from different applications.
The invention applies man-in-the-middle attack technology to a data acquisition system and modularizes each processing stage so that every module has a single function, which improves the working efficiency of the whole system and makes horizontal scaling simpler and more convenient. In addition, a Redis message queue is introduced to decouple the modules. Because Redis offers high throughput, high availability, and easy scaling, using Redis as the storage medium greatly improves the efficiency and stability of the invention.
Drawings
FIG. 1 is an architecture diagram of a crawler system according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The invention uses the Anyproxy proxy tool to proxy all HTTP/HTTPS traffic of the client application, and makes decryption of HTTPS-encrypted data possible by installing a security certificate on the corresponding acquisition device in advance.
The technical scheme of the invention is as follows:
A data acquisition method based on a man-in-the-middle attack comprises the following steps:
1) The applications and devices to be collected from are the collection subject. They only need to be configured with the man-in-the-middle proxy and have the man-in-the-middle proxy certificate installed; the application to be collected from then initializes the process by accessing any page.
2) The man-in-the-middle proxy module is mainly responsible for carrying out the man-in-the-middle attack, and its main functions are:
a) Intercepting requests to filter out invalid traffic, including but not limited to requests for resource files such as CSS files, JavaScript files, and image files.
b) Decrypting the encrypted content using the preconfigured HTTPS certificate.
c) Capturing the traffic to be collected according to the URL regular expression and storing it in the Redis database cache.
d) Requesting a new task from the task scheduling module and injecting the task into an HTML page that matches the URL regular expression. The task contains the next web pages to be crawled. Because the client is closed, it is difficult for the client to fetch new tasks directly, so the invention delivers tasks to the client by injecting them into the web pages returned to the client.
3) The parsing module assigns different pages to different parser instances according to the URL information of the Response data, obtains structured data from the parser instances and stores it in the MongoDB database, and stores data relevant to new tasks in the Redis database so that the task scheduling module can generate new tasks. The Response data is the application server's response to the client, that is, data returned by the server.
4) The task scheduling module generates task content from preconfigured basic information such as seed tasks, and can also generate new tasks from the data parsed by the parsing module. It deduplicates tasks by URL and implements recovery of failed tasks.
5) The data storage module is mainly responsible for storing the relevant data and for decoupling the modules from one another. The decoupling lies in the fact that the data storage module records the data that other functional modules need to write and lets other functional modules read the data they need, so that no functional module has to interact directly with another; they only need to share the data storage module.
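The decoupling described in item 5) can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: `SharedStore`, `proxy_capture`, and `parser_step` are hypothetical names, and plain Python containers stand in for the Redis cache and the MongoDB database.

```python
from collections import deque


class SharedStore:
    """In-memory stand-in for the shared data storage module."""

    def __init__(self):
        self.pages = deque()      # captured responses waiting to be parsed
        self.results = []         # structured records produced by parsers
        self.new_tasks = deque()  # URLs handed to the task scheduler


def proxy_capture(store, url, body):
    # The proxy only writes to the store; it never calls the parser.
    store.pages.append((url, body))


def parser_step(store, parse):
    """Parse one captured page; return False when nothing is waiting."""
    if not store.pages:
        return False
    url, body = store.pages.popleft()
    record, next_urls = parse(url, body)
    store.results.append(record)        # structured data for storage
    store.new_tasks.extend(next_urls)   # input for task generation
    return True
```

Because every module touches only the store, the proxy, parser, and scheduler can be started, stopped, or scaled independently of one another.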
Further, step 1) mainly consists of installing and configuring the network address, network port, and security certificate of the man-in-the-middle proxy in the acquisition environment; the whole acquisition process starts once the application to be collected from accesses any initial page.
Further, the man-in-the-middle proxy module in step 2) has four main functions:
a) Because the security certificate issued by the man-in-the-middle proxy module is installed on the client application device, the man-in-the-middle proxy can intercept the client's HTTPS-encrypted traffic and view its plaintext content.
b) When a Request from the client reaches the man-in-the-middle proxy, the proxy module checks whether the URL of the Request matches the filtering conditions. If it does, the Request is intercepted and empty content is returned directly; otherwise, the Request is forwarded to the target server. For example, requests matching the configured URL regular expression, such as requests for CSS files, JavaScript files, and image files, are useless traffic that lowers collection efficiency, so they are intercepted and answered with empty content. Applying CSS files during rendering occupies a large amount of computing resources, executing JavaScript files in the browser also consumes a large amount of resources, and resource files such as images and audio occupy a large amount of network bandwidth, so filtering and intercepting these requests greatly improves collection efficiency and stability.
c) When the target server returns the corresponding Response, the man-in-the-middle proxy checks whether the URL belongs to the content to be collected, and if so, stores the whole Response in the Redis database.
d) When the target server returns the corresponding Response and the content is an HTML page, the man-in-the-middle proxy checks whether the URL of the Response matches a specific regular expression. If it does, the proxy requests the corresponding task from the task scheduling module, injects the task into a <script> tag in the HTML page, and then passes the Response on to the client application.
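Functions b) and d) above can be sketched in a few lines of Python. The patent names Anyproxy as the proxy tool but gives no rule code, so this sketch deliberately avoids any proxy API: the patterns, the `example.com` URLs, and the names `should_block`, `inject_task`, and `handle_response` are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; a real deployment would load them from configuration.
BLOCK_PATTERN = re.compile(r"\.(css|js|png|jpe?g|gif|mp3|mp4)(\?.*)?$", re.IGNORECASE)
INJECT_PATTERN = re.compile(r"^https://example\.com/article/\d+$")
BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def should_block(url):
    """Function b): True when the request should get an empty reply."""
    return BLOCK_PATTERN.search(url) is not None


def inject_task(html, task_js):
    """Insert the task as a <script> element before the last </body>."""
    tag = "<script>" + task_js + "</script>"
    matches = list(BODY_CLOSE.finditer(html))
    if not matches:
        # No closing body tag: append the script at the end.
        return html + tag
    pos = matches[-1].start()
    return html[:pos] + tag + html[pos:]


def handle_response(url, content_type, body, task_js):
    """Function d): inject only into HTML pages whose URL matches."""
    if content_type.startswith("text/html") and INJECT_PATTERN.match(url):
        return inject_task(body, task_js)
    return body
```

In a running proxy, `should_block` would be consulted before a request is forwarded and `handle_response` before a response is returned to the client; fetching `task_js` from the task scheduling module is left out of the sketch.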
Further, the parsing module in step 3) takes content to be parsed out of the Redis data cache and assigns it to different parser instances according to its URL. After parsing is complete, it stores the structured data in MongoDB and stores information needed for subsequent collection, such as URLs and Cookies, in the Redis database cache.
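URL-based assignment to parser instances can be sketched as a registry of patterns and parser functions. The two parsers, the `example.com` URL patterns, and the field names in the returned records are hypothetical; the patent only states that pages are routed by URL and that structured data comes out.

```python
import json
import re


def parse_article(url, body):
    # Hypothetical HTML parser: extract the page title.
    m = re.search(r"<title>(.*?)</title>", body, re.IGNORECASE | re.DOTALL)
    return {"url": url, "title": m.group(1).strip() if m else None}


def parse_comments(url, body):
    # Hypothetical JSON parser for a dynamic interface response.
    data = json.loads(body)
    return {"url": url, "comment_count": len(data.get("comments", []))}


# Registry mapping URL patterns to parsers, checked in order.
PARSERS = [
    (re.compile(r"^https://example\.com/article/\d+$"), parse_article),
    (re.compile(r"^https://example\.com/api/comments\?"), parse_comments),
]


def dispatch(url, body):
    """Route a captured response to the first parser whose pattern matches."""
    for pattern, parser in PARSERS:
        if pattern.search(url):
            return parser(url, body)
    return None  # no parser registered for this URL
```

Adding support for a new page type then only requires registering one more pattern and parser, which matches the flexible parsing configuration the description aims for.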
Further, the task scheduling module in step 4) mainly covers task generation, task scheduling, task deduplication, and task recovery.
Further, there are two main ways of generating tasks: generating tasks from preconfigured seed information, and generating new tasks from information already collected. Tasks also fall into two main types: simple tasks that collect HTML pages, and tasks that collect dynamic data such as JSON (JavaScript Object Notation), which require the relevant task parameters, Cookies, and other information and execute the relevant JavaScript code to complete collection.
Further, task scheduling mainly assigns different collection tasks to different applications and controls their collection rate to avoid being blocked.
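One simple way to control the collection rate per application is a minimum interval between dispatched tasks. The patent does not say which rate-control policy is used, so the fixed-interval policy and the `RateLimiter` class below are assumptions; the caller passes the current time explicitly so the behaviour is easy to check.

```python
class RateLimiter:
    """Allow at most one task per application every `min_interval` seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_sent = {}  # application name -> time of last dispatched task

    def allow(self, app, now):
        """Return True and record the dispatch if `app` may receive a task."""
        last = self.last_sent.get(app)
        if last is not None and now - last < self.min_interval:
            return False  # too soon for this application
        self.last_sent[app] = now
        return True
```

In use, the scheduler would call `allow(app, time.monotonic())` before handing a task to the proxy and hold the task back when the call returns False.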
Further, task deduplication is mainly based on URLs. A single unified deduplication queue in the Redis database is sufficient: each time a task is generated, the queue is queried to determine whether the URL has already been accessed.
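URL-based deduplication reduces to a membership check against a record of visited URLs. The sketch below uses an in-memory Python set as a stand-in for the Redis structure; the class name `UrlDeduplicator` is hypothetical, and the patent does not specify which Redis data type backs the deduplication queue.

```python
class UrlDeduplicator:
    """In-memory stand-in for the unified deduplication record."""

    def __init__(self):
        self._seen = set()

    def add_if_new(self, url):
        """Return True if the URL was not seen before, and remember it."""
        if url in self._seen:
            return False
        self._seen.add(url)
        return True
```

With a Redis set, the same check-and-record step could be done in one call, since the `SADD` command reports how many members were newly added.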
Further, the task recovery function identifies tasks whose collection failed and reschedules them at an appropriate time.
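Task recovery can be sketched as a queue that re-enqueues failed tasks up to a retry limit. How failures are detected and when retries are scheduled is not specified in the patent, so the `RecoveryQueue` class, its explicit `report` call, and the `max_attempts` limit are assumptions for illustration.

```python
from collections import deque


class RecoveryQueue:
    """Re-enqueue failed tasks until a retry limit is reached."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.pending = deque()  # tasks waiting to be dispatched
        self.attempts = {}      # url -> number of failed attempts so far
        self.abandoned = []     # tasks that exhausted their attempts

    def submit(self, url):
        self.attempts.setdefault(url, 0)
        self.pending.append(url)

    def next_task(self):
        return self.pending.popleft() if self.pending else None

    def report(self, url, ok):
        """Record the outcome of one attempt at collecting `url`."""
        if ok:
            self.attempts.pop(url, None)
            return
        self.attempts[url] = self.attempts.get(url, 0) + 1
        if self.attempts[url] < self.max_attempts:
            self.pending.append(url)    # schedule another attempt
        else:
            self.abandoned.append(url)  # give up after too many failures
```

A timeout on the arrival of the expected Response in the cache would be one plausible trigger for calling `report(url, False)`.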
The invention has the following beneficial effects:
The data acquisition method and system based on a man-in-the-middle attack can support data acquisition tasks for applications built around a browser kernel and can be configured flexibly through URL regular expressions. The crawling process is modular, with each module serving one function, which greatly improves the efficiency of acquiring application data and gives the invention good applicability and generality.
In FIG. 1, the client application device is a device on which an application to be collected from is installed, and the man-in-the-middle proxy must be configured on this device. Once configuration is complete, the man-in-the-middle proxy module proxies all traffic between the client device and the application server and carries out the man-in-the-middle attack at the appropriate time; it also interacts with the task scheduling module and the Redis database. Finally, the collected data is parsed by a parser and stored in the MongoDB database.
The data acquisition system based on a man-in-the-middle attack comprises five modules. Processing is modular, each module has a single function, and the modules are decoupled through the Redis data cache, which greatly improves the efficiency of the crawler system, ensures stable data capture, and greatly reduces the system's operation and maintenance costs.
The following is a system embodiment corresponding to the method embodiment above, and the two can be implemented in cooperation with each other. The technical details described in the method embodiment remain valid for this embodiment and are not repeated here. Likewise, the technical details described in this embodiment also apply to the method embodiment.
The invention also provides a man-in-the-middle based internet data acquisition system, comprising:
module 1, which establishes a man-in-the-middle proxy for the web page information acquisition device by installing the man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxy carries all network traffic of the device when the device accesses web page information on the internet;
module 2, in which the man-in-the-middle proxy obtains a collection task containing a URL regular expression for the web pages to be collected, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the collection task into the HTML page of the intermediate traffic to obtain a page to be parsed, and stores the page in a first database;
module 3, in which a parsing module assigns the page to be parsed to a parser instance according to the URL information of the page in the first database, obtains a web page acquisition result containing structured data, and stores the result in a second database.
In the man-in-the-middle based internet data acquisition system, module 2 comprises: the man-in-the-middle proxy decrypts the encrypted content in the network traffic using the HTTPS security certificate configured on the web page information acquisition device.
In the system for acquiring web page information on the internet, the collection task in module 2 is generated as follows: the collection task is generated from preconfigured seed information, or a new collection task is generated from web page acquisition results already obtained.
In the system for acquiring web page information on the internet, module 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve collection efficiency.
In the system for acquiring web page information on the internet, the collection tasks in module 2 comprise HTML page collection tasks and dynamic content collection tasks; an HTML page collection task contains jump code that redirects to the next URL to be collected; a dynamic content collection task contains, in addition to the jump code, JavaScript code used to obtain the corresponding interface parameters and the collected page.

Claims (10)

1. A man-in-the-middle based internet data acquisition method, characterized by comprising the following steps:
step 1, establishing a man-in-the-middle proxy for the web page information acquisition device by installing the man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxy carries all network traffic of the device when the device accesses web page information on the internet;
step 2, the man-in-the-middle proxy obtains a collection task containing a URL regular expression for the web pages to be collected, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the collection task into the HTML page of the intermediate traffic to obtain a page to be parsed, and stores the page in a first database;
step 3, a parsing module assigns the page to be parsed to a parser instance according to the URL information of the page in the first database, obtains a web page acquisition result containing structured data, and stores the result in a second database.
2. The man-in-the-middle based internet data acquisition method of claim 1, wherein step 2 comprises: the man-in-the-middle proxy decrypts the encrypted content in the network traffic using the HTTPS security certificate configured on the web page information acquisition device.
3. The man-in-the-middle based internet data acquisition method of claim 1, wherein the collection task in step 2 is generated as follows: the collection task is generated from preconfigured seed information, or a new collection task is generated from web page acquisition results already obtained.
4. The man-in-the-middle based internet data acquisition method of claim 1, wherein step 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve collection efficiency.
5. The man-in-the-middle based internet data acquisition method of claim 1, wherein the collection tasks in step 2 comprise HTML page collection tasks and dynamic content collection tasks; an HTML page collection task contains jump code that redirects to the next URL to be collected; a dynamic content collection task contains, in addition to the jump code, JavaScript code used to obtain the corresponding interface parameters and the collected page.
6. A man-in-the-middle based internet data acquisition system, comprising:
module 1, which establishes a man-in-the-middle proxy for the web page information acquisition device by installing the man-in-the-middle proxy certificate on the device, wherein the man-in-the-middle proxy carries all network traffic of the device when the device accesses web page information on the internet;
module 2, in which the man-in-the-middle proxy obtains a collection task containing a URL regular expression for the web pages to be collected, captures the traffic matching the URL regular expression from all network traffic as intermediate traffic, injects the collection task into the HTML page of the intermediate traffic to obtain a page to be parsed, and stores the page in a first database;
module 3, in which a parsing module assigns the page to be parsed to a parser instance according to the URL information of the page in the first database, obtains a web page acquisition result containing structured data, and stores the result in a second database.
7. The man-in-the-middle based internet data acquisition system of claim 6, wherein module 2 comprises: the man-in-the-middle proxy decrypts the encrypted content in the network traffic using the HTTPS security certificate configured on the web page information acquisition device.
8. The man-in-the-middle based internet data acquisition system of claim 6, wherein the collection task in module 2 is generated as follows: the collection task is generated from preconfigured seed information, or a new collection task is generated from web page acquisition results already obtained.
9. The man-in-the-middle based internet data acquisition system of claim 6, wherein module 2 comprises: intercepting some HTTP/HTTPS requests according to a configured URL regular expression and returning empty content, so as to improve collection efficiency.
10. The man-in-the-middle based internet data acquisition system of claim 6, wherein the collection tasks in module 2 comprise HTML page collection tasks and dynamic content collection tasks; an HTML page collection task contains jump code that redirects to the next URL to be collected; a dynamic content collection task contains, in addition to the jump code, JavaScript code used to obtain the corresponding interface parameters and the collected page.
CN201910909270.7A 2019-09-25 2019-09-25 Internet data acquisition method and system based on middleman Active CN110781367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910909270.7A CN110781367B (en) 2019-09-25 2019-09-25 Internet data acquisition method and system based on middleman

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910909270.7A CN110781367B (en) 2019-09-25 2019-09-25 Internet data acquisition method and system based on middleman

Publications (2)

Publication Number Publication Date
CN110781367A true CN110781367A (en) 2020-02-11
CN110781367B CN110781367B (en) 2023-10-20

Family

ID=69384426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910909270.7A Active CN110781367B (en) 2019-09-25 2019-09-25 Internet data acquisition method and system based on middleman

Country Status (1)

Country Link
CN (1) CN110781367B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113194044A (en) * 2021-05-20 2021-07-30 深圳市联软科技股份有限公司 Intelligent flow distribution method and system based on enterprise security

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100205297A1 (en) * 2009-02-11 2010-08-12 Gurusamy Sarathy Systems and methods for dynamic detection of anonymizing proxies
US20130311301A1 (en) * 2012-05-17 2013-11-21 Ad-Vantage Networks, Inc. Content easement and management system for internet access providers and premise operators
CN105657046A (en) * 2016-02-24 2016-06-08 中国科学技术大学 Method for injecting advertisements based on OpenWrt router
CN105787750A (en) * 2014-12-25 2016-07-20 杭州迪普科技有限公司 Information pushing method and information pushing device
CN107894888A (en) * 2017-10-18 2018-04-10 南京邮数通信息科技有限公司 Web page content replacement method and system based on JS injection
CN107948052A (en) * 2017-11-14 2018-04-20 福建中金在线信息科技有限公司 Information crawling method, apparatus, electronic device and system
CN108287874A (en) * 2017-12-19 2018-07-17 中国科学院声学研究所 DB2 database management method and device
CN108737332A (en) * 2017-04-17 2018-11-02 南京邮电大学 Man-in-the-middle attack prediction method based on machine learning
CN109033838A (en) * 2018-07-27 2018-12-18 平安科技(深圳)有限公司 Website security detection method and device
CN109543086A (en) * 2018-11-23 2019-03-29 北京信息科技大学 Network data acquisition and display method for multiple data sources
CN109710830A (en) * 2018-12-28 2019-05-03 四川新网银行股份有限公司 Distributed web crawler method and system based on browser plug-in
CN109831491A (en) * 2019-01-15 2019-05-31 科大国创软件股份有限公司 Intrusive social data acquisition method based on agency

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
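Eric D. Knapp et al., translated by Ning Wenyuan et al.: "Applied Cyber Security and the Smart Grid: Security Controls for the Modern Power Infrastructure", 31 March 2015, National Defense Industry Press, pages: 82 *
SAS???'s blog: "The most useful man-in-the-middle ___ tool mitmproxy", CSDN Blog *
SAS???'s blog: "The most useful man-in-the-middle ___ tool mitmproxy", CSDN Blog, 24 October 2018 (2018-10-24) *
ZSYOUNG's blog: "Automatically crawling WeChat official account data with AnyProxy, including read counts and like counts", CSDN Blog *
ZSYOUNG's blog: "Automatically crawling WeChat official account data with AnyProxy, including read counts and like counts", CSDN Blog, 20 December 2017 (2017-12-20) *
Yan Zhitao; Fang Binxing; Liu Qixu; Cui Xiang: "A lightweight defense framework for IoT devices based on wireless routers", Journal of University of Chinese Academy of Sciences, no. 06 *
Wu Hua et al.: "Key Technologies of Next-Generation Internet Streaming Media Services and Routing", 30 November 2017, pages: 170 - 171 *
Wu Lin: "Research and application of a distributed WeChat public platform crawler system", China Master's Theses Full-text Database *
Wu Lin: "Research and application of a distributed WeChat public platform crawler system", China Master's Theses Full-text Database, 30 April 2016 (2016-04-30), pages 7 - 54 *
Dong Haitao; Tian Jing; Yang Jun; Ye Xiaozhou; Song Lei: "An efficient plaintext acquisition method for SSL/TLS-protected data, suitable for network content auditing", Journal of Computer Applications, no. 10 *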

Also Published As

Publication number Publication date
CN110781367B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
US10567407B2 (en) Method and system for detecting malicious web addresses
US9262300B1 (en) Debugging computer programming code in a cloud debugger environment
CN1262940C (en) Equipment and method for providing global session persistence
US9519561B2 (en) Method and system for configuration-controlled instrumentation of application programs
US20130160130A1 (en) Application security testing
US11070648B2 (en) Offline client replay and sync
Bates et al. Transparent web service auditing via network provenance functions
CN106126693B (en) Method and device for sending related data of webpage
US20060184829A1 (en) Web-based analysis of defective computer programs
KR20230054474A (en) Micro-front-end system, sub-application loading method, electronic device, computer program product and computer-readable storage medium
CN111177519B (en) Webpage content acquisition method, device, storage medium and equipment
CN110769009B (en) User identity authentication method and system
CN104268082A (en) Pressure test method and pressure test device for browser
EP2820583A1 (en) Network service interface analysis
CN109359231A (en) Information crawling method, server and storage medium for a distributed web crawler
AU2008355023A1 (en) Generating sitemaps
CN110598135A (en) Network request processing method and device, computer readable medium and electronic equipment
CN103729380A (en) Data processing method, system and device
CN114221995A (en) Service calling method and device and electronic equipment
US8532960B2 (en) Remotely collecting and managing diagnostic information
CN104615597A (en) Method, device and system for clearing cache file in browser
CN110781367B (en) Internet data acquisition method and system based on man-in-the-middle
CN104954363A (en) Method and device for generating interface document
CN105677688B (en) Page data loading method and system
CN112988569A (en) Method and system for viewing micro-service request response based on nginx

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant