CN114095207A - IPv6 website detection method based on distributed scheduling - Google Patents

Info

Publication number
CN114095207A
Authority
CN
China
Prior art keywords
task
control
url
management
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111244665.3A
Other languages
Chinese (zh)
Inventor
缪俊
李科
王少帅
陈琦
李号
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lianxing Technology Co ltd
Original Assignee
Beijing Lianxing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lianxing Technology Co ltd
Priority to CN202111244665.3A
Publication of CN114095207A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • Virology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an IPv6 website detection method based on distributed scheduling, comprising the following steps: a middle-platform management program judges url validity, deduplicates tasks, and limits the maximum task queue; Redis control manages tasks that have not yet run, issues them to scheduling through Kafka, and controls each task List and the storage structure; Mongo control obtains uncompleted tasks through Redis control and performs inspection, task recovery, and scheduling-priority control of the last record; scheduling splits the tasks received from Redis control and manages them through polling, timed updating, requesting, and storage; the gateway selects tasks, according to the task issuing conditions, in ascending order of task priority, task creation time, and url name, and sets the scheduling priority of the current task so that it executes according to the latest scheduling priority; the crawler receives a task and scans the url links. The method improves website detection speed and accuracy, and can greatly shorten monitoring time and improve monitoring efficiency.

Description

IPv6 website detection method based on distributed scheduling
Technical Field
The invention relates to a distributed scheduling technology in the field of Internet, in particular to an IPv6 website detection method based on distributed scheduling.
Background
Since the beginning of 2012, the IPv4 address pool of APNIC has been essentially exhausted, and telecommunications carriers in the Asia-Pacific region can no longer obtain IPv4 addresses in bulk. With the continued rapid development of emerging information technologies such as the mobile internet, cloud computing, big data, the internet of things, and artificial intelligence, demand for IP address resources is exceptionally strong. The IPv4 addresses held by telecom operators are being consumed rapidly, and operators are forced to rely heavily on NAT to cope with the shortage. The shortage of IPv4 address resources has become a limiting factor in the development of the digital economy. Given this shortage, large-scale deployment of IPv6 to drive the upgrade and evolution of the Internet to IPv6 is the only fundamental solution to the IP address problem.
The IPv6 protocol is the next-generation internet protocol and brings substantial improvements in address space, security, and other respects. New technologies such as the internet of things, cloud computing, big data, and artificial intelligence are driving cyberspace toward the interconnection of everything. Using IPv6 to resolve the address shortage and open room for innovation has become the prevailing trend. Countries around the world recognize the urgency of deploying IPv6 at scale, and the global communications industry and enterprises developing emerging technology applications are migrating to IPv6 at an accelerating pace.
In November 2017, the state issued the "Action Plan for Promoting the Scale Deployment of Internet Protocol Version 6 (IPv6)" and the "Guiding Opinions on Deepening 'Internet + Advanced Manufacturing' to Develop the Industrial Internet", which call for comprehensive deployment of IPv6. In line with these national policy requirements, websites that deploy IPv6 are detected to verify that they comply with the national IPv6 network standards.
However, existing linear detection techniques suffer from low detection speed and from inaccurate results caused by anti-crawling measures, among other problems.
Disclosure of Invention
To address the shortcomings of existing linear detection techniques, such as low detection speed and inaccuracy caused by anti-crawling measures, the invention provides an IPv6 website detection method based on distributed scheduling. The method uses distributed-scheduling IPv6 website detection to inspect a website in depth according to detection indicators, and can detect the home page, the second-level links, and the third-level links of an IPv6 website.
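The three detection levels named above (home page, second-level links, third-level links) can be pictured as a breadth-first walk limited to depth 3. The following is a minimal sketch under stated assumptions: the patent does not specify the traversal, and the `LINKS` map, the `example.test` urls, and the function name `collect_levels` are hypothetical stand-ins for real fetched pages.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages (not from the patent).
LINKS = {
    "http://example.test/": ["http://example.test/a", "http://example.test/b"],
    "http://example.test/a": ["http://example.test/a/1", "http://example.test/"],
    "http://example.test/a/1": ["http://example.test/a/1/x"],
}


def collect_levels(home_url, max_level=3):
    """Return {url: level}, with the home page at level 1, down to max_level."""
    levels = {home_url: 1}
    queue = deque([home_url])
    while queue:
        url = queue.popleft()
        if levels[url] >= max_level:
            continue  # do not expand links below the deepest level
        for link in LINKS.get(url, []):
            if link not in levels:  # first visit gives the shallowest level
                levels[link] = levels[url] + 1
                queue.append(link)
    return levels
```

Links found only on third-level pages are left out, which keeps the task count bounded for large sites.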
The technical scheme adopted by the invention is as follows:
The invention provides an IPv6 website detection method based on distributed scheduling, which comprises the following steps:
1) middle-platform management: receive a website detection task; the middle-platform management program judges url validity, deduplicates the task, and limits the maximum task queue;
2) Redis control: manage tasks that have not yet run and issue them to scheduling through Kafka; control the primary tasks, secondary tasks, task heartbeats, the running-task List, and the storage structure;
3) Mongo control: obtain uncompleted tasks through Redis control, and perform inspection, task recovery, and scheduling-priority control of the last record;
4) scheduling: split the task received from Redis control, splitting one main url into a plurality of sub urls; issue tasks according to the performance of each monitoring point and the current task state; and manage the tasks through polling, timed updating, requesting, and storage;
5) gateway: according to the task issuing conditions, select tasks in ascending order of task priority, task creation time, and url name, and set the scheduling priority of the current task so that it executes according to the latest scheduling priority;
6) crawler: receive a task and scan the url links.
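Step 1) combines three checks: url validity, task deduplication, and a cap on the task queue. The sketch below shows one way these could fit together. It is an illustration only: the class name `TaskIntake`, the `max_queue` parameter, the accepted schemes, and the normalization rule used for deduplication are all assumptions, not details given in the patent.

```python
from collections import deque
from urllib.parse import urlsplit


class TaskIntake:
    """Validates urls, drops duplicates, and caps the pending queue."""

    def __init__(self, max_queue=1000):  # max_queue is an assumed parameter
        self.max_queue = max_queue
        self.pending = deque()
        self.seen = set()

    @staticmethod
    def _parse(url):
        """Return the parsed url, or None if it is not a usable http(s) url."""
        try:
            parts = urlsplit(url)
        except ValueError:  # e.g. a malformed IPv6 literal such as "http://[::1"
            return None
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return None
        return parts

    def submit(self, url):
        """Return 'accepted', 'invalid', 'duplicate', or 'queue_full'."""
        parts = self._parse(url)
        if parts is None:
            return "invalid"
        # Assumed dedup key: scheme and host are case-insensitive, path is not.
        key = (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query)
        if key in self.seen:
            return "duplicate"
        if len(self.pending) >= self.max_queue:
            return "queue_full"
        self.seen.add(key)
        self.pending.append(url)
        return "accepted"
```

Rejecting at the queue cap, rather than blocking, keeps the intake responsive; a real deployment might instead defer the task.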
In step 4), one main url is split into a plurality of sub urls, each sub url is turned into a single executable task, and task control is completed in coordination with the task management control mechanisms of steps 2) and 3).
In step 4), the main url is parsed, and for each link in the main url page a sub-link is generated based on the main url, forming a plurality of independently accessible url sub-links.
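Generating sub-links "based on the main url" amounts to resolving each link found in the page against the main url. A minimal sketch using only the Python standard library follows; the function name `split_main_url`, the restriction to `<a href>` links, and the removal of fragments and duplicates are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_main_url(main_url, page_html):
    """Return the independently accessible sub urls found in page_html."""
    parser = _LinkCollector()
    parser.feed(page_html)
    sub_urls = []
    for href in parser.hrefs:
        # Resolve relative links against the main url, then drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(main_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in sub_urls:
            sub_urls.append(absolute)
    return sub_urls
```

Each returned url can then be wrapped as its own executable task.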
The invention has the following technical effects and advantages:
1. The invention provides an IPv6 website detection method based on distributed scheduling. Compared with traditional website detection, the distributed IPv6 website detection technique improves detection speed and accuracy, and addresses the low detection speed and the inaccuracy caused by anti-crawling measures in traditional linear detection.
2. The method splits the second-level and third-level links of a website into independent urls, distributes them to all monitoring points, and runs detection at multiple monitoring points at the same time, which can greatly shorten monitoring time and improve monitoring efficiency.
3. Distributed scheduling splits the second-level and third-level links of a website in a reasonable way, has several different monitoring points execute the detection tasks, and manages and controls them centrally. This avoids detection problems that arise at a single detection point and improves detection accuracy.
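The speed-up described above comes from running the split urls at several monitoring points at once instead of one after another. The sketch below imitates this on a single machine with a thread pool. It is only an analogy: the monitoring point names, the `probe` stub (which performs no network request), and the `max_workers` value are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed monitoring point names, borrowed from the ISP labels in the embodiment.
MONITORING_POINTS = ["ISP1", "ISP2", "ISP3"]


def probe(point, url):
    """Stand-in for a real detection request issued from one monitoring point."""
    return {"point": point, "url": url, "ok": True}


def detect_everywhere(sub_urls, points=MONITORING_POINTS, max_workers=8):
    """Run every (monitoring point, sub url) pair concurrently."""
    jobs = [(point, url) for url in sub_urls for point in points]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the submitted jobs.
        return list(pool.map(lambda job: probe(*job), jobs))
```

With a linear scan the total time grows with the number of pairs; with concurrent workers it approaches the time of the slowest batch.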
Drawings
FIG. 1 is a flowchart of a method for detecting IPv6 website based on distributed scheduling;
FIG. 2 is a graph comparing test results of the method of the invention with those of the prior art.
Detailed Description
As shown in FIG. 1, the invention provides an IPv6 website detection method based on distributed scheduling, which comprises the following steps:
1) middle-platform management: receive a website detection task; the middle-platform management program (which is an existing program) judges url validity, deduplicates the task, and limits the maximum task queue;
2) Redis control: manage tasks that have not yet run and issue them to scheduling through Kafka; control the primary tasks, secondary tasks, task heartbeats, the running-task List, and the storage structure;
3) Mongo control: obtain uncompleted tasks through Redis control, and perform inspection, task recovery, and scheduling-priority control of the last record;
4) scheduling: split the task received from Redis control, splitting one main url into a plurality of sub urls; issue tasks according to the performance of each monitoring point and the current task state; and manage the tasks through polling, timed updating, requesting, and storage;
5) gateway: according to the task issuing conditions, select tasks in ascending order of task priority, task creation time, and url name, and set the scheduling priority of the current task so that it executes according to the latest scheduling priority;
6) crawler: receive a task and scan the url links.
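The gateway's selection rule in step 5) is a three-key ascending sort. A minimal sketch follows, assuming that a smaller priority number is selected first and that the "latest scheduling priority" arrives as a url-to-priority mapping; the `Task` fields and both function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    url: str
    priority: int      # assumption: a smaller number is selected first
    created_at: float  # creation timestamp in seconds


def apply_latest_priority(tasks, latest):
    """Overwrite each task's priority with the latest known value, if any."""
    return [replace(t, priority=latest.get(t.url, t.priority)) for t in tasks]


def select_order(tasks):
    """Ascending by task priority, then creation time, then url name."""
    return sorted(tasks, key=lambda t: (t.priority, t.created_at, t.url))
```

Sorting on a tuple makes the tie-breaking order explicit: creation time matters only among equal priorities, and the url name only among tasks created at the same time.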
In this embodiment, the middle platform in step 1) receives a website detection task: the middle-platform website monitoring program judges url validity, maintains the list of url tasks issued by the user, and, according to the user's issuing plan, outputs detection results under different ISPs (internet service providers), such as ISP1, ISP2, ISP3, ISP4 and ISP5.
In step 2), Redis control maintains records in its own db1, db2 and db3 and coordinates task management with scheduling through Kafka: db1 controls the addition of the List of tasks not yet issued and of taskInfo; db2 records and maintains the List of running tasks; db3 records and maintains the List of completed tasks.
In step 3), Mongo control obtains the uncompleted tasks through Redis control and performs inspection, task recovery, and scheduling-priority control of the last record.
In step 4), scheduling splits the task received from Redis control, splitting one main url into a plurality of sub urls, and issues tasks according to the performance of each monitoring point and the current task state. It manages the tasks through polling, timed updating, requesting, storage, and similar operations, applies distributed lock control, and marks state and applies lock control for tasks that succeeded and for tasks that failed on all ISPs.
In step 5), the gateway selects tasks, according to the task issuing conditions, in ascending order of task priority, task creation time, and url name, and sets the scheduling priority of the current task so that it executes according to the latest scheduling priority.
In step 6), the crawler receives the task sent by the gateway in step 5), scans the url links, and completes detection.
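The division of labour among db1, db2 and db3, together with the task heartbeat and task recovery, can be modelled as a small state machine. The sketch below is an in-memory stand-in, not Redis or Mongo code: the class `TaskStore`, its method names, and the rule that a task with no heartbeat for `heartbeat_timeout` seconds is returned to the pending list are assumptions chosen to illustrate the idea.

```python
import time


class TaskStore:
    """In-memory stand-in for the three databases described in step 2)."""

    def __init__(self, heartbeat_timeout=30.0):  # timeout value is assumed
        self.db1_pending = []   # ids of tasks not yet issued
        self.task_info = {}     # task id -> metadata (the text's taskInfo)
        self.db2_running = {}   # task id -> time of last heartbeat
        self.db3_done = []      # ids of completed tasks
        self.heartbeat_timeout = heartbeat_timeout

    def add(self, task_id, info):
        self.db1_pending.append(task_id)
        self.task_info[task_id] = info

    def issue(self, now=None):
        """Move the oldest pending task to the running set and return its id."""
        if not self.db1_pending:
            return None
        task_id = self.db1_pending.pop(0)
        self.db2_running[task_id] = time.time() if now is None else now
        return task_id

    def heartbeat(self, task_id, now=None):
        if task_id in self.db2_running:
            self.db2_running[task_id] = time.time() if now is None else now

    def complete(self, task_id):
        if self.db2_running.pop(task_id, None) is not None:
            self.db3_done.append(task_id)

    def recover_stale(self, now=None):
        """Return tasks whose heartbeat has gone silent to the pending list."""
        now = time.time() if now is None else now
        stale = [t for t, beat in self.db2_running.items()
                 if now - beat > self.heartbeat_timeout]
        for task_id in stale:
            del self.db2_running[task_id]
            self.db1_pending.append(task_id)
        return stale
```

The optional `now` argument exists only so the behaviour can be checked without waiting on a real clock.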
In step 4), one main url is split into a plurality of sub urls, each sub url is turned into a single executable task, task control is completed in coordination with the task management control mechanisms of steps 2) and 3), and each sub url is sent through step 5) to step 6) for detection. The main url is parsed, and for each link in the main url page a sub-link is generated based on the main url, forming a plurality of independently accessible url sub-links.
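The patent states that tasks are issued "according to the performance of each monitoring point and the current task state" but gives no formula. One possible reading, sketched below, assigns each sub url to the monitoring point with the lowest ratio of running tasks to capacity. The `capacity` and `running` fields, the tie-break by name, and the function name `assign_sub_urls` are all hypothetical.

```python
def assign_sub_urls(sub_urls, points):
    """Assign each sub url to the least relatively loaded monitoring point.

    points maps a name to {"capacity": int, "running": int}; both fields are
    hypothetical, and capacity is assumed to be a positive number.
    """
    load = {name: info["running"] for name, info in points.items()}
    plan = {name: [] for name in points}
    for url in sub_urls:
        # Lowest running/capacity ratio wins; ties are broken by point name.
        name = min(points, key=lambda n: (load[n] / points[n]["capacity"], n))
        plan[name].append(url)
        load[name] += 1
    return plan
```

Under this rule a point with twice the capacity receives roughly twice the tasks, so slower points do not become the bottleneck.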
The method executes detection in a distributed manner across multiple monitoring points, which improves detection efficiency, resolves accuracy problems caused by an abnormal single detection point, and resolves the problem of a specific website applying anti-crawling measures against a single detection point.
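To illustrate how several monitoring points can mask the failure or blocking of one of them, the sketch below combines per-point results for a single url. The majority-vote rule and the result labels are assumptions; the patent does not say how results from different monitoring points are reconciled.

```python
from collections import Counter


def aggregate(results):
    """Combine per-point results for one url.

    results maps a monitoring point name to True (reachable), False
    (unreachable), or None (the point returned no answer).
    """
    answered = [ok for ok in results.values() if ok is not None]
    if not answered:
        return "unknown"
    votes = Counter(answered)
    if votes[True] > votes[False]:
        return "reachable"
    if votes[False] > votes[True]:
        return "unreachable"
    return "inconclusive"
```

With this rule a single point that is blocked by the target site, or that has a local network fault, cannot by itself turn the overall verdict.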
Application Example
1. Test environment conditions:
Test network environment: Internet
Test server configuration:
[Table images BDA0003320522030000031 and BDA0003320522030000041: test server configuration, not reproduced in the text]
2. Comparison of test results with the prior art
As shown in FIG. 2, when the scan time of a program implemented with the prior art is compared with that of a program implemented with the patented technique, the patented technique is superior in both scan time and scan efficiency.
Although the present invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the present invention.

Claims (3)

1. An IPv6 website detection method based on distributed scheduling, characterized by comprising the following steps:
1) middle-platform management: receive a website detection task; the middle-platform management program judges url validity, deduplicates the task, and limits the maximum task queue;
2) Redis control: manage tasks that have not yet run and issue them to scheduling through Kafka; control the primary tasks, secondary tasks, task heartbeats, the running-task List, and the storage structure;
3) Mongo control: obtain uncompleted tasks through Redis control, and perform inspection, task recovery, and scheduling-priority control of the last record;
4) scheduling: split the task received from Redis control, splitting one main url into a plurality of sub urls; issue tasks according to the performance of each monitoring point and the current task state; and manage the tasks through polling, timed updating, requesting, and storage;
5) gateway: according to the task issuing conditions, select tasks in ascending order of task priority, task creation time, and url name, and set the scheduling priority of the current task so that it executes according to the latest scheduling priority;
6) crawler: receive a task and scan the url links.
2. The IPv6 website detection method based on distributed scheduling as recited in claim 1, wherein: in step 4), one main url is split into a plurality of sub urls, each sub url is turned into a single executable task, and task control is completed in coordination with the task management control mechanisms of steps 2) and 3).
3. The IPv6 website detection method based on distributed scheduling as recited in claim 2, wherein: in step 4), the main url is parsed, and for each link in the main url page a sub-link is generated based on the main url, forming a plurality of independently accessible url sub-links.
CN202111244665.3A 2021-10-26 2021-10-26 IPv6 website detection method based on distributed scheduling Pending CN114095207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111244665.3A CN114095207A (en) 2021-10-26 2021-10-26 IPv6 website detection method based on distributed scheduling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111244665.3A CN114095207A (en) 2021-10-26 2021-10-26 IPv6 website detection method based on distributed scheduling

Publications (1)

Publication Number Publication Date
CN114095207A CN114095207A (en) 2022-02-25

Family

ID=80297580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111244665.3A Pending CN114095207A (en) 2021-10-26 2021-10-26 IPv6 website detection method based on distributed scheduling

Country Status (1)

Country Link
CN (1) CN114095207A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7836502B1 (en) * 2007-07-03 2010-11-16 Trend Micro Inc. Scheduled gateway scanning arrangement and methods thereof
CN103310012A (en) * 2013-07-02 2013-09-18 北京航空航天大学 Distributed web crawler system
CN106411578A (en) * 2016-09-12 2017-02-15 国网山东省电力公司电力科学研究院 Website monitoring system and method applicable to power industry
CN110020046A (en) * 2017-10-20 2019-07-16 中移(苏州)软件技术有限公司 A kind of data grab method and device
CN110147475A (en) * 2019-03-29 2019-08-20 汇通达网络股份有限公司 A kind of network data acquisition system of distributed deployment
CN112818201A (en) * 2021-02-07 2021-05-18 四川封面传媒有限责任公司 Network data acquisition method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112422330A (en) * 2020-11-06 2021-02-26 北京连星科技有限公司 Method for managing enterprise network IPv6 era transition full life cycle
CN112422330B (en) * 2020-11-06 2023-05-30 北京连星科技有限公司 Method for managing enterprise network IPv6 intergrating migration full life cycle

Similar Documents

Publication Publication Date Title
US10949253B2 (en) Data forwarder for distributed data acquisition, indexing and search system
US7668957B2 (en) Partitioning social networks
CN104182288A (en) Method for automatically testing power consumption of server cluster system
CA2701107C (en) Method and apparatus for concurrent topology discovery
US8688681B1 (en) Identifying internet protocol addresses for internet hosting entities
CN104063425A (en) Method for querying data through database middleware and database middleware
CN108632111A (en) Service link monitoring method based on log
CN103870381A (en) Test data generating method and device
CN114095207A (en) IPv6 website detection method based on distributed scheduling
CN112291365A (en) Access balance processing method and device, computer equipment and storage medium
CN115269193A (en) Method and device for realizing distributed load balance in automatic test
Yang et al. An end-to-end and adaptive i/o optimization tool for modern hpc storage systems
CN106210159A (en) A kind of domain name analytic method and equipment
Ristov et al. Godeploy: Portable deployment of serverless functions in federated faas
Seidel et al. Data mining system architecture for industrial internet of things in electronics production
CN1336770A (en) Operation and maintenance of router and storage and explanation of configuration command
CN110515714A (en) A kind of task balance dispatching method based on group system
CN116016196A (en) Method and system for constructing system architecture topology in real time
CN107122246B (en) Intelligent numerical simulation operation management and feedback method
GB2464125A (en) Topology discovery comprising partitioning network nodes into groups and using multiple discovery agents operating concurrently in each group.
CN115904388A (en) Application program protocol analysis method, device, equipment and storage medium
CN114968287A (en) Method and system for automatically deploying project
CN101510830B (en) Method for recognizing expandable P2P flow
CN113572863A (en) Application acceleration method and system based on dynamic routing protocol
CN105721631A (en) Large-scale internet protocol (IP) address resources use method in orientation information grasping scenario

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination