CN109241483B - Website discovery method and system based on domain name recommendation - Google Patents
Publication number: CN109241483B
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention relates to a website discovery method based on domain name recommendation, comprising the following steps: selecting an arbitrary arrangement of characters from a domain name character set to obtain a root character string; forming a candidate character string from the root character string; splicing the candidate character string with a candidate domain name suffix to form a recommended domain name; performing DNS resolution on the recommended domain name to determine whether it is a legal domain name; and verifying whether the legal domain name has a corresponding website and, if so, acquiring the legal domain name as a target website.
Description
Technical Field
The invention belongs to the field of internet resource discovery, and particularly relates to a website discovery technology based on domain name recommendation.
Background
As the internet continues to develop and evolve, websites of many kinds have appeared, and more website domain names need to be discovered for supervision or search purposes. The website domain name here mainly means a first-level domain name, that is, a domain name with exactly one non-top-level label before the top-level domain name, such as sina.
The traditional approach usually finds more domain names by a spreading method: data is collected from a website, more URLs (Uniform Resource Locators) are extracted from the data, and the extracted URLs are deduplicated against the known domain names to obtain additional domain names. There are also domain name discovery methods based on network interception. The Chinese patent "A method and system for searching for unregistered websites based on a multi-path data access mode", application number 201410299875.6, obtains domain names through multi-path data access, screens out the unregistered domain names and forms a domain name seed library; performs DNS resolution on the unregistered domain names to obtain the corresponding IP addresses; locates the IP addresses to obtain a library of unregistered domain names; and obtains information on unregistered websites through liveness verification.
These methods are all relatively passive: a domain name is acquired only if it appears in a collected web page or in an intercepted data stream. Relatively isolated websites, such as some personal blogs, are therefore hard to discover because too few sites link to them.
Disclosure of Invention
To address these problems, the invention provides a website discovery method based on domain name recommendation, comprising the following steps: selecting an arbitrary arrangement of characters from a domain name character set to obtain a root character string; forming a candidate character string from the root character string; splicing the candidate character string with a candidate domain name suffix to form a recommended domain name; performing DNS resolution on the recommended domain name to determine whether it is a legal domain name; and verifying whether the legal domain name has a corresponding website and, if so, acquiring the legal domain name as a target website.
The website discovery method takes the root character string as the candidate character string, or splices the root character string with a prefix character string and/or a suffix character string to form the candidate character string.
In the website discovery method of the invention, the frequency S1 with which a character string A1 of length M characters appears at the prefix position of a known domain name is obtained over all known domain names, and if S1 > m·S, the character string A1 is taken as the prefix character string; likewise, the frequency S2 with which a character string A2 of length M characters appears at the suffix position of a known domain name is obtained over all known domain names, and if S2 > m·S, the character string A2 is taken as the suffix character string; wherein S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
In the website discovery method of the invention, any number of characters are selected from the domain name character set and arranged and combined to generate a plurality of character strings, and the character strings with a length of N characters are taken as root character strings; wherein N is a positive integer and 1 ≤ N ≤ 10.
The invention also relates to a website discovery system based on domain name recommendation, which comprises the following components:
the root character string module is used for randomly selecting any character arrangement combination in the domain name character set to obtain a root character string;
a candidate character string module for forming a candidate character string by the root character string;
the domain name generation module is used for splicing the candidate character string and a candidate domain name suffix to form a recommended domain name;
the domain name verification module is used for performing DNS resolution on the recommended domain name so as to determine whether the recommended domain name is a legal domain name;
and the website acquisition module is used for verifying whether the legal domain name has a corresponding website or not, and acquiring the legal domain name as a target website if the legal domain name has the corresponding website.
The website discovery system of the present invention, wherein the candidate string module further comprises: the first candidate character string module is used for taking the root character string as the candidate character string; and the second candidate character string module is used for splicing the root character string and the prefix character string and/or the suffix character string into the candidate character string.
The second candidate character string module comprises: a prefix character string module for acquiring the prefix character string, which obtains, over all known domain names, the frequency S1 with which a character string A1 of length M characters appears at the prefix position of a known domain name and, if S1 > m·S, takes the character string A1 as the prefix character string; and a suffix character string module for acquiring the suffix character string, which obtains, over all known domain names, the frequency S2 with which a character string A2 of length M characters appears at the suffix position of a known domain name and, if S2 > m·S, takes the character string A2 as the suffix character string; wherein S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
The website discovery system of the present invention, wherein the root string module specifically comprises: selecting any number of characters from the domain name character set to be arranged and combined to generate a plurality of character strings, and taking the character string with the character length of N as the root character string; wherein N is a positive integer, and N is more than or equal to 1 and less than or equal to 10.
Drawings
Fig. 1 is a flowchart of a website discovery method based on domain name recommendation according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a website discovery system based on domain name recommendation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly understood, the following describes in detail a website discovery method and system based on domain name recommendation, which are provided by the present invention, with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Most existing domain name discovery methods are passive: undiscovered domain names are obtained by analyzing data streams, and only after enough data streams have been collected. If a website is isolated and few sites link to it, data streams containing its URLs are hard to obtain, and so is its domain name.
The website discovery method based on domain name recommendation of the invention instead generates domain names autonomously: one or more characters are selected from the domain name character set and arranged and combined to obtain a root character string; a candidate character string is formed from the root character string; the candidate character string is spliced in turn with each candidate domain name suffix to form recommended domain names; and DNS resolution is performed on each recommended domain name to verify its legality, yielding the legal domain names. Fig. 1 is a flowchart of the website discovery method based on domain name recommendation according to an embodiment of the invention. As shown in Fig. 1, the method comprises the steps set out in the Disclosure above: obtaining a root character string, forming a candidate character string, splicing it with a candidate domain name suffix, performing DNS resolution on the recommended domain name, and verifying whether the legal domain name has a corresponding website.
Fig. 2 is a schematic structural diagram of a website discovery system based on domain name recommendation according to an embodiment of the invention. As shown in Fig. 2, the system comprises a root character string module, a candidate character string module, a domain name generation module, a domain name verification module and a website acquisition module. The root character string module selects an arbitrary arrangement of characters from a domain name character set to obtain a root character string; the candidate character string module forms a candidate character string from the root character string; the domain name generation module splices the candidate character string with a candidate domain name suffix to form a recommended domain name; the domain name verification module performs DNS resolution on the recommended domain name to determine whether it is a legal domain name; and the website acquisition module verifies whether the legal domain name has a corresponding website and, if so, acquires it as a target website.
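The domain name verification step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `resolves` and the injectable `resolver` parameter are hypothetical, and the sketch assumes that a recommended domain name counts as legal when the system resolver returns at least one address for it.

```python
import socket


def resolves(domain, resolver=socket.getaddrinfo):
    """Return True if DNS resolution of `domain` yields at least one address.

    `resolver` is a hypothetical injection point so the check can be
    exercised without network access; it defaults to the standard
    library's socket.getaddrinfo.
    """
    try:
        return bool(resolver(domain, None))
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: the name could
        # not be encoded as a host name (for example, an over-long label).
        return False
```

A caller would keep only the recommended domain names for which `resolves(name)` is true and pass them on to website verification.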
The root character string module of the website discovery system selects any number of characters from the domain name character set, arranges and combines them to generate a plurality of character strings, and takes the character strings with a length of N characters as root character strings, wherein N is a positive integer and 1 ≤ N ≤ 10. The selected characters may be any characters in the domain name character set, giving, for example, two-character strings such as ab, g2, cc, three-character strings such as abc, hhu, ttt, yy6, and longer strings such as abcd, jjjyh, 7fff; the invention is not limited in this respect. Arranging and combining arbitrary characters from the domain name character set in this way, the generated character strings of length N characters form a root set. Because the size of the root set grows rapidly as N increases, the invention limits the range of N so that a root character string does not exceed 10 characters.
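As an illustration of how quickly the root set grows, the following Python sketch enumerates root character strings lazily. It assumes a 37-character set made of the lowercase letters, the digits and the hyphen (consistent with K = 37 stated in this description); the names `DOMAIN_CHARSET`, `root_strings` and `root_set_size` are hypothetical.

```python
from itertools import islice, product
import string

# Assumed domain name character set: a-z, 0-9 and "-" (37 characters).
DOMAIN_CHARSET = string.ascii_lowercase + string.digits + "-"


def root_strings(n, charset=DOMAIN_CHARSET):
    """Lazily yield every string of exactly n characters over `charset`."""
    if not 1 <= n <= 10:
        raise ValueError("N must satisfy 1 <= N <= 10")
    for chars in product(charset, repeat=n):
        yield "".join(chars)


def root_set_size(n, charset=DOMAIN_CHARSET):
    """Number of root strings of length n: len(charset) ** n."""
    return len(charset) ** n


print(list(islice(root_strings(2), 3)))  # ['aa', 'ab', 'ac']
print(root_set_size(2))                  # 1369
```

With this character set the root set holds 37^N strings, so N = 10 already gives roughly 4.8 × 10^15 roots, which is why a generator is used here instead of building the whole set in memory.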
The website discovery system of the invention may take the root character string directly as the candidate character string and splice it with a candidate domain name suffix to form the recommended domain name; it may also optimize further on the basis of the root character string. The candidate character string module therefore comprises a first candidate character string module and a second candidate character string module: the first candidate character string module takes the root character string as the candidate character string, and the second candidate character string module splices the root character string with a prefix character string and/or a suffix character string to form the candidate character string. The second candidate character string module accordingly further comprises a prefix character string module and a suffix character string module.
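The two candidate character string modules and the subsequent splicing with a domain name suffix can be sketched in Python as follows. The function names, the example affixes and the example suffixes `.com` and `.cn` are hypothetical. The filter that drops candidates beginning or ending with a hyphen is an added assumption based on host name label syntax, not something stated in this description.

```python
def candidate_strings(root, prefixes=(), suffixes=()):
    """Return the root itself plus every prefix/suffix splice of it."""
    candidates = [root]                                   # first module
    candidates += [p + root for p in prefixes]            # prefix only
    candidates += [root + s for s in suffixes]            # suffix only
    candidates += [p + root + s for p in prefixes for s in suffixes]

    seen = set()
    result = []
    for candidate in candidates:
        # Assumption: skip labels that start or end with "-" and duplicates.
        if candidate.startswith("-") or candidate.endswith("-"):
            continue
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


def recommended_domains(candidates, domain_suffixes=(".com", ".cn")):
    """Splice each candidate string with each candidate domain name suffix."""
    return [c + suffix for c in candidates for suffix in domain_suffixes]


print(candidate_strings("abc", ["my"], ["shop"]))
# ['abc', 'myabc', 'abcshop', 'myabcshop']
```

Each string returned by `recommended_domains` would then go through DNS resolution.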
The prefix character string module acquires the prefix character string: it obtains, over all known domain names, the frequency S1 with which a character string A1 of length M characters appears at the prefix position and, if S1 > m·S, takes the character string A1 as the prefix character string. The suffix character string module acquires the suffix character string: it obtains, over all known domain names, the frequency S2 with which a character string A2 of length M characters appears at the suffix position and, if S2 > m·S, takes the character string A2 as the suffix character string. Here S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
Specifically, all known domain names are counted to obtain the number of times a two-character string A_2 occurs at the prefix position and at the suffix position, from which the frequency S1 of A_2 at the prefix position and the frequency S2 of A_2 at the suffix position are calculated. The value m is set empirically, with m > 1. When the domain name character set contains K characters in total, the frequency with which any given two-character string would appear at the prefix position if all strings were equally likely is S = 1/K^2. If S1 > m·S = m/K^2, the two-character string A_2 is taken as a two-character prefix character string; if S2 > m·S = m/K^2, it is taken as a two-character suffix character string. Because the characters of a domain name must belong to the domain name character set, K = 37 according to the range of that set. Three-character strings A_3 and four-character strings A_4 are processed in the same way to obtain three-character and four-character prefix and suffix character strings. In some embodiments, five-character and six-character strings may also be processed in this way to obtain prefix and suffix character strings of those lengths; the invention is not limited in this respect.
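The frequency test above can be sketched in Python. This is only an illustration under stated assumptions: `frequent_affixes` is a hypothetical name, `labels` is assumed to hold the known domain names with their domain name suffix already removed, and the frequency is taken as the count divided by the number of known names.

```python
from collections import Counter


def frequent_affixes(labels, M, m=2.0, K=37):
    """Return (prefixes, suffixes) of length M whose frequency exceeds m / K**M."""
    total = len(labels)
    if total == 0:
        return [], []
    threshold = m / K ** M          # m * S, with S = 1 / K^M
    usable = [label for label in labels if len(label) >= M]
    prefix_counts = Counter(label[:M] for label in usable)
    suffix_counts = Counter(label[-M:] for label in usable)
    prefixes = sorted(a for a, c in prefix_counts.items() if c / total > threshold)
    suffixes = sorted(a for a, c in suffix_counts.items() if c / total > threshold)
    return prefixes, suffixes


known = ["myshop", "mycar", "mybank", "bookshop"]
print(frequent_affixes(known, M=2, m=500))  # (['my'], ['op'])
```

For K = 37 the baseline S is 1/1369 for M = 2, 1/50653 for M = 3 and 1/1874161 for M = 4. The toy sample above needs an unrealistically large m to filter anything, because with only four names every affix has a frequency of at least 0.25; on a large corpus of known domain names a much smaller empirical m would be expected to separate common affixes from rare ones.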
The website acquisition module verifies each legal domain name obtained to determine whether a corresponding website exists; if so, the corresponding website is acquired as a target website, and if not, the legal domain name is discarded. The invention uses the curl tool to verify legal domain names; other software tools with a website verification function may also be used, and the invention is not limited in this respect.
By generating domain names autonomously rather than relying on spreading discovery or other passive acquisition methods, the invention avoids the limitations of passive discovery and can detect more website domain names; generating domain names by a root discovery and combination method improves the success rate and the targeting of active detection.
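The description names curl for this check. As a stand-in, the following Python sketch uses the standard library's `urllib.request` instead of curl. The function names are hypothetical, and the rule that any HTTP status below 400 means a website exists is an assumption, since the description does not define the criterion.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a HEAD request, or None on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code              # the server answered with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None                  # no connection, timeout or malformed URL


def has_website(domain, fetch=http_status):
    """Assumed criterion: some scheme answers with a status below 400."""
    for scheme in ("http", "https"):
        status = fetch(f"{scheme}://{domain}/")
        if status is not None and status < 400:
            return True
    return False
```

The `fetch` parameter is a hypothetical injection point that lets the decision logic be tested without network access; a legal domain name for which `has_website` returns false would be discarded.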
The present invention may be embodied in other specific forms without departing from the spirit or scope of the invention, and it should be understood that various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. A website discovery method based on domain name recommendation is characterized by comprising the following steps:
randomly selecting any character arrangement combination in a domain name character set to obtain a root character string;
taking the root character string as a candidate character string, or splicing the root character string with a prefix character string and/or a suffix character string to form the candidate character string;
splicing the candidate character string and the candidate domain name suffix to form a recommended domain name;
performing DNS resolution on the recommended domain name to determine whether the recommended domain name is a legal domain name;
and verifying whether the legal domain name has a corresponding website, and if so, acquiring the legal domain name as a target website.
2. The website discovery method of claim 1, wherein the frequency S1 with which a character string A1 of length M characters appears at the prefix position of a known domain name is obtained over all known domain names, and if S1 > m·S, the character string A1 is used as the prefix character string; wherein S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
3. The website discovery method of claim 1, wherein the frequency S2 with which a character string A2 of length M characters appears at the suffix position of a known domain name is obtained over all known domain names, and if S2 > m·S, the character string A2 is used as the suffix character string; wherein S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
4. The website discovery method according to claim 1, wherein any number of characters are selected from the domain name character set for permutation and combination to generate a plurality of character strings, and the character string with a character length of N is used as the root character string; wherein N is a positive integer, and N is more than or equal to 1 and less than or equal to 10.
5. A website discovery system based on domain name recommendation, comprising:
the root character string module is used for randomly selecting any character arrangement combination in the domain name character set to obtain a root character string;
a candidate character string module for forming a candidate character string from the root character string, comprising: a first candidate character string module for taking the root character string as the candidate character string; and a second candidate character string module for splicing the root character string with a prefix character string and/or a suffix character string into the candidate character string;
the domain name generation module is used for splicing the candidate character string and a candidate domain name suffix to form a recommended domain name;
the domain name verification module is used for performing DNS resolution on the recommended domain name so as to determine whether the recommended domain name is a legal domain name;
and the website acquisition module is used for verifying whether the legal domain name has a corresponding website or not, and acquiring the legal domain name as a target website if the legal domain name has the corresponding website.
6. The website discovery system of claim 5, wherein the second candidate character string module comprises: a prefix character string module for acquiring the prefix character string, which obtains, over all known domain names, the frequency S1 with which a character string A1 of length M characters appears at the prefix position of a known domain name and, if S1 > m·S, uses the character string A1 as the prefix character string; wherein S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
7. The website discovery system of claim 5, wherein the second candidate character string module further comprises: a suffix character string module for acquiring the suffix character string, which obtains, over all known domain names, the frequency S2 with which a character string A2 of length M characters appears at the suffix position of a known domain name and, if S2 > m·S, uses the character string A2 as the suffix character string; wherein S = 1/K^M, K is the number of characters in the domain name character set, M is a positive integer with 2 ≤ M ≤ 4, and m is a preset value with m > 1.
8. The website discovery system of claim 5, wherein the root string module specifically comprises: selecting any number of characters from the domain name character set to be arranged and combined to generate a plurality of character strings, and taking the character string with the character length of N as the root character string; wherein N is a positive integer, and N is more than or equal to 1 and less than or equal to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811008674.0A CN109241483B (en) | 2018-08-31 | 2018-08-31 | Website discovery method and system based on domain name recommendation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109241483A (en) | 2019-01-18 |
CN109241483B (en) | 2021-10-12 |
Family
ID=65068896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811008674.0A Active CN109241483B (en) | 2018-08-31 | 2018-08-31 | Website discovery method and system based on domain name recommendation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109241483B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113285979B (en) * | 2021-04-15 | 2022-11-29 | 北京奇艺世纪科技有限公司 | Network request processing method, device, terminal and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104113539A (en) * | 2014-07-11 | 2014-10-22 | 哈尔滨工业大学(威海) | Phishing website engine detection method and device |
CN107770132A (en) * | 2016-08-18 | 2018-03-06 | 中兴通讯股份有限公司 | A kind of method and device detected to algorithm generation domain name |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8504583B1 (en) * | 2012-02-14 | 2013-08-06 | Microsoft Corporation | Multi-domain recommendations |
CN104065532B (en) * | 2014-06-26 | 2018-08-14 | 国家计算机网络与信息安全管理中心 | A kind of non-recorded website search method and system based on multichannel data access way |
CN106302438A (en) * | 2016-08-11 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means |
CN106302440B (en) * | 2016-08-11 | 2019-12-10 | 国家计算机网络与信息安全管理中心 | Method for acquiring suspicious phishing websites through multiple channels |
CN106503125B (en) * | 2016-10-19 | 2019-10-15 | 中国互联网络信息中心 | A kind of data source extended method and device |
CN108124025A (en) * | 2017-12-14 | 2018-06-05 | 北京锐安科技有限公司 | Website converts detection method, the device and system of domain name |
Non-Patent Citations (1)
Title |
---|
Research on phishing fraud detection technology; Zhang Qian; Chinese Journal of Network and Information Security; 20170731; Vol. 3 (No. 7); pp. 7-10 * |
Also Published As
Publication number | Publication date |
---|---|
CN109241483A (en) | 2019-01-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||