CN112804210A - Data association method and device, electronic equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN112804210A
Authority
CN
China
Prior art keywords
domain name
information
detected
malicious
registration information
Prior art date
Legal status
Granted
Application number
CN202011630986.2A
Other languages
Chinese (zh)
Other versions
CN112804210B (en)
Inventor
董秀坤
万耀东
Current Assignee
Beijing Knownsec Information Technology Co Ltd
Original Assignee
Beijing Knownsec Information Technology Co Ltd
Application filed by Beijing Knownsec Information Technology Co Ltd
Priority to CN202011630986.2A (CN112804210B)
Publication of CN112804210A
Application granted
Publication of CN112804210B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a data association method, a data association apparatus, an electronic device and a computer-readable storage medium, and relates to the field of information security. The method comprises: obtaining sample data comprising a plurality of domain names to be detected; querying the domain name registration information and IP information corresponding to each domain name to be detected, and extracting page keywords from the website corresponding to each domain name to be detected; determining which of the domain names to be detected are malicious according to the domain name registration information, the IP information, and the page similarity between the website corresponding to each domain name to be detected and the official website corresponding to its page keywords; and associating different malicious domain names according to the domain name registration information and IP information corresponding to each malicious domain name, so as to construct an associated network. By associating malicious domain names across the multiple dimensions of domain name registration information and IP information, the method goes beyond single-domain association, forms refined association relationships, and improves the accuracy of associating fraudulent websites.

Description

Data association method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of information security, and in particular, to a data association method, apparatus, electronic device, and computer-readable storage medium.
Background
With the development of the internet, fraudulent websites have evolved into a commercial activity driven by economic interest. A fraudulent website lures users into performing operations on it by disguising its content as a legitimate website service, putting users at risk of being defrauded of property or private information. At present, existing countermeasures against fraudulent websites mainly rely on a website credit blacklist mechanism, which takes two forms. The first is a basic technique that adds a fraudulent website to a blacklist based on a single feature or event. The second extends the basic blacklist technique: for each detected fraudulent website, the hidden links (dark links) it contains are subjected to association analysis to generate a website credit blacklist.
However, most current fraudulent websites defraud users by imitating a single page and contain essentially no hidden links or outbound links, so association analysis based on the link relationships of fraudulent websites is difficult to perform.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data association method, apparatus, electronic device and computer-readable storage medium, so as to improve the accuracy of associating fraudulent websites.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
in a first aspect, the present application provides a data association method, including:
acquiring sample data; the sample data comprises a plurality of domain names to be detected;
querying domain name registration information and IP information corresponding to each domain name to be detected, and extracting page keywords of the website corresponding to the domain name to be detected;
according to the domain name registration information, the IP information and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword, determining a malicious domain name in the domain names to be detected;
and associating different malicious domain names according to the domain name registration information and the IP information corresponding to each malicious domain name to construct an associated network.
In an optional embodiment, the step of determining a malicious domain name in the plurality of domain names to be detected according to the domain name registration information, the IP information, and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword includes:
if the page similarity corresponding to any domain name to be detected is greater than a set threshold, and the domain name registration information and IP information corresponding to that domain name do not match those of the official website, determining that the domain name to be detected is a malicious domain name.
In an optional embodiment, the step of associating different malicious domain names according to the domain name registration information and the IP information corresponding to each malicious domain name to construct an associated network includes:
associating each malicious domain name with domain name registration information and IP information corresponding to the malicious domain name; if first overlapping information exists between domain name registration information corresponding to any two malicious domain names, the two malicious domain names are associated through the first overlapping information, and/or if second overlapping information exists between IP information corresponding to any two malicious domain names, the two malicious domain names are associated through the second overlapping information, and therefore an associated network is obtained.
In an alternative embodiment, the method further comprises:
performing a reverse lookup based on the first overlapping information or the second overlapping information in the associated network, to obtain a target domain name registered using the first overlapping information or the second overlapping information, where the target domain name is a domain name other than the malicious domain names already in the associated network;
and under the condition that the target domain name is determined to be a malicious domain name, updating the associated network according to the target domain name.
In an optional embodiment, after the step of obtaining sample data, the method further includes:
performing liveness detection on each domain name to be detected in the sample data, so as to filter out the inactive domain names to be detected and keep the live ones;
and if an inactive domain name to be detected is found to be in use again, adding the reused domain name to the domain name blacklist database for monitoring.
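The liveness filtering and reuse monitoring above could be sketched as follows. This is a minimal Python sketch, not the application's implementation. It assumes that "live" means the domain currently resolves in DNS; the application does not specify the probe, and an HTTP request would be another option. The function names, and the in-memory `blacklist` set standing in for the domain name blacklist database, are hypothetical.

```python
import socket
from typing import Callable, Iterable


def dns_resolves(domain: str) -> bool:
    """Treat a domain as live if it currently resolves to at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror (an OSError subclass) covers NXDOMAIN and similar failures;
        # UnicodeError covers names that cannot be IDNA-encoded.
        return False


def filter_live_domains(
    domains: Iterable[str], is_alive: Callable[[str], bool]
) -> tuple[list[str], list[str]]:
    """Split the sample into live domains (kept) and inactive ones (set aside)."""
    live: list[str] = []
    inactive: list[str] = []
    for domain in domains:
        (live if is_alive(domain) else inactive).append(domain)
    return live, inactive


def recheck_inactive(
    inactive: Iterable[str], is_alive: Callable[[str], bool], blacklist: set[str]
) -> set[str]:
    """Re-probe inactive domains; any that come back are added to the blacklist for monitoring."""
    reused = {domain for domain in inactive if is_alive(domain)}
    blacklist |= reused
    return reused
```

Passing the probe in as a parameter keeps the filtering logic testable offline. It also means `dns_resolves` can be replaced by a stricter check without touching the rest of the code.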
In an optional embodiment, after the step of determining a malicious domain name in the plurality of domain names to be detected according to the domain name registration information, the IP information, and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword, the method further includes:
and storing the malicious domain name, domain name registration information corresponding to the malicious domain name and IP information in a cloud server.
In an optional embodiment, the domain name registration information includes a domain name registrar, registrant information, and sub-domain name information; the IP information comprises an IP address, an IP geographical position and IP registration information.
In a second aspect, the present application provides a data association apparatus, comprising:
the data acquisition module is used for acquiring sample data; the sample data comprises a plurality of domain names to be detected;
the data query module is used for querying domain name registration information and IP information corresponding to each domain name to be detected and extracting page keywords of a website corresponding to the domain name to be detected;
the malicious domain name detection module is used for determining a malicious domain name in the plurality of domain names to be detected according to the domain name registration information, the IP information and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword;
and the data association module is used for associating different malicious domain names according to the domain name registration information and the IP information corresponding to each malicious domain name so as to construct an associated network.
In a third aspect, the present application provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program, and the processor implements the method of any one of the preceding embodiments when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the preceding embodiments.
In the data association method, apparatus, electronic device and computer-readable storage medium provided by the present application, sample data comprising a plurality of domain names to be detected is obtained. The domain name registration information and IP information corresponding to each domain name to be detected are queried, and page keywords of the corresponding website are extracted. Malicious domain names among the domain names to be detected are then determined according to the domain name registration information, the IP information, and the page similarity between the website corresponding to each domain name to be detected and the official website corresponding to the page keywords. Finally, different malicious domain names are associated according to the domain name registration information and IP information corresponding to each malicious domain name to construct an associated network. Because malicious domain names are associated across the multiple dimensions of domain name registration information and IP information, the association is no longer limited to a single domain name, refined association relationships are formed, and the accuracy of associating fraudulent websites is improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device provided by an embodiment of the present application;
Fig. 2 is a flow chart of a data association method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of an association network;
Fig. 4 is a schematic flow chart of a data association method provided by an embodiment of the present application;
Fig. 5 is a schematic flow chart of a data association method provided by an embodiment of the present application;
Fig. 6 is a schematic flow chart of a data association method provided by an embodiment of the present application;
Fig. 7 is a functional block diagram of a data association apparatus provided by an embodiment of the present application;
Fig. 8 is another functional block diagram of a data association apparatus provided by an embodiment of the present application.
Reference numerals: 100 - electronic device; 110 - memory; 120 - processor; 130 - communication module; 700 - data association apparatus; 710 - data acquisition module; 720 - data query module; 730 - malicious domain name detection module; 740 - data association module; 750 - data update module; 760 - liveness detection module; 770 - data storage module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Fig. 1 is a block diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 may be a tablet computer, a personal computer (PC), or the like, and includes a memory 110, a processor 120, and a communication module 130. The memory 110, the processor 120, and the communication module 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 120 is used to read/write data or programs stored in the memory 110 and perform corresponding functions. For example, the processor 120 may implement the data association method disclosed in the embodiment of the present application when executing the computer program stored in the memory 110.
The communication module 130 is used for establishing a communication connection between the electronic device 100 and another communication terminal through a network, and for transceiving data through the network.
It should be understood that the configuration shown in Fig. 1 is merely schematic. The electronic device 100 may include more or fewer components than shown in Fig. 1, or have a different configuration from that shown in Fig. 1. The components shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by the processor 120, the computer program may implement the data association method disclosed in the embodiments of the present application.
Fig. 2 is a schematic flow chart of a data association method according to an embodiment of the present application. It should be noted that the data association method provided in the embodiment of the present application is not limited to the specific sequence shown in Fig. 2 and described below. In other embodiments, the order of some steps may be interchanged according to actual needs, or some steps may be omitted. The data association method can be applied to the electronic device 100 shown in Fig. 1, and the specific flow shown in Fig. 2 is described in detail below.
Step S201, sample data is obtained; the sample data includes a plurality of domain names to be detected.
In this embodiment, the domain names to be detected may be domain names of known fraudulent websites or of suspected fraudulent websites, for example, publicly disclosed malicious domain names, data in the domain name blacklist database that is monitored daily, and reported blacklisted website data.
Step S202, domain name registration information and IP information corresponding to each domain name to be detected are inquired, and page keywords of a website corresponding to the domain name to be detected are extracted.
In this embodiment, the domain name registration information and IP information corresponding to each domain name to be detected may be obtained through a public query interface (e.g., chinaz.com). Meanwhile, a heuristic algorithm is used to extract page keywords from the website corresponding to the domain name to be detected. The organization or enterprise that the website presents itself as can be determined from the page keywords, and the official (legitimate) website of that organization or enterprise can then be identified.
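The application states only that a heuristic algorithm is used. One possible heuristic is sketched below in Python: it ranks page words by frequency, gives title words a bonus, and maps the top keywords to an official site through a lookup table. The stop-word list, the `OFFICIAL_SITES` table, the weighting and all function names are assumptions for illustration. Chinese pages would additionally need word segmentation.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stop-word list and brand table; a real system would need much larger,
# language-aware versions.
STOPWORDS = {"the", "and", "for", "with", "login", "welcome", "online"}
OFFICIAL_SITES = {"xxbank": "www.xxbank.example"}


def extract_keywords(html: str, top_k: int = 5) -> list[str]:
    """Rank words by frequency, giving words that appear in <title> extra weight."""
    # Drop script/style bodies so code does not pollute the counts.
    html = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.I | re.S)
    title_match = re.search(r"<title[^>]*>(.*?)</title>", html, flags=re.I | re.S)
    title_words = re.findall(r"[a-z0-9]+", title_match.group(1).lower()) if title_match else []
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    counts = Counter(
        w for w in re.findall(r"[a-z0-9]+", text.lower())
        if len(w) > 2 and w not in STOPWORDS
    )
    for w in title_words:
        if w in counts:
            counts[w] += 3  # heuristic bonus for title words
    return [w for w, _ in counts.most_common(top_k)]


def find_official_site(keywords: list[str]) -> Optional[str]:
    """Return the official site of the first keyword that names a known brand."""
    for kw in keywords:
        if kw in OFFICIAL_SITES:
            return OFFICIAL_SITES[kw]
    return None
```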
Step S203, according to the domain name registration information, the IP information and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword, determining the malicious domain name in the domain names to be detected.
In this embodiment, the sample data includes domain names of suspected fraudulent websites. After the domain name registration information and IP information corresponding to each domain name to be detected are obtained and the page keywords of the corresponding website are extracted, the website corresponding to the domain name to be detected can be compared with the official website corresponding to the page keywords to obtain their page similarity. Whether the domain name to be detected is a malicious domain name can then be determined in combination with the extracted domain name registration information and IP information. When calculating the page similarity, the two websites can be compared on both the text structure of the web pages and the visual appearance of the pages. A domain name to be detected that is determined to be malicious, together with its domain name registration information and IP information, can be marked with a malicious feature (i.e., tagged with a malicious label).
Step S204, different malicious domain names are associated according to the domain name registration information and IP information corresponding to each malicious domain name to construct an associated network.
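The application combines web-page text structure and page visual effect but does not give a formula. The following is a minimal Python sketch under several assumptions. Structure is compared as the sequence of HTML tag names using the standard library's `difflib.SequenceMatcher`. Visual appearance is compared as the Hamming distance between 64-bit perceptual hashes of page screenshots; capturing and hashing the screenshots is outside the sketch. The equal 0.5 weighting and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tag_sequence(html: str) -> list[str]:
    """Opening-tag names in document order, e.g. ['html', 'body', 'form', 'input']."""
    return [t.lower() for t in re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html)]


def structure_similarity(html_a: str, html_b: str) -> float:
    """Similarity of the two pages' tag sequences, in [0, 1]."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


def visual_similarity(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """1 minus the normalized Hamming distance between two perceptual hashes."""
    return 1.0 - bin(hash_a ^ hash_b).count("1") / bits


def page_similarity(html_a: str, html_b: str, hash_a: int, hash_b: int,
                    w_struct: float = 0.5) -> float:
    """Weighted blend of structural and visual similarity (the weight is an assumption)."""
    return (w_struct * structure_similarity(html_a, html_b)
            + (1.0 - w_struct) * visual_similarity(hash_a, hash_b))
```

Comparing tag sequences rather than raw text targets the case this application describes. A cloned page tends to keep the official page's layout even when the wording or links are altered.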
In this embodiment, the electronic device 100 may repeatedly split the domain name registration information and IP information corresponding to each malicious domain name into individual items and cross-associate them. All items carrying malicious labels across multiple dimensions are thereby comprehensively cross-matched, so that different malicious domain names are associated with each other to construct an association network. The associated network may include all malicious domain names in the sample data, together with the domain name registration information and IP information corresponding to each malicious domain name.
In the data association method provided by this embodiment of the application, malicious domain names among the domain names to be detected in the sample data are determined from the domain name registration information, the IP information, and the page similarity between each website and the official website corresponding to its page keywords. Different malicious domain names are then associated according to their domain name registration information and IP information to construct an associated network. Associating malicious domain names across these multiple dimensions goes beyond single-domain association, forms refined association relationships, and improves the accuracy of associating fraudulent websites.
Optionally, in this embodiment, the domain name registration information may include a domain name registrar, registrant information, and sub-domain name information; the IP information comprises an IP address, an IP geographical position and IP registration information.
The domain name registration information may be understood as the WHOIS information corresponding to the domain name to be detected. WHOIS (pronounced "who is") is a query-and-response protocol used to look up the registration and ownership information of domain names and IP addresses. In brief, WHOIS is used to query whether a domain name has been registered and to retrieve the registration details of the domain name (e.g., the domain name owner and the domain name registrar). The registrant information may include the registrant's name, mobile phone number, e-mail address, and the like. The IP registration information may be understood as IP WHOIS information, such as the contact details of the holder of the IP address (person/e-mail/address/phone).
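WHOIS (RFC 3912) is a simple line-based protocol over TCP port 43, so the registration information could also be fetched and parsed directly. The minimal Python sketch below shows this. It is an illustration under assumptions, not the application's method, which uses a public query interface instead. The caller must choose the right WHOIS server for the TLD or IP range (for example `whois.verisign-grs.com` for .com). Field names also differ between registries, and the function names are hypothetical.

```python
import socket


def whois_query(query: str, server: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query over TCP port 43 and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("idna") + b"\r\n")  # the query line ends with CRLF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines under lower-cased keys; repeated keys keep every value."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if line.lstrip().startswith(("%", "#")):  # comment lines used by many servers
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

Fields such as the registrar, registrant name, phone and e-mail extracted this way are the items later compared across domains.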
In an embodiment, step S203 may specifically include: if the page similarity corresponding to any domain name to be detected is greater than a set threshold, and the domain name registration information and IP information corresponding to that domain name do not match those of the official website, determining that the domain name to be detected is a malicious domain name.
That is, the website corresponding to the domain name to be detected is compared with the official website for page similarity. When the page similarity is greater than the set threshold, the domain name registration information and IP information of that website are compared with those of the official website. If they are found not to match, the website corresponding to the domain name to be detected is determined to be a fraudulent website, and accordingly the domain name to be detected is a malicious domain name.
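The threshold-and-mismatch rule can be sketched in Python as follows. The text does not say whether a mismatch in only one of the two kinds of information is enough. This sketch assumes the domain is flagged only when neither the registration information nor the IP information matches. That reading tolerates, for example, an official site served from an additional IP address. The `SiteProfile` fields, the 0.8 threshold and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    registrar: str
    registrant: str
    ip_addresses: frozenset[str]


def is_malicious(similarity: float, candidate: SiteProfile, official: SiteProfile,
                 threshold: float = 0.8) -> bool:
    """Flag a look-alike page whose registration and IP information do not match the official site."""
    if similarity <= threshold:
        return False  # not similar enough to be an imitation
    registration_matches = (
        candidate.registrar == official.registrar
        and candidate.registrant == official.registrant
    )
    ip_matches = bool(candidate.ip_addresses & official.ip_addresses)
    # Assumption: flag only when neither registration nor IP information matches.
    return not registration_matches and not ip_matches
```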
It should be noted that, in practical applications, the detection accuracy of malicious domain names can be further improved by also querying the website filing information (ICP filing information) corresponding to each domain name to be detected. This information may include the name and nature of the sponsoring entity, the website name, the review time, and the like, and is queried while the domain name registration information, IP information and page keywords are extracted. Whether the domain name to be detected is a malicious domain name is then determined comprehensively from two results. The first is the comparison of its domain name registration information, IP information and website filing information with those of the official website. The second is the page similarity between its website and the official website.
For example, for a given domain name to be detected, its website filing information, domain name registration information, IP information and the like can be extracted, and a heuristic algorithm can be used to extract the page keywords of the corresponding website. Suppose the extracted page keywords relate to XX Bank. The website corresponding to the domain name to be detected is then compared with the official website of XX Bank, and the page similarity is found to be above the set threshold, i.e., the two are highly similar. The website filing information, domain name registration information, IP geographical locations and so on of the two websites are also compared. Although the website looks very similar to the official website of XX Bank, its website filing information and domain name registration information both differ from those of the official website. In addition, its IP address is located abroad, while the enterprise that owns the official website is a domestic enterprise. The website corresponding to the domain name to be detected is therefore judged to be a fraudulent website, and the domain name to be detected is a malicious domain name.
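The XX Bank example can be read as a multi-signal decision. The Python sketch below counts how many of the compared dimensions (website filing information, registrant, IP country) disagree with the official website. It flags the domain when the page similarity exceeds the threshold and at least `min_signals` dimensions disagree. The counting rule, both thresholds, the field names, and the use of a country code to represent the IP geographical location are assumptions; the application describes the comparison only qualitatively.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    filing_id: Optional[str]  # website filing number; None if the site has none
    registrant: str           # from the domain name registration (WHOIS) information
    ip_country: str           # country of the IP address, e.g. from a geo-IP lookup


def mismatch_signals(candidate: Evidence, official: Evidence) -> list[str]:
    """List the dimensions on which the candidate site disagrees with the official site."""
    signals = []
    if candidate.filing_id is None or candidate.filing_id != official.filing_id:
        signals.append("filing")
    if candidate.registrant != official.registrant:
        signals.append("registration")
    if candidate.ip_country != official.ip_country:
        signals.append("ip_location")
    return signals


def judge(similarity: float, candidate: Evidence, official: Evidence,
          threshold: float = 0.8, min_signals: int = 2) -> bool:
    """Malicious if the page is a close imitation and enough independent signals disagree."""
    return similarity > threshold and len(mismatch_signals(candidate, official)) >= min_signals
```

Returning the list of disagreeing dimensions, rather than a bare boolean, keeps the reason for each verdict available, for example when labeling the stored domain record.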
In the data association method provided by this embodiment of the application, the domain name registration information, IP information and page keywords corresponding to each domain name to be detected in the sample data can be extracted, and the organization or enterprise that the website belongs to is determined from the page keywords. The page similarity between the website corresponding to the domain name to be detected and the official website is compared, and at the same time the domain name registration information and IP information of the two websites are compared. When the page similarity is greater than the set threshold and the domain name registration information and IP information do not match those of the official website, the domain name to be detected is determined to be a malicious domain name. In this way, the comparison of domain name registration information and IP information, the detection of web page content, and the comparison of page similarity jointly serve as the basis for malicious domain name detection. Combining these complementary checks to judge whether a domain name is malicious effectively improves the accuracy of identifying fraudulent websites.
In one embodiment, the electronic device 100 may construct the association network as follows; that is, step S204 may include the following operations. First, each malicious domain name is associated with its corresponding domain name registration information and IP information. If first overlapping information exists between the domain name registration information of any two malicious domain names, the two malicious domain names are associated through the first overlapping information. And/or, if second overlapping information exists between the IP information of any two malicious domain names, the two malicious domain names are associated through the second overlapping information. The association network is obtained in this way.
For example, suppose the malicious domain names in the sample data include domain name 1, domain name 2, ..., domain name n. Each is associated with its corresponding domain name registration information and IP information, including the IP address, the name of the registrant (Name), the phone number (Phone), the email address (Email) and so on. If the IP address associated with domain name 1 is the same as the IP address associated with domain name 2, that IP address is used as second overlapping information to associate domain name 1 with domain name 2. If the registrant name associated with domain name 2 is the same as the registrant name associated with domain name n, that registrant name is used as first overlapping information to associate domain name 2 with domain name n. The association network shown in fig. 3 can be constructed in this way.
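The construction in step S204 can be sketched as an inverted index from registration and IP values to domains. Any value shared by two or more malicious domains becomes a linking edge. This is an illustrative sketch, not the patented implementation:

- The record layout is an assumption.
- The labels `first_overlap` and `second_overlap` are assumptions.
- The function name `build_association_network` is hypothetical.
- Registration and IP data are treated as exact-match strings.

```python
from collections import defaultdict
from itertools import combinations


def build_association_network(records):
    """records: domain -> {"registration": set of values, "ip": set of values}.

    Returns (index, edges): index maps (kind, value) to the malicious domains
    that use it; edges holds (domain_a, domain_b, kind, shared_value) tuples.
    """
    index = defaultdict(set)
    # Step 1: attach each malicious domain to its own registration/IP values.
    for domain, info in records.items():
        for value in info["registration"]:
            index[("first_overlap", value)].add(domain)
        for value in info["ip"]:
            index[("second_overlap", value)].add(domain)
    # Step 2: a value shared by two or more domains links every pair of them.
    edges = set()
    for (kind, value), domains in index.items():
        for a, b in combinations(sorted(domains), 2):
            edges.add((a, b, kind, value))
    return dict(index), edges


# Illustrative data mirroring the fig. 3 example (documentation-range IPs).
records = {
    "domain1.example": {"registration": {"name:Alice"}, "ip": {"203.0.113.7"}},
    "domain2.example": {"registration": {"name:Bob"}, "ip": {"203.0.113.7"}},
    "domainn.example": {"registration": {"name:Bob"}, "ip": {"198.51.100.9"}},
}
index, edges = build_association_network(records)
for edge in sorted(edges):
    print(edge)
```

Grouping domains by shared value avoids comparing every pair of full records. A graph library could hold the same nodes and edges if a drawing like fig. 3 is needed.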
Optionally, referring to fig. 4, the data association method provided in the embodiment of the present application may further include:
Step S401: performing a reverse-check operation based on the first overlapping information or the second overlapping information in the association network, to obtain a target domain name registered using that information. The target domain name is a domain name other than the malicious domain names already in the association network.
In this embodiment, the first overlapping information is the part of the domain name registration information shared by two or more malicious domain names in the association network. The second overlapping information is the part of the IP information shared by two or more malicious domain names. Both therefore constitute strongly associated node data in the association network. Performing a reverse-check operation on such node data yields more related information, for example other domain names registered using the node data (i.e., target domain names). The reverse-check operation may include a reverse lookup by contact person, a reverse lookup by IP address, a reverse lookup by contact information, and similar means.
Step S402, under the condition that the target domain name is determined to be a malicious domain name, updating the associated network according to the target domain name.
In this embodiment, each target domain name obtained through the reverse-check operation can be examined to determine whether it is a malicious domain name. This allows malicious domain names to be detected in advance and supplements the existing blacklist. Each malicious domain name detected in advance can also be linked to other malicious domain names in the association network according to its own domain name registration information and IP information. The association network therefore keeps expanding, and the data is updated dynamically.
For example, suppose the contact address of a malicious domain name in the association network is strongly associated node data. A reverse lookup can be performed on that contact address to find other domain names registered with it. Each of those domain names can then be examined for malicious behavior to determine whether it is a malicious domain name.
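Steps S401 and S402 can be sketched on top of a value-to-domains index like the one described for step S204. This is a hedged illustration, not the patented implementation:

- `reverse_lookup` stands in for a hypothetical reverse-WHOIS or reverse-IP data source. No real service or API is called.
- `detect_malicious` stands in for a detector such as the S203 rule.
- "Strongly associated" is taken to mean a value already shared by at least two domains.
- The function and parameter names are illustrative.

```python
def expand_network(index, reverse_lookup, detect_malicious):
    """Steps S401-S402 over an index of (kind, value) -> set of malicious domains.

    reverse_lookup(kind, value): hypothetical reverse-WHOIS / reverse-IP source
        returning domains registered with, or resolving to, that value.
    detect_malicious(domain): hypothetical detector, e.g. the S203 rule.
    Returns the set of newly confirmed malicious domains.
    """
    known = set().union(*index.values())
    # Strongly associated nodes: values already shared by two or more domains.
    strong_nodes = [key for key, domains in index.items() if len(domains) >= 2]
    verdicts = {}  # cache so each target domain is examined only once
    for kind, value in strong_nodes:
        for target in reverse_lookup(kind, value):
            if target in known:
                continue  # S401: targets must lie outside the current network
            if target not in verdicts:
                verdicts[target] = detect_malicious(target)
            if verdicts[target]:
                index[(kind, value)].add(target)  # S402: update the network
    return {domain for domain, bad in verdicts.items() if bad}


# Usage with stub data standing in for a reverse-WHOIS source and a detector.
index = {
    ("first_overlap", "name:Bob"): {"domain2.example", "domainn.example"},
    ("second_overlap", "198.51.100.9"): {"domainn.example"},
}
reverse_index = {"name:Bob": ["domain2.example", "new-bad.example", "new-ok.example"]}
added = expand_network(
    index,
    reverse_lookup=lambda kind, value: reverse_index.get(value, []),
    detect_malicious=lambda domain: domain.startswith("new-bad"),
)
print(added)
```

A fuller update, as described for step S402, would also attach each newly confirmed domain's own registration and IP values to the network. One way to do this is to re-run the association step on it.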
With the data association method provided by the embodiments of the present application, the electronic device 100 can perform reverse-check detection on strongly associated node data in the association network. It mines the data in depth by exploiting the commonality in resource usage among malicious domain names. More malicious domain names are gradually identified, added to the blacklist and re-associated. New information related to them is continuously added to the association network, which keeps the network expanding and the data dynamically updated. This supports accurate analysis and association detection of fraudulent websites and helps uncover potential threats. Risks can be controlled before they materialize or while they are just emerging, so that fraudulent websites are prevented more effectively.
Optionally, referring to fig. 5, after step S201, the data association method provided in the embodiment of the present application may further include:
Step S501: performing survivability detection on each domain name to be detected in the sample data, so as to filter out inactive domain names to be detected and retain live ones.
Step S502: if an inactive domain name to be detected is found to be in use again, adding it back to the domain name blacklist library for monitoring.
In this embodiment, the IP information and WHOIS information of a live website's domain name can generally be queried, whereas those of an inactive domain name cannot (the website is inaccessible). Therefore, after the sample data is obtained, survivability detection is performed on each domain name to be detected in the sample data. Inactive domain names are filtered out and live domain names are retained. Inactive domain names are scheduled for periodic re-detection. Any that are found to be in use again are added back to the domain name blacklist library for monitoring, which keeps the data accurate.
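Steps S501 and S502 can be sketched as follows. This is a minimal sketch under stated assumptions:

- Liveness is approximated by whether the domain resolves to an IPv4 address via `socket.gethostbyname`. The source's criterion is whether IP/WHOIS information can be queried and the site accessed, so an HTTP probe or WHOIS query could be added.
- The resolver is injectable, so the logic can be exercised without network access.
- The function names are hypothetical.

```python
import socket


def resolve_ipv4(domain):
    """Return an IPv4 address for domain, or None if it does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return None


def split_by_liveness(domains, resolve=resolve_ipv4):
    """Step S501: keep domains that resolve, set aside those that do not."""
    alive, inactive = [], []
    for domain in domains:
        (alive if resolve(domain) else inactive).append(domain)
    return alive, inactive


def recheck_inactive(inactive, blacklist, resolve=resolve_ipv4):
    """Step S502, run periodically: re-watch inactive domains that resolve again."""
    revived = [domain for domain in inactive if resolve(domain)]
    blacklist.update(revived)
    return revived
```

For example, passing `resolve={"live.example": "203.0.113.7"}.get` lets `split_by_liveness` be tested offline. The periodic schedule for `recheck_inactive` (cron, task queue, etc.) is left open, as in the source.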
It should be noted that step S502 may be executed before step S202, after step S202, or simultaneously with step S202, which is not limited in the embodiment of the present application.
With the data association method provided by the embodiments of the present application, survivability detection is performed on the domain names to be detected in the sample data. For live domain names, the corresponding domain name registration information and IP information can be queried to determine whether they are malicious. Inactive domain names are periodically checked for re-registration and reuse. Together, this keeps the data accurate and current.
Optionally, referring to fig. 6, after step S203, the data association method provided in the embodiment of the present application may further include:
Step S601: storing the malicious domain name, together with its corresponding domain name registration information and IP information, in the cloud server.
In this embodiment, the electronic device 100 may first perform survivability detection on the input sample data, filter out inactivated data, query domain registration information, IP information, and page keywords corresponding to a surviving domain name to be detected, further detect a malicious domain name and add a malicious label according to the domain registration information, the IP information, and page similarity between a website corresponding to the domain name to be detected and an official website corresponding to the page keywords, and then store the malicious domain name, the domain registration information corresponding to the malicious domain name, and the IP information in the cloud server for subsequent association analysis. Step S601 may be executed before step S204, or after step S204, or may be executed simultaneously with step S204, which is not limited in this embodiment of the application.
It should be noted that, in practical applications, the website corresponding to the main domain name of each domain name to be detected may be accessed. Keywords in the HTML tags of the web page may then be extracted, classified and aggregated into industry categories. In other words, an industry classification tag such as bank, futures or credit is added to each domain name to be detected, so that the current situation of a particular industry can easily be assessed from the tag data.
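The industry tagging described above can be sketched with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions:

- The keyword-to-industry table is hypothetical. Only the category names bank, futures and credit come from the source.
- Keywords are taken only from `<title>` and `<meta name="keywords">`. The source does not say which HTML tags are used.
- The class and function names are illustrative.

```python
from html.parser import HTMLParser

# Hypothetical keyword table; only the category names come from the source.
INDUSTRY_KEYWORDS = {
    "bank": {"bank", "banking", "deposit"},
    "futures": {"futures", "commodity"},
    "credit": {"credit", "loan"},
}


class KeywordTextParser(HTMLParser):
    """Collects words from <title> and from <meta name="keywords" content=...>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.words = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.words += (attrs.get("content") or "").replace(",", " ").split()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.words += data.split()


def industry_tags(html):
    """Return the sorted industry tags whose keywords appear in the page."""
    parser = KeywordTextParser()
    parser.feed(html)
    words = {w.lower().strip(".,;:!?|-") for w in parser.words}
    return sorted(tag for tag, keywords in INDUSTRY_KEYWORDS.items() if words & keywords)
```

A domain whose page title mentions "Online Banking" and whose keywords include "loan" would receive both the bank and credit tags under this table.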
An implementation of the data association apparatus is given below, for executing the corresponding steps of the above embodiments and their possible variants. Referring to fig. 7, fig. 7 is a functional block diagram of a data association apparatus 700 according to an embodiment of the present application. It should be noted that the basic principle and technical effects of the data association apparatus 700 provided in this embodiment are the same as those of the above embodiments. For brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content of the above embodiments. The data association apparatus 700 can be applied to the electronic device 100 shown in fig. 1. It includes a data acquisition module 710, a data query module 720, a malicious domain name detection module 730 and a data association module 740.
Alternatively, the modules may be stored in the memory 110 shown in fig. 1 in the form of software or Firmware (Firmware) or be fixed in an Operating System (OS) of the electronic device 100, and may be executed by the processor 120 in fig. 1. Meanwhile, data, codes of programs, and the like required to execute the above-described modules may be stored in the memory 110.
The data acquisition module 710 is configured to acquire sample data; the sample data includes a plurality of domain names to be detected.
It is understood that the data acquisition module 710 may perform step S201.
The data query module 720 is configured to query domain registration information and IP information corresponding to each domain name to be detected, and extract page keywords of a website corresponding to the domain name to be detected.
It is understood that the data query module 720 may perform the step S202.
The malicious domain name detection module 730 is configured to determine a malicious domain name in the plurality of domain names to be detected according to domain name registration information, IP information, and page similarity between a website corresponding to the domain name to be detected and an official website corresponding to the page keyword.
It is understood that the malicious domain name detection module 730 can perform the step S203.
The data association module 740 is configured to associate different malicious domain names according to domain name registration information and IP information corresponding to each malicious domain name, so as to construct an association network.
It is understood that the data association module 740 may perform the step S204.
Optionally, the malicious domain name detection module 730 is specifically configured to determine that a domain name to be detected is a malicious domain name if two conditions hold: its page similarity is greater than a set threshold, and neither its domain name registration information nor its IP information matches that of the official website.
Optionally, the data association module 740 is specifically configured to associate each malicious domain name with domain name registration information and IP information corresponding to the malicious domain name; if first overlapping information exists between domain name registration information corresponding to any two malicious domain names, the two malicious domain names are associated through the first overlapping information, and/or if second overlapping information exists between IP information corresponding to any two malicious domain names, the two malicious domain names are associated through the second overlapping information, and therefore an associated network is obtained.
Optionally, referring to fig. 8, the data association apparatus 700 may further include a data update module 750, a survivability detection module 760 and a data storage module 770.
The data update module 750 is configured to perform a reverse-check operation based on the first overlapping information or the second overlapping information in the association network, to obtain a target domain name registered using that information. The target domain name is a domain name other than the malicious domain names in the association network. When the target domain name is determined to be a malicious domain name, the data update module 750 updates the association network according to the target domain name.
It is understood that the data update module 750 can perform the above steps S401 to S402.
The survivability detection module 760 is configured to perform survivability detection on each domain name to be detected in the sample data, so as to filter out inactive domain names and retain live ones. If an inactive domain name is found to be in use again, the module adds it back to the domain name blacklist library for monitoring.
It is understood that the survivability detection module 760 may perform the steps S501-S502 described above.
The data storage module 770 is configured to store the malicious domain name, domain name registration information corresponding to the malicious domain name, and IP information in the cloud server.
It is understood that the data storage module 770 can perform the step S601.
In the data association apparatus 700 provided by the embodiments of the present application, the modules cooperate as follows:

- The data acquisition module 710 acquires sample data including a plurality of domain names to be detected.
- The data query module 720 queries the domain name registration information and IP information corresponding to each domain name to be detected, and extracts page keywords from the corresponding website.
- The malicious domain name detection module 730 determines the malicious domain names among the domain names to be detected. It does so according to the domain name registration information, the IP information, and the page similarity between the website corresponding to each domain name and the official website corresponding to its page keywords.
- The data association module 740 associates different malicious domain names according to their domain name registration information and IP information, to construct an association network.

The present application associates malicious domain names across multiple dimensions, namely domain name registration information and IP information. It thereby goes beyond single-domain-name association, forms refined association relationships, and improves the accuracy of fraudulent website association.
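The four core modules of apparatus 700 can be sketched as one pipeline whose stages are injected callables. This is an illustrative sketch only:

- The source specifies each module's responsibility but not its interface.
- The class name `DataAssociationApparatus` is hypothetical.
- The field names and the stage signatures are assumptions.
- The optional modules 750 to 770 could be added as further hooks in the same style.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass
class DataAssociationApparatus:
    """Wires the four core modules of apparatus 700 into one pipeline."""
    acquire: Callable[[], Iterable[str]]                    # module 710, step S201
    query: Callable[[str], Dict[str, Any]]                  # module 720, step S202
    detect: Callable[[str, Dict[str, Any]], bool]           # module 730, step S203
    associate: Callable[[Dict[str, Dict[str, Any]]], Any]   # module 740, step S204

    def run(self) -> Any:
        domains = list(self.acquire())
        profiles = {domain: self.query(domain) for domain in domains}
        malicious = {d: p for d, p in profiles.items() if self.detect(d, p)}
        return self.associate(malicious)


# Usage with stub stages standing in for the real modules.
app = DataAssociationApparatus(
    acquire=lambda: ["a.example", "b.example", "c.example"],
    query=lambda domain: {"ip": {"203.0.113.7"}},
    detect=lambda domain, profile: domain != "b.example",
    associate=lambda malicious: sorted(malicious),
)
print(app.run())
```

Injecting the stages keeps each module replaceable, for example swapping the detector without touching acquisition or association. This mirrors the separation into modules 710 to 740.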
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for associating data, the method comprising:
acquiring sample data; the sample data comprises a plurality of domain names to be detected;
inquiring domain name registration information and IP information corresponding to each domain name to be detected, and extracting page keywords of a website corresponding to the domain name to be detected;
according to the domain name registration information, the IP information and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword, determining a malicious domain name in the domain names to be detected;
and associating different malicious domain names according to the domain name registration information and the IP information corresponding to each malicious domain name to construct an associated network.
2. The method according to claim 1, wherein the step of determining the malicious domain name in the plurality of domain names to be detected according to the domain name registration information, the IP information, and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword comprises:
if the page similarity corresponding to any domain name to be detected is greater than a set threshold value, and the domain name registration information and the IP information corresponding to the domain name to be detected do not match the official website, determining that the domain name to be detected is a malicious domain name.
3. The method according to claim 1, wherein the step of associating different malicious domain names according to the domain name registration information and the IP information corresponding to each malicious domain name to construct an associated network comprises:
associating each malicious domain name with domain name registration information and IP information corresponding to the malicious domain name; if first overlapping information exists between domain name registration information corresponding to any two malicious domain names, the two malicious domain names are associated through the first overlapping information, and/or if second overlapping information exists between IP information corresponding to any two malicious domain names, the two malicious domain names are associated through the second overlapping information, and therefore an associated network is obtained.
4. The method of claim 3, further comprising:
performing a reverse-check operation based on first overlapping information or second overlapping information in the associated network to obtain a target domain name registered by using the first overlapping information or the second overlapping information; the target domain name is other than a malicious domain name in the associated network;
and under the condition that the target domain name is determined to be a malicious domain name, updating the associated network according to the target domain name.
5. The method of claim 1, wherein after the step of obtaining sample data, the method further comprises:
performing survivability detection on each domain name to be detected in the sample data, so as to filter out inactive domain names to be detected and retain live domain names to be detected;
and if an inactive domain name to be detected is found to be in use again, adding that domain name back to the domain name blacklist library for monitoring.
6. The method according to claim 1, wherein after the step of determining the malicious domain name in the plurality of domain names to be detected according to the domain name registration information, the IP information, and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword, the method further comprises:
storing the malicious domain name, together with the domain name registration information and IP information corresponding to the malicious domain name, in a cloud server.
7. The method according to any of claims 1-6, wherein the domain name registration information comprises the domain name registrar, registrant information and sub-domain name information; and the IP information comprises an IP address, an IP geolocation and IP registration information.
8. An apparatus for associating data, the apparatus comprising:
the data acquisition module is used for acquiring sample data; the sample data comprises a plurality of domain names to be detected;
the data query module is used for querying domain name registration information and IP information corresponding to each domain name to be detected and extracting page keywords of a website corresponding to the domain name to be detected;
the malicious domain name detection module is used for determining a malicious domain name in the plurality of domain names to be detected according to the domain name registration information, the IP information and the page similarity between the website corresponding to the domain name to be detected and the official website corresponding to the page keyword;
and the data association module is used for associating different malicious domain names according to the domain name registration information and the IP information corresponding to each malicious domain name so as to construct an associated network.
9. An electronic device, comprising a processor and a memory, the memory storing a computer program that, when executed by the processor, performs the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202011630986.2A 2020-12-31 2020-12-31 Data association method and device, electronic equipment and computer-readable storage medium Active CN112804210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011630986.2A CN112804210B (en) 2020-12-31 2020-12-31 Data association method and device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011630986.2A CN112804210B (en) 2020-12-31 2020-12-31 Data association method and device, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN112804210A true CN112804210A (en) 2021-05-14
CN112804210B CN112804210B (en) 2022-12-27

Family

ID=75808321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011630986.2A Active CN112804210B (en) 2020-12-31 2020-12-31 Data association method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112804210B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360895A (en) * 2021-06-02 2021-09-07 北京百度网讯科技有限公司 Station group detection method and device and electronic equipment
CN113923193A (en) * 2021-10-27 2022-01-11 北京知道创宇信息技术股份有限公司 Network domain name association method, device, storage medium and electronic equipment
CN114416990A (en) * 2022-01-17 2022-04-29 北京百度网讯科技有限公司 Object relationship network construction method and device and electronic equipment
CN115150354A (en) * 2022-06-29 2022-10-04 北京天融信网络安全技术有限公司 Method and device for generating domain name, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105357221A (en) * 2015-12-04 2016-02-24 北京奇虎科技有限公司 Method and apparatus for identifying phishing website
US20160294862A1 (en) * 2014-01-03 2016-10-06 Tencent Technology (Shenzhen) Company Limited Malicious website address prompt method and router
CN106302438A (en) * 2016-08-11 2017-01-04 国家计算机网络与信息安全管理中心 A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means
CN108600249A (en) * 2018-05-04 2018-09-28 哈尔滨工业大学(威海) The method that illegal domain name registration clique excavates is carried out based on multidimensional related information
CN110035075A (en) * 2019-04-03 2019-07-19 北京奇安信科技有限公司 Detection method, device, computer equipment and the storage medium of fishing website
CN110677384A (en) * 2019-08-26 2020-01-10 奇安信科技集团股份有限公司 Phishing website detection method and device, storage medium and electronic device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360895A (en) * 2021-06-02 2021-09-07 北京百度网讯科技有限公司 Station group detection method and device and electronic equipment
CN113360895B (en) * 2021-06-02 2023-07-25 北京百度网讯科技有限公司 Station group detection method and device and electronic equipment
CN113923193A (en) * 2021-10-27 2022-01-11 北京知道创宇信息技术股份有限公司 Network domain name association method, device, storage medium and electronic equipment
CN113923193B (en) * 2021-10-27 2023-11-28 北京知道创宇信息技术股份有限公司 Network domain name association method and device, storage medium and electronic equipment
CN114416990A (en) * 2022-01-17 2022-04-29 北京百度网讯科技有限公司 Object relationship network construction method and device and electronic equipment
CN114416990B (en) * 2022-01-17 2024-05-21 北京百度网讯科技有限公司 Method and device for constructing object relation network and electronic equipment
CN115150354A (en) * 2022-06-29 2022-10-04 北京天融信网络安全技术有限公司 Method and device for generating domain name, storage medium and electronic equipment
CN115150354B (en) * 2022-06-29 2023-11-10 北京天融信网络安全技术有限公司 Method and device for generating domain name, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112804210B (en) 2022-12-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant