CN113239155A - Data processing method and device - Google Patents


Info

Publication number
CN113239155A
Authority
CN
China
Prior art keywords
domain name
name information
registered
target
comparison
Prior art date
Legal status
Pending
Application number
CN202110616187.8A
Other languages
Chinese (zh)
Inventor
杨鹏迪
张园超
余锋
柳寒
Current Assignee
Zhejiang eCommerce Bank Co Ltd
Original Assignee
Zhejiang eCommerce Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang eCommerce Bank Co Ltd
Priority to CN202110616187.8A
Publication of CN113239155A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of this specification provide a data processing method and a data processing device. The data processing method includes: acquiring registered domain name information, and preprocessing the registered domain name information to obtain a target domain name information set; determining initial domain name information, and processing the initial domain name information in a preset processing mode to obtain comparison domain name information associated with the initial domain name information; and comparing the comparison domain name information with the target domain name information in the target domain name information set to determine whether the set contains target domain name information identical to the comparison domain name information.

Description

Data processing method and device
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a data processing method. One or more embodiments of this specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium.
Background
With the spread of e-commerce and Internet applications, network security problems have become increasingly severe, and fraud and phishing attacks using counterfeit domain names occur frequently. A phishing attack typically uses a domain name similar to a target domain name so that users mistake the phishing website for the legitimate target website. Existing counterfeit domain name detection mainly responds during or after the fact, for example, by issuing warnings and intercepting only after an incident has been investigated. As a result, risk detection is inefficient, the response is passive, and little time is left to deal with the threat.
Disclosure of Invention
In view of this, the present specification provides a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium to address technical deficiencies in the prior art.
According to a first aspect of embodiments herein, there is provided a data processing method including:
acquiring registered domain name information, and preprocessing the registered domain name information to obtain a target domain name information set;
determining initial domain name information, and processing the initial domain name information in a preset processing mode to obtain comparison domain name information associated with the initial domain name information; and
comparing the comparison domain name information with the target domain name information in the target domain name information set to determine whether the target domain name information set contains target domain name information identical to the comparison domain name information.
According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:
a preprocessing module, configured to acquire registered domain name information and preprocess the registered domain name information to obtain a target domain name information set;
a determining module, configured to determine initial domain name information and process the initial domain name information in a preset processing mode to obtain comparison domain name information associated with the initial domain name information; and
a comparison module, configured to compare the comparison domain name information with the target domain name information in the target domain name information set and determine whether the target domain name information set contains target domain name information identical to the comparison domain name information.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, wherein the processor implements the steps of the data processing method when executing the computer-executable instructions.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the data processing methods.
In one embodiment of this specification, registered domain name information is acquired and preprocessed to obtain a target domain name information set; initial domain name information is determined and processed in a preset processing mode to obtain associated comparison domain name information; and the comparison domain name information is compared with the target domain name information in the set to determine whether an identical entry exists. In this way, the existence of counterfeit domain name information can be determined before a risk materializes, so that counterfeit domains can be anticipated and intercepted later. This improves risk detection efficiency, turns passive response into active defense, and reduces the spread of risk.
Drawings
FIG. 1 is a flowchart of a data processing method according to an embodiment of this specification;
FIG. 2 is a flowchart of a data processing method according to an embodiment of this specification;
FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of this specification;
FIG. 4 is a block diagram of a computing device according to an embodiment of this specification.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of this specification. However, this specification can be implemented in many other ways than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; therefore, this specification is not limited by the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, and so on may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, a first item may also be referred to as a second item, and similarly, a second item may be referred to as a first item. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the terms used in one or more embodiments of this specification are explained.
DNS record: a record in a distributed database that maps domain names and IP addresses to each other.
Phishing attack: a criminal fraud process in which an attacker impersonates a trustworthy party in electronic communications in order to deceive the recipient.
FDNS: forward DNS data, containing records of type ANY, A, AAAA, TXT, MX, and CNAME.
RDNS: reverse DNS data, i.e., the PTR records of IP addresses.
RIR: Regional Internet Registry.
Similarity calculation: comparing how similar two objects are, generally by calculating the distance between their features; the smaller the distance, the greater the similarity, and the larger the distance, the smaller the similarity.
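The distance-based definition of similarity can be made concrete with one common choice of distance: the Levenshtein (edit) distance, normalized by the length of the longer string. This is a minimal sketch assuming that measure; the specification does not state which feature distance is used, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map distance to [0, 1]: small distance gives similarity near 1."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(round(similarity("abc", "abb"), 3))  # 0.667
print(similarity("abc", "abc"))            # 1.0
```

Normalizing by the longer length keeps scores comparable across domain names of different lengths, which matters when a single percentage threshold is applied later.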
At present, fraud and phishing attacks that use counterfeit domain names are mostly handled during or after the fact, for example, with warnings and interception issued only after an incident has been discovered, which amounts to mending the pen after the sheep are lost. In addition, fraudulent content published on imitation domains, as well as malicious files and outbound links, may go undetected by conventional security products. The data processing method provided in the embodiments of this application performs similarity calculation on the domain names to be monitored against monitored global DNS change records, so that counterfeit-domain intelligence can be produced before a threat materializes. This enables pre-judgment and interception, shortens the response lag, turns passive response into active defense, and reduces the spread of risk.
This specification provides a data processing method and also relates to a data processing apparatus, a computing device, and a computer-readable storage medium, each of which is described in detail in the following embodiments.
It should be noted that the data processing method provided in the embodiments of this specification can be applied to threat intelligence scenarios in the cyberspace security field. Before a counterfeit domain name risk materializes, candidate counterfeit domain names are looked up in the registered domain name information to determine whether an identical domain name exists, and counterfeit-domain intelligence is generated so that the counterfeit domain can later be located precisely and intercepted. This meets the need for the fastest possible intelligence production in anti-fraud and counterfeit-interception scenarios.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 102: acquiring registered domain name information, and preprocessing the registered domain name information to obtain a target domain name information set.
The registered domain name information refers to domain name information registered in a global DNS resolution server (domain name center). It should be noted that the registered domain name information may also be obtained from other domain name centers, which is not limited in the embodiments of this specification.
The target domain name information set is the set obtained by processing the registered domain name information, organized so that counterfeit domain name information can be queried efficiently later.
Specifically, the server obtains the registered domain name information from the domain name center, and pre-processes the registered domain name information of the domain name center to obtain a target domain name information set, where the registered domain name information includes a large amount of mapping information of the registered domain name, such as information of mapping between a domain name and an IP address or information of mapping between an IP address and a domain name.
In practical applications, the server can monitor change data in the global DNS resolution server in real time, acquire the full set of registered domain name information from it, and preprocess that information to obtain the target domain name information set, so that counterfeit domain name information can later be searched for quickly in the set and the spread of risk is reduced.
Furthermore, the registered domain name information in the global DNS resolution server is of two types, and both types are acquired and processed to ensure that the full set of registered domain name information is covered. Specifically, the registered domain name information includes forward registered domain name information and reverse registered domain name information;
correspondingly, acquiring the registered domain name information includes:
acquiring the forward registered domain name information and the reverse registered domain name information.
Forward registered domain name information refers to records that resolve a domain name to its associated IP address information.
Reverse registered domain name information refers to records that resolve an IP address to its associated domain name information.
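To make the two mapping directions concrete, the sketch below uses a few hypothetical records: forward records map a domain name to IP addresses, and inverting them gives the IP-to-domain view that reverse data provides. In practice RDNS data is collected from PTR records rather than derived this way; the record list and function names are illustrative assumptions. The standard library's ipaddress module shows the name under which a PTR record is published.

```python
import ipaddress
from collections import defaultdict

# Hypothetical sample forward (FDNS) records: (domain, record type, value).
FDNS = [
    ("example.com", "A", "192.0.2.10"),
    ("www.example.com", "A", "192.0.2.10"),
    ("example.com", "AAAA", "2001:db8::1"),
]


def build_reverse_index(fdns):
    """Invert forward (domain -> IP) records into an IP -> domains view."""
    reverse = defaultdict(set)
    for domain, rtype, value in fdns:
        if rtype in ("A", "AAAA"):  # only address records map to IPs
            reverse[value].add(domain)
    return reverse


def ptr_name(ip: str) -> str:
    """Name under which a PTR (reverse DNS) record for ip is published."""
    return ipaddress.ip_address(ip).reverse_pointer


rev = build_reverse_index(FDNS)
print(sorted(rev["192.0.2.10"]))  # ['example.com', 'www.example.com']
print(ptr_name("192.0.2.10"))     # 10.2.0.192.in-addr.arpa
```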
In practical applications, when acquiring data from the global DNS resolution server, the server acquires both types of registered domain name information: FDNS data (forward registered domain name information) and RDNS data (reverse registered domain name information). It should be noted that both types may be obtained as open-source data from the Regional Internet Registries, which include RIPE NCC (the European IP address registry), LACNIC (the Latin America and Caribbean Internet Addresses Registry), ARIN (the American Registry for Internet Numbers), AFRINIC (the African Network Information Centre), and APNIC (the Asia-Pacific Network Information Centre).
In the data processing method provided in the embodiment of the present specification, the acquisition of the full amount of domain name information in the global DNS resolution server is realized by acquiring two types of registered domain name information, so that the counterfeit domain name information is subsequently searched for in the full amount of domain name information.
After the full set of registered domain name information is obtained, it can be preprocessed into a target domain name information set in which counterfeit domain name information is easy to search. Specifically, preprocessing the registered domain name information to obtain the target domain name information set includes one of the following modes:
determining a character string of the registered domain name information, and adjusting the character case and character order of the character string to obtain the target domain name information set;
determining a character string of the registered domain name information, and sorting the character strings according to a preset processing condition to obtain the target domain name information set;
determining a character string of the registered domain name information, adjusting the character case and character order of the character string to obtain alternative domain name information, and sorting the alternative domain name information according to a preset processing condition to obtain the target domain name information set.
The registered domain name information may be preprocessed in any of the above modes but is not limited to them; the embodiments of this specification describe the three modes in detail below without restricting the specific preprocessing mode used.
First preprocessing mode: the server determines the character string of the registered domain name information from the domain name center and adjusts its character case, for example, converting uppercase letters to the corresponding lowercase letters, or lowercase letters to the corresponding uppercase letters. The character order of the registered domain name information may also be adjusted, for example, by reversing the domain name characters from forward order to reverse order. Preprocessing the character strings of the registered domain name information in this way yields the target domain name information set.
For example, if the registered domain name information includes three entries, such as ABcf and abe, all letters in them are uniformly converted to lowercase.
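A minimal sketch of the first preprocessing mode, assuming case is unified to lowercase and that "adjusting the character order" means full character reversal (the rev operation mentioned later in this description). The function name normalize is hypothetical.

```python
def normalize(domain: str, reverse: bool = True) -> str:
    """First preprocessing mode: unify character case, optionally reverse order."""
    s = domain.lower()                  # uppercase -> lowercase
    return s[::-1] if reverse else s    # forward order -> reverse order


print([normalize(d, reverse=False) for d in ["ABcf", "abe"]])  # ['abcf', 'abe']
print(normalize("Example.COM"))                                # moc.elpmaxe
```

One plausible reason for the reversal, though the text does not state it, is that domains sharing a suffix such as ".com" then share a prefix, so a later sort places them next to each other.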
Second preprocessing mode: the server determines the character string of the registered domain name information from the domain name center and sorts the character strings according to a preset processing condition. The preset processing condition is a sorting rule for the character strings, for example, sorting by initial letter from A to Z, or from Z to A. After the character strings of the registered domain name information are sorted, the target domain name information set is obtained from the sorted registered domain name information.
For example, if the registered domain name information includes entries beginning with "Abc" and "Bcf", sorting them by initial letter from A to Z places "Abc" before "Bcf", and the sorted entries form the target domain name information set.
Third preprocessing mode: the server determines the character string of the registered domain name information from the domain name center, adjusts the character case and character order of the character string to obtain alternative domain name information, and sorts the alternative domain name information according to a preset processing condition to obtain the target domain name information set. For example, the letters in each character string are uniformly converted to lowercase, the resulting string is reversed to form the alternative domain name information, and the alternative domain name information is sorted by initial letter from A to Z to obtain the target domain name information set.
For example, if the registered domain name information includes three entries, Abc, cBce, and Bcf, the letters in each entry are converted to lowercase and their order is reversed, giving the alternative domain name information cba, ecbc, and fcb; sorting these from A to Z yields the target domain name information set cba, ecbc, fcb.
In practical applications, the target domain name information set contains a very large number of entries. After acquiring the registered domain name information, the server determines its character strings, converts their letters to a uniform case, applies a rev (reversal) operation to each case-normalized string, and sorts the reversed registered domain name information to obtain the final target domain name information set.
The data processing method provided in the embodiments of this specification preprocesses the registered domain name information so that subsequent file queries over it can be accelerated.
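The lowercase, reverse, and sort pipeline can be sketched in a few lines. This is an in-memory illustration only, assuming that blank entries are skipped and that a full lexicographic sort stands in for "sorting by initial letter" (it agrees with the initial-letter order and also breaks ties). The function name and the descending flag are hypothetical.

```python
from typing import Iterable, List


def preprocess(domains: Iterable[str], descending: bool = False) -> List[str]:
    """Third preprocessing mode: unify case, reverse each string, then sort.

    descending=False sorts A to Z; descending=True sorts Z to A.
    """
    # Case unification + character reversal produce the alternative domain names.
    alternatives = [d.strip().lower()[::-1] for d in domains if d.strip()]
    # Sorting the alternatives yields the target domain name information set.
    return sorted(alternatives, reverse=descending)


print(preprocess(["Abc", "cBce", "Bcf"]))                   # ['cba', 'ecbc', 'fcb']
print(preprocess(["Abc", "cBce", "Bcf"], descending=True))  # ['fcb', 'ecbc', 'cba']
```

Running it on the worked example above reproduces the set cba, ecbc, fcb.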
To speed up sorting the alternative domain name information, it may be divided into subsets that are sorted separately. Specifically, sorting the alternative domain name information according to a preset processing condition includes:
dividing the alternative domain name information according to a preset memory value to obtain at least two subsets of the alternative domain name information, and sorting each subset according to a preset sorting condition.
Specifically, the server divides the acquired alternative domain name information according to a preset memory value to obtain at least two subsets of the alternative domain name information, and performs sorting processing on each subset according to a preset sorting condition.
In practical applications, when the server has acquired a large amount of alternative domain name information, the information can be divided into multiple small files, the small files can be sorted concurrently, and the sorted small files can then be merged. For example, the alternative domain name information may be split according to a preset file size, the domain name information in each split file sorted according to a preset sorting condition (for example, by initial letter from A to Z or from Z to A), and the sorted files then merged into one.
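The divide, sort, and merge steps amount to an external merge sort. Below is a minimal sketch under stated assumptions: the "preset memory value" is modeled as a maximum number of entries per chunk (chunk_size, a hypothetical parameter), each sorted subset is written to a temporary file, and the files are merged with the standard library's heapq.merge. The chunks are sorted one after another here; sorting them concurrently, as the text suggests, could be added with a process pool.

```python
import heapq
import tempfile
from contextlib import ExitStack
from pathlib import Path
from typing import Iterable, Iterator, List


def _write_run(chunk: List[str], directory: Path, index: int) -> Path:
    """Sort one subset in memory and write it out as a small file."""
    path = directory / f"run_{index}.txt"
    path.write_text("".join(d + "\n" for d in sorted(chunk)), encoding="utf-8")
    return path


def external_sort(domains: Iterable[str], chunk_size: int) -> Iterator[str]:
    """Split input into subsets of at most chunk_size entries, sort each
    subset into its own file, then lazily k-way merge the sorted files."""
    with tempfile.TemporaryDirectory() as tmp, ExitStack() as stack:
        directory = Path(tmp)
        runs: List[Path] = []
        chunk: List[str] = []
        for d in domains:
            chunk.append(d)
            if len(chunk) >= chunk_size:
                runs.append(_write_run(chunk, directory, len(runs)))
                chunk = []
        if chunk:
            runs.append(_write_run(chunk, directory, len(runs)))
        # ExitStack closes the files before the temporary directory is removed.
        files = [stack.enter_context(p.open(encoding="utf-8")) for p in runs]
        streams = [(line.rstrip("\n") for line in f) for f in files]
        # heapq.merge combines already-sorted iterables into one sorted stream.
        yield from heapq.merge(*streams)


data = ["moc.elpmaxe", "moc.cba", "gro.tset", "moc.knab", "ten.olleh"]
print(list(external_sort(data, chunk_size=2)))
# ['gro.tset', 'moc.cba', 'moc.elpmaxe', 'moc.knab', 'ten.olleh']
```

Only one chunk plus one line per open file is held in memory at a time, which is the point of dividing by a memory budget.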
In the data processing method provided in the embodiment of the present specification, after dividing the alternative domain name information into a plurality of subsets, each subset is sorted, and then the sorted subsets are combined, so that not only is the processing efficiency of the registered domain name information improved, but also the counterfeit domain name information can be conveniently and quickly searched in the processed target domain name information set.
Step 104: determining initial domain name information, and processing the initial domain name information based on a preset processing mode to obtain comparison domain name information associated with the initial domain name information.
The initial domain name information refers to the genuine domain name information in practical applications, that is, the domain name to be protected against counterfeiting.
The comparison domain name information refers to domain name information that is highly similar to the initial domain name information and could be used to counterfeit it.
Specifically, the server may determine initial domain name information, process the initial domain name information based on a preset processing mode, and obtain comparison domain name information associated with the initial domain name information, where it should be noted that the comparison domain name information may be a plurality of domain name information associated with the initial domain name information, and the number of the comparison domain name information in the embodiment of the present specification is not limited at all.
In practical applications, the comparison domain name information may be obtained through a prediction model, but is not limited to this. Specifically, processing the initial domain name information in a preset processing mode to obtain comparison domain name information associated with the initial domain name information includes:
inputting the initial domain name information into a prediction model to obtain the comparison domain name information associated with the initial domain name information, where the prediction model is trained on the basis of similarity calculation over the initial domain name information.
The prediction model is trained in advance: given one piece of domain name information as input, the trained model outputs multiple pieces of domain name information similar to it. Training may, for example, rely on comparing the similarity of two objects: similarity is calculated with respect to the initial domain name information, and the model parameters are adjusted iteratively based on the similarity results until a trained prediction model is obtained. The embodiments of this specification do not limit the specific training process.
Specifically, the server inputs the initial domain name information into the trained prediction model and obtains the associated comparison domain name information. For example, if the initial domain name information input into the model is abc, the model may output associated comparison domain name information such as abb.
In practical applications, different similarity thresholds can be set for the prediction model according to the application scenario. Similarity is generally obtained by calculating the distance between object features: the smaller the distance, the greater the similarity, and vice versa. If the minimum similarity threshold of the prediction model is set to eighty percent, every piece of comparison domain name information output by the model has a similarity of at least eighty percent to the initial domain name information. It should be noted that when the similarity between the comparison domain name information and the initial domain name information exceeds eighty percent, users may confuse the two, so the comparison domain name information can be treated as a potential counterfeit domain name of the initial domain name information.
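The specification does not disclose the prediction model's architecture or features. As a stand-in, the sketch below swaps the trained model for a simple rule-based generator (character omission, character repetition, adjacent transposition, and a small look-alike substitution table) and then applies the eighty-percent threshold from the text, using the standard library's difflib.SequenceMatcher.ratio() as the similarity measure. The function names, the substitution table, and the default ".com" suffix are all hypothetical assumptions.

```python
from difflib import SequenceMatcher
from typing import Set

# Hypothetical look-alike substitutions; a real table or model would be larger.
HOMOGLYPHS = {"o": "0", "l": "1", "i": "1", "e": "3", "a": "4", "m": "rn"}


def candidates(name: str) -> Set[str]:
    """Generate look-alike variants of the label name with simple edits."""
    out: Set[str] = set()
    for i in range(len(name)):
        out.add(name[:i] + name[i + 1:])                 # omission
        out.add(name[:i] + name[i] + name[i:])           # repetition
        if i + 1 < len(name):                            # adjacent transposition
            out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
        if name[i] in HOMOGLYPHS:                        # look-alike swap
            out.add(name[:i] + HOMOGLYPHS[name[i]] + name[i + 1:])
    out.discard(name)  # a transposition of equal letters can reproduce name
    return out


def comparison_domains(name: str, tld: str = "com", threshold: float = 0.8) -> Set[str]:
    """Keep only candidates whose similarity to name reaches the threshold."""
    return {
        f"{c}.{tld}"
        for c in candidates(name)
        if SequenceMatcher(None, name, c).ratio() >= threshold
    }


result = comparison_domains("examplebank")
print(len(result) > 0, "exarnplebank.com" in result)  # True True
```

The threshold matters most for short names: for "abc", the look-alike "4bc" scores only about 0.67 and is filtered out, while single edits to a long name like "examplebank" stay above 0.8.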
It should be noted that, the processing manner of obtaining the comparison domain name information with a higher similarity to the initial domain name information according to the initial domain name information is not limited to using a prediction model, and the comparison domain name information may also be obtained through similarity calculation or other processing manners, which is not limited in this specification.
According to the data processing method provided by the embodiment of the specification, the comparison domain name information associated with the initial domain name information can be determined through the prediction model, and the counterfeit domain name of the initial domain name information can be predicted in advance, so that whether the domain name with the counterfeit initial domain name information exists or not can be judged subsequently, and the probability of the risk of the counterfeit domain name is reduced.
Step 106: comparing the comparison domain name information with the target domain name information in the target domain name information set to determine whether the target domain name information set contains target domain name information identical to the comparison domain name information.
In practical applications, after the server has determined a large amount of comparison domain name information associated with the initial domain name information, it can compare each piece with the target domain name information in the target domain name information set to determine whether an identical entry exists, that is, whether the processed registered domain name information contains a domain name identical to the comparison domain name information. Because the comparison domain name information consists of domains predicted in advance to possibly counterfeit the initial domain name information, determining whether such a domain has actually been registered identifies counterfeit domains ahead of time and facilitates subsequent pre-judgment or interception.
In the data processing method provided in the embodiments of this specification, a binary search algorithm is used to compare the comparison domain name information with the target domain name information in the target domain name information set. Specifically, comparing the comparison domain name information with the target domain name information in the target domain name information set includes:
determining domain name information to be matched in the target domain name information set based on a preset search mode, and comparing the comparison domain name information with the domain name information to be matched.
The preset search mode may be binary search or hash-mapping lookup, which is not limited in the embodiments of this specification.
The domain name information to be matched refers to the entry in the target domain name information set that is located by the search and compared against the comparison domain name information.
Specifically, the server may determine the domain name information to be matched in the target domain name information set according to the preset search mode and compare it with the comparison domain name information to determine whether the two are identical. It should be noted that the search range can be repeatedly halved (bisection) to locate the target domain name information in the target domain name information set.
The data processing method provided in the embodiment of the present specification may perform scanning and searching in the target domain name information set by using a preset searching manner, so as to determine whether the target domain name information identical to the comparison domain name information exists in the target domain name information set, and implement prejudgment or interception of counterfeit domain name information.
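The two preset search modes named above can be sketched as follows. This is a minimal illustration, assuming the target domain name information set fits in memory as a Python list sorted in ascending order (for binary search) or as a set (for hash lookup); all function names are hypothetical and not taken from the source.

```python
from bisect import bisect_left


def contains_by_binary_search(sorted_targets: list[str], comparison: str) -> bool:
    """Return True if `comparison` occurs in the ascending-sorted list `sorted_targets`.

    bisect_left halves the search range on each step, so one lookup costs
    O(log n) string comparisons.
    """
    i = bisect_left(sorted_targets, comparison)
    return i < len(sorted_targets) and sorted_targets[i] == comparison


def contains_by_hash_lookup(target_set: set[str], comparison: str) -> bool:
    """Hash-based alternative: average O(1) membership test, no sorting required."""
    return comparison in target_set


def find_registered(sorted_targets: list[str], comparisons: list[str]) -> list[str]:
    """Return the comparison domains that already appear in the target set."""
    return [c for c in comparisons if contains_by_binary_search(sorted_targets, c)]


if __name__ == "__main__":
    targets = sorted(["alpha.com", "bank-example.com", "example.com"])
    print(find_registered(targets, ["examp1e.com", "bank-example.com"]))
    # prints ['bank-example.com']
```

One design consideration: binary search only needs the set to be sorted (which Steps 208 to 212 below provide), whereas hash lookup needs the whole set loaded into a hash table first.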
Another embodiment of the present specification provides a data processing method, further including:
determining that the comparison domain name information is at risk when the target domain name information set contains target domain name information identical to the comparison domain name information.
Specifically, once the server determines that the target domain name information set contains target domain name information identical to the comparison domain name information, this indicates that a counterfeit domain name exists in the set, and the server then determines that the comparison domain name information is at risk.
For example, if the server, using the preset search mode, finds target domain name information in the target domain name information set that is identical to the comparison domain name information, this indicates that the corresponding domain name has already been registered and recorded in the global DNS resolution server.
In the data processing method provided in the embodiments of this specification, the target domain name information set is searched for target domain name information identical to the comparison domain name information to determine whether the comparison domain name information is at risk. This detects counterfeit domain names in advance and reduces threats to cyberspace security.
In summary, the embodiments of this specification monitor the change records of the full set of domain names on the DNS resolution servers, perform similarity calculation on the initial domain name information to be monitored to determine domain names that could counterfeit it, and, before a threat materializes, search the monitored change records for any such counterfeit domain name so that it can be prejudged and intercepted. This also shortens the response time lag and turns passive response into proactive defense, reducing the spread of risk.
The data processing method provided in this specification is further described below with reference to FIG. 2, taking its application to monitoring counterfeit domain names as an example. FIG. 2 shows a flowchart of the processing procedure of a data processing method according to an embodiment of this specification, which includes the following steps.
Monitoring counterfeit domain names is performed in two stages. The first is a pre-processing stage that accelerates monitoring: the registered domain name information is preprocessed so that later checks for counterfeit domain names in the target domain name information set run faster. The second is the monitoring stage: the real domain names in the anti-counterfeit domain name list are fed into a prediction model to determine comparison domain name information that could counterfeit them, and binary search is then used to check whether the target domain name information set contains target domain name information identical to the comparison domain name information, so that any counterfeit domain names found can be output.
Step 202: The server obtains the full set of registered domain name information from the global DNS resolution server.
The full set of registered domain name information may include FDNS (forward DNS) data and RDNS (reverse DNS) data; this is the content to be monitored for counterfeit domain names, and it is extracted from an authoritative DNS resolution record database.
Step 204: The server normalizes the letter case of the registered domain name information.
Specifically, the server may convert all letters in the registered domain name information to upper case or to lower case, which simplifies the subsequent search for counterfeit domain names in the registered domain name information.
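The case-unification step can be sketched as below. This is a minimal illustration assuming ASCII domain names held in memory; the function name is hypothetical, and stripping surrounding whitespace is an added assumption for names read from line-oriented files.

```python
def normalize_case(domains: list[str]) -> list[str]:
    """Unify letter case across all registered domain names.

    Lower case is chosen here; upper case works equally well as long as the
    same choice is later applied to the comparison domain names.
    Stripping surrounding whitespace (e.g. trailing newlines from input files)
    is an assumption beyond the source text.
    """
    return [d.strip().lower() for d in domains]


if __name__ == "__main__":
    print(normalize_case(["WWW.Example.COM", " Bank.Example.com\n"]))
    # prints ['www.example.com', 'bank.example.com']
```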
Step 206: The server applies rev reversal to the case-normalized registered domain name information.
Here, rev reversal means outputting each domain name in reverse: the server reverses each case-normalized registered domain name string.
Step 208: The server splits the reversed registered domain name information according to a preset file size.
Specifically, the server may divide the reversed registered domain name information according to a preset file size (memory value) to obtain multiple registered domain name information files.
Step 210: The server sorts each split file.
Specifically, the domain names in each registered domain name information file may be sorted in a preset order, such as ascending or descending alphabetical order; this specification does not limit the sort order.
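A minimal sketch of the reversal, assuming "rev reversal" means character-by-character reversal of each name (as the Unix `rev` command does for each input line); the function name is hypothetical. One plausible benefit, which the source does not state, is that names sharing a suffix then share a prefix, so subdomains of the same registered domain end up adjacent after sorting. Whatever reversal is used, the comparison domain names must be reversed the same way before they are looked up.

```python
def rev(domain: str) -> str:
    """Reverse a domain name character by character, like the Unix `rev` command per line."""
    return domain[::-1]


if __name__ == "__main__":
    names = ["a.example.com", "z.other.org", "b.example.com"]
    print(sorted(rev(n) for n in names))
    # prints ['gro.rehto.z', 'moc.elpmaxe.a', 'moc.elpmaxe.b']
```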
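The split by preset file size can be sketched as a greedy packing of names into size-bounded chunks. This assumes the names are stored one per line in UTF-8 and uses in-memory lists as stand-ins for the files; `max_bytes` is a hypothetical parameter playing the role of the preset file size, and a real pipeline would write each chunk to its own file.

```python
def split_by_size(domains: list[str], max_bytes: int) -> list[list[str]]:
    """Greedily pack domain names into chunks whose size, stored one name per
    line in UTF-8, does not exceed max_bytes.

    A single name longer than max_bytes still gets a chunk of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for d in domains:
        line_size = len(d.encode("utf-8")) + 1  # +1 for the trailing newline
        if current and size + line_size > max_bytes:
            chunks.append(current)  # current chunk is full: start a new one
            current, size = [], 0
        current.append(d)
        size += line_size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(split_by_size(["aaa", "bb", "c"], 5))
    # prints [['aaa'], ['bb', 'c']]
```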
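Sorting each split file independently can be sketched as below, again with in-memory lists standing in for files; the function name and the `descending` flag are hypothetical. Because each chunk is bounded in size, it can be sorted in memory even when the full data set cannot.

```python
def sort_chunks(chunks: list[list[str]], descending: bool = False) -> list[list[str]]:
    """Sort every chunk independently.

    Python compares strings by Unicode code point; for lower-cased ASCII
    domain names this orders letters alphabetically, with '-', '.' and digits
    sorting before letters. descending=True gives the reverse order.
    """
    return [sorted(chunk, reverse=descending) for chunk in chunks]


if __name__ == "__main__":
    print(sort_chunks([["b.com", "a.com"], ["d.com", "c.com"]]))
    # prints [['a.com', 'b.com'], ['c.com', 'd.com']]
```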
Step 212: The server merges the sorted files to determine the target domain name information set.
Specifically, the server merges the sorted files into the target domain name information set, which may be the set of domain names sorted in ascending alphabetical order.
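Steps 208 to 212 together follow the classic external merge sort pattern: split into chunks that fit in memory, sort each chunk, then k-way merge the sorted runs. A minimal sketch of the merge, using the standard library's `heapq.merge`; the function name is hypothetical, and dropping duplicate names is an assumption, since the source does not say whether duplicates are kept.

```python
import heapq


def merge_sorted_runs(runs: list[list[str]], descending: bool = False) -> list[str]:
    """k-way merge of individually sorted runs into one sorted list.

    heapq.merge consumes its inputs lazily, so the runs could equally be
    iterators yielding names from sorted run files. Adjacent duplicates
    (the same name appearing in several runs) are dropped (assumption).
    """
    merged: list[str] = []
    for d in heapq.merge(*runs, reverse=descending):
        if not merged or merged[-1] != d:
            merged.append(d)
    return merged


if __name__ == "__main__":
    print(merge_sorted_runs([["a.com", "c.com"], ["b.com", "c.com", "d.com"]]))
    # prints ['a.com', 'b.com', 'c.com', 'd.com']
```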
Step 214: The server determines the anti-counterfeit domain name list.
Specifically, the server may determine the anti-counterfeit domain name list to be monitored. This list can be understood as the real domain name information; the purpose of the data processing method provided in the embodiments of this specification is to monitor whether any domain name counterfeits this real domain name information, thereby preventing network security incidents caused by counterfeit domain names.
Step 216: The server inputs the anti-counterfeit domain names into a prediction model and determines the comparison domain name information.
The prediction model provided in the embodiments of this specification may be pre-trained. After the real domain name information is input into the prediction model, the model outputs multiple items of comparison domain name information similar to the real domain name information. Because these outputs are highly similar to the real domain names, they can easily be used to counterfeit them, and websites using them could mislead users.
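The source does not describe the internals of the pre-trained prediction model. As a clearly hypothetical stand-in, not the patent's model and not trained on anything, the sketch below generates typo-squatting style variants of a real domain (character omission, look-alike substitution, neighbour transposition) and ranks them by Levenshtein edit distance as the similarity measure. The substitution table and all function names are illustrative assumptions.

```python
# Hypothetical substitution table of look-alike characters (illustrative only).
HOMOGLYPHS = {"o": "0", "l": "1", "i": "1", "e": "3", "a": "4", "s": "5"}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def generate_variants(domain: str) -> set[str]:
    """Typo-squatting style variants of the leftmost label of `domain`."""
    label, dot, rest = domain.partition(".")
    variants: set[str] = set()
    for i, ch in enumerate(label):
        variants.add(label[:i] + label[i + 1:])                       # omit one character
        if ch in HOMOGLYPHS:
            variants.add(label[:i] + HOMOGLYPHS[ch] + label[i + 1:])  # look-alike swap
    for i in range(len(label) - 1):
        variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])  # swap neighbours
    variants.discard(label)  # swapping two equal characters reproduces the original
    variants.discard("")     # omitting the only character of a one-letter label
    return {v + dot + rest for v in variants}


def comparison_domains(domain: str, top_k: int = 10) -> list[str]:
    """Rank variants by edit distance to `domain` (most similar first) and keep top_k."""
    ranked = sorted(generate_variants(domain), key=lambda v: (levenshtein(v, domain), v))
    return ranked[:top_k]
```

A trained model would replace `generate_variants` and the ranking; the rule-based version only shows where the similarity calculation enters.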
Step 218: The server scans the target domain name information set using a binary search algorithm and determines whether it contains target domain name information identical to the comparison domain name information.
In practical application, the predicted comparison domain name information is used, and a binary search method is utilized to perform scanning search in a target domain name information set, so that monitoring and speed increasing can be achieved, and response time difference can be reduced.
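The sketch below ties Steps 204 to 220 together in memory and highlights one point the steps imply: since the target set is stored case-normalized and reversed, each comparison domain must go through the same normalization before the binary search, otherwise no lookup can ever match. It assumes lower case and character-by-character reversal; all function names are hypothetical, and a sorted in-memory list stands in for the merged files.

```python
from bisect import bisect_left


def normalize(domain: str) -> str:
    """Same preprocessing as Steps 204-206: unify case (lower) and reverse the string."""
    return domain.strip().lower()[::-1]


def build_target_set(registered: list[str]) -> list[str]:
    """In-memory stand-in for Steps 204-212: normalize every name, then sort ascending."""
    return sorted({normalize(d) for d in registered})


def scan(target_set: list[str], comparisons: list[str]) -> list[str]:
    """Step 218: binary-search each comparison domain; return those found (Step 220 output)."""
    hits = []
    for c in comparisons:
        key = normalize(c)  # must match the form in which the target set is stored
        i = bisect_left(target_set, key)
        if i < len(target_set) and target_set[i] == key:
            hits.append(c)
    return hits


if __name__ == "__main__":
    target = build_target_set(["Example.com", "EXAMP1E.com", "other.org"])
    print(scan(target, ["examp1e.com", "exmaple.com"]))
    # prints ['examp1e.com']
```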
Note that the embodiments of this specification do not limit the specific search method.
Step 220: If such target domain name information exists, the server outputs the counterfeit domain name information.
Specifically, when the server determines that the target domain name information set contains domain name information identical to the comparison domain name information, the comparison domain name information can be identified as a counterfeit domain name that already has a registration record in the global DNS resolution server, and the counterfeit domain name information is recorded for subsequent processing.
In the data processing method provided in the embodiments of this specification, registered domain name information is obtained and preprocessed to obtain a target domain name information set; initial domain name information is determined and processed in a preset processing mode to obtain comparison domain name information associated with it; and the comparison domain name information is compared with the target domain name information in the target domain name information set to determine whether the set contains identical target domain name information. In this way, the existence of counterfeit domain names can be determined before a risk materializes so that they can be prejudged and intercepted later, risk detection efficiency is improved, and handling shifts from passive to proactive, reducing the spread of risk.
Corresponding to the above method embodiment, this specification further provides a data processing apparatus embodiment, and fig. 3 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in fig. 3, the apparatus includes:
a preprocessing module 302 configured to acquire registered domain name information and preprocess the registered domain name information to obtain a target domain name information set;
a determining module 304, configured to determine initial domain name information, and process the initial domain name information based on a preset processing manner to obtain comparison domain name information associated with the initial domain name information;
a comparison module 306 configured to compare the comparison domain name information with the target domain name information in the target domain name information set, and determine whether the target domain name information identical to the comparison domain name information exists in the target domain name information set.
Optionally, the apparatus further comprises:
a risk determination module configured to determine that the comparison domain name information has a risk when target domain name information identical to the comparison domain name information exists in the target domain name information set.
Optionally, the preprocessing module further includes a target domain name information determining module configured to obtain the target domain name information set in one of the following manners:
determining a character string of the registered domain name information, and adjusting the character type and the character sequence of the character string to obtain a target domain name information set;
determining a character string of the registered domain name information, and sorting the character string according to preset processing conditions to obtain a target domain name information set;
determining a character string of the registered domain name information, adjusting the character type and the character sequence of the character string to obtain alternative domain name information, and sorting the alternative domain name information according to preset processing conditions to obtain a target domain name information set.
Optionally, the target domain name information determining module is further configured to:
dividing the alternative domain name information according to a preset memory value to obtain at least two subsets of the alternative domain name information, and sorting the subsets according to a preset sorting condition.
Optionally, the determining module 304 is further configured to:
inputting the initial domain name information into a prediction model to obtain comparison domain name information associated with the initial domain name information, where the prediction model is trained based on similarity calculation on the initial domain name information.
Optionally, the comparison module 306 is further configured to:
determine domain name information to be matched in the target domain name information set based on a preset search mode, and compare the comparison domain name information with the domain name information to be matched.
Optionally, the preprocessing module 302 is further configured to:
acquire the forward registered domain name information and the reverse registered domain name information.
The data processing apparatus provided in the embodiments of this specification obtains registered domain name information and preprocesses it to obtain a target domain name information set; determines initial domain name information and processes it in a preset processing mode to obtain comparison domain name information associated with it; and compares the comparison domain name information with the target domain name information in the target domain name information set to determine whether the set contains identical target domain name information. In this way, the existence of counterfeit domain names can be determined before a risk materializes so that they can be prejudged and intercepted later, risk detection efficiency is improved, and handling shifts from passive to proactive, reducing the spread of risk.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.
FIG. 4 illustrates a block diagram of a computing device 400 provided in accordance with one embodiment of the present description. The components of the computing device 400 include, but are not limited to, a memory 410 and a processor 420. Processor 420 is coupled to memory 410 via bus 430 and database 450 is used to store data.
Computing device 400 also includes access device 440, which enables computing device 400 to communicate via one or more networks 460. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 440 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 400, as well as other components not shown in FIG. 4, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 4 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 400 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 400 may also be a mobile or stationary server.
The processor 420 is configured to execute computer-executable instructions that, when executed, implement the steps of the data processing method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification further provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, USB flash drive, removable hard disk, magnetic disk, optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like. It should be noted that what the computer-readable medium may contain can be expanded or restricted as appropriate according to legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice exclude electrical carrier signals and telecommunications signals from computer-readable media.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (10)

1. A method of data processing, comprising:
acquiring registered domain name information, and preprocessing the registered domain name information to acquire a target domain name information set;
determining initial domain name information, and processing the initial domain name information based on a preset processing mode to obtain comparison domain name information associated with the initial domain name information;
comparing the comparison domain name information with the target domain name information in the target domain name information set to determine whether target domain name information identical to the comparison domain name information exists in the target domain name information set.
2. The data processing method of claim 1, further comprising:
determining that the comparison domain name information is at risk when target domain name information identical to the comparison domain name information exists in the target domain name information set.
3. The data processing method according to claim 1 or 2, wherein the preprocessing the registered domain name information to obtain a target domain name information set comprises one of the following manners:
determining a character string of the registered domain name information, and adjusting the character type and the character sequence of the character string to obtain a target domain name information set;
determining a character string of the registered domain name information, and sequencing the character string according to preset processing conditions to obtain a target domain name information set;
determining a character string of the registered domain name information, adjusting the character type and the character sequence of the character string to obtain alternative domain name information, and sorting the alternative domain name information according to preset processing conditions to obtain a target domain name information set.
4. The data processing method according to claim 3, wherein the sorting the alternative domain name information according to preset processing conditions includes:
dividing the alternative domain name information according to a preset memory value to obtain at least two subsets of the alternative domain name information, and sorting the subsets according to a preset sorting condition.
5. The data processing method according to claim 4, wherein the processing the initial domain name information based on a preset processing manner to obtain comparison domain name information associated with the initial domain name information comprises:
inputting the initial domain name information into a prediction model to obtain comparison domain name information associated with the initial domain name information, wherein the prediction model is trained based on similarity calculation on the initial domain name information.
6. The data processing method of claim 5, wherein comparing the comparison domain name information with target domain name information in the set of target domain name information comprises:
determining domain name information to be matched in the target domain name information set based on a preset search mode, and comparing the comparison domain name information with the domain name information to be matched.
7. The data processing method of claim 3, wherein the registered domain name information comprises forward registered domain name information and reverse registered domain name information,
correspondingly, the acquiring the registered domain name information includes:
acquiring the forward registered domain name information and the reverse registered domain name information.
8. A data processing apparatus comprising:
a preprocessing module configured to acquire registered domain name information and preprocess the registered domain name information to obtain a target domain name information set;
a determining module configured to determine initial domain name information and process the initial domain name information based on a preset processing mode to obtain comparison domain name information associated with the initial domain name information;
and a comparison module configured to compare the comparison domain name information with the target domain name information in the target domain name information set and determine whether target domain name information identical to the comparison domain name information exists in the target domain name information set.
9. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, wherein the processor implements the steps of the data processing method according to any one of claims 1 to 7 when executing the computer-executable instructions.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the data processing method of any one of claims 1 to 7.
CN202110616187.8A 2021-06-02 2021-06-02 Data processing method and device Pending CN113239155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110616187.8A CN113239155A (en) 2021-06-02 2021-06-02 Data processing method and device

Publications (1)

Publication Number Publication Date
CN113239155A true CN113239155A (en) 2021-08-10

Family

ID=77136438

Country Status (1)

Country Link
CN (1) CN113239155A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779389A (en) * 2021-08-26 2021-12-10 杭州安恒信息技术股份有限公司 Illegal website identification method and device, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050878A1 (en) * 2017-08-09 2019-02-14 Verisign, Inc. System and method for domain name valuation
CN110008705A (en) * 2019-04-15 2019-07-12 北京微步在线科技有限公司 A kind of recognition methods of malice domain name, device and electronic equipment based on deep learning
CN110855716A (en) * 2019-11-29 2020-02-28 北京邮电大学 Self-adaptive security threat analysis method and system for counterfeit domain names
CN112532764A (en) * 2020-12-01 2021-03-19 上海哔哩哔哩科技有限公司 Data acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210810