CN114363025A

CN114363025A - Domain name detection method, device, equipment and storage medium

Info

Publication number: CN114363025A
Application number: CN202111610605.9A
Authority: CN
Inventors: 朱周平; 刘东鑫; 汪来富; 谢泳
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2022-04-15

Abstract

The disclosure provides a domain name detection method, a domain name detection device, domain name detection equipment and a storage medium, and relates to the technical field of network security. The method comprises the following steps: acquiring a domain name to be detected and inter-domain information corresponding to the domain name to be detected; associating the domain name to be detected with the inter-domain information corresponding to the domain name to be detected; and inputting the associated domain name to be detected and the inter-domain information corresponding to the domain name to be detected into a domain name classifier trained in advance to obtain a detection result of whether the domain name to be detected is the target domain name.

Description

Domain name detection method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of network security technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a domain name.

Background

At present, malicious domain names have become one of the most closely watched threats in the field of network security, both in China and worldwide.

In domain name detection schemes in the related art, each domain name is treated as an independent object and only its internal information is considered. The data dimensions are therefore not comprehensive enough, detection may be confined to a local view, and the false negative rate is high.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure provides a domain name detection method, device, apparatus, and storage medium, which at least solve the problems of incomplete data dimensionality and high false negative rate in the related art to some extent.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to an aspect of the present disclosure, there is provided a domain name detection method, the method including:

acquiring a domain name to be detected and inter-domain information corresponding to the domain name to be detected;

associating the domain name to be detected with the inter-domain information corresponding to the domain name to be detected;

and inputting the associated domain name to be detected and the inter-domain information corresponding to the domain name to be detected into a domain name classifier trained in advance to obtain a detection result of whether the domain name to be detected is the target domain name.

In an embodiment of the present disclosure, the inter-domain information corresponding to the domain name to be detected includes a first associated IP and ASN information thereof of the domain name to be detected, a first associated domain and a second associated domain of the domain name to be detected, registration information and sub-domain information thereof, a second associated IP address and ASN information thereof of the first associated domain, and a third associated IP address and ASN information thereof of the second associated domain.

In an embodiment of the disclosure, before inputting the associated domain name to be detected and the inter-domain information corresponding to the domain name to be detected into the pre-trained classifier, the method further includes:

acquiring a training sample set, wherein the training sample set comprises a plurality of training samples, and each training sample comprises a sample domain name and associated inter-domain information thereof;

and training the initial classifier by using the training sample set until the training stopping condition is met, and obtaining the trained domain name classifier.

In one embodiment of the present disclosure, before training the initial classifier using the training sample set, the method further comprises:

calculating similarity values between labeled samples and unlabeled samples in the training sample set;

deleting unlabeled samples whose similarity values are larger than a preset threshold;

wherein training the initial classifier using the training sample set comprises:

training the initial classifier using the training sample set from which the unlabeled samples whose similarity values are larger than the preset threshold have been deleted.

In one embodiment of the present disclosure, calculating a similarity value between labeled samples and unlabeled samples in a training sample set includes:

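calculating a modified cosine similarity value between the labeled samples and the unlabeled samples in the training sample set.

In one embodiment of the present disclosure, before training the initial classifier using the training sample set, the method further includes:

deleting worthless unlabeled samples from the training sample set.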

In one embodiment of the present disclosure, training an initial classifier with a training sample set includes:

training the initial classifier using the training sample set based on a dynamic threshold, where the dynamic threshold is determined based on the current number of labeled samples and the number of labeled samples when the optimal threshold was last selected.

According to another aspect of the present disclosure, there is provided a domain name detecting apparatus including:

the information acquisition module is used for acquiring the domain name to be detected and the inter-domain information corresponding to the domain name to be detected;

the information correlation module is used for correlating the domain name to be detected and the inter-domain information corresponding to the domain name to be detected;

and the detection module is used for inputting the associated domain name to be detected and the inter-domain information corresponding to the domain name to be detected into a domain name classifier trained in advance to obtain a detection result of whether the domain name to be detected is the target domain name.

According to still another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the above-described domain name detection method via execution of the executable instructions.

According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the domain name detection method described above.

The domain name detection method provided by the embodiments of the present disclosure associates a domain name to be detected with the inter-domain information corresponding to it, and inputs the associated domain name and inter-domain information into a domain name classifier trained in advance to obtain a detection result indicating whether the domain name to be detected is a target domain name. In other words, the embodiments of the present disclosure establish indirect association data between domain names and widen the data dimensions of the domain name to be detected, so that the system sees more than a single domain name in isolation, thereby reducing the false negative rate for malicious domain names.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

FIG. 1 is a schematic flow chart of a domain name detection method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a model training process according to an embodiment of the present disclosure;

FIG. 3 is a second schematic diagram illustrating a model training process according to an embodiment of the present disclosure;

FIG. 4 is a third schematic diagram illustrating a model training process according to an embodiment of the present disclosure;

FIG. 5 is a fourth schematic diagram illustrating a model training process according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a domain name detection apparatus according to an embodiment of the disclosure; and

FIG. 7 is a block diagram of a computer device according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

As described in the Background section, the related art suffers from insufficiently comprehensive data dimensions and a high false negative rate.

The current mainstream malicious domain name detection algorithms are based on machine learning. The more popular methods typically have drawbacks such as requiring large, high-quality sets of benign and malicious samples, overfitting to a single data source, poor robustness across different data sources, slow training, no support for online training, and no updating of the model once it has been trained.

One related technical solution provides a classifier trained with a self-learning algorithm, which changes how the model is updated, speeds up training, solves the problem of online training with real-time detection, and alleviates overfitting. However, this approach still has the following problems:

(1) The data dimensions are not comprehensive enough, so detection may be confined to a local view and the false negative rate is high: in the related art, each domain name is treated as an independent object, only its internal information is considered, and the direct or indirect relationships between domain names are ignored.

(2) The classifier is not robust enough: existing algorithms use a fixed threshold to decide whether a prediction is confident, and when faced with flexible, complex, and previously unseen data, the fixed threshold weakens the classifier's generalization capability.

(3) The selection of domain names to be detected is easily disturbed by outliers, so the false positive rate is high: the online learning algorithm preferentially selects, for prediction and training, the domain name to be detected that has the shortest Euclidean distance to the training set, and it cannot cope with interference from outliers.

(4) The value of a domain name to be detected is not assessed, which increases time cost: some online learning algorithms train on any valid domain name, so worthless requested domain names are also trained on, incurring a large time cost.

The embodiments of the present disclosure provide a domain name detection method, apparatus, device, and storage medium that establish indirect association data between domain names and widen the data dimensions of the domain name to be detected so that the system sees more, thereby at least alleviating the problems of insufficiently comprehensive data dimensions and a high false negative rate in the above solutions.

The domain name detection method provided by the embodiments of the present disclosure can be executed by any electronic device with computing and processing capability. For example, the method may be executed by, but is not limited to, any terminal device or server that can be configured to execute the domain name detection method provided by the embodiments of the present disclosure, or by a client capable of executing the method.

The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.

The terminal may be any electronic device including, but not limited to, a smart phone, a tablet computer, a laptop portable computer, a desktop computer, a wearable device, an augmented reality device, a virtual reality device, and the like, which is not limited herein.

The present exemplary embodiment will be described in detail below with reference to the drawings and examples.

Fig. 1 shows a flowchart of a domain name detection method in an embodiment of the present disclosure, and as shown in fig. 1, the domain name detection method provided in the embodiment of the present disclosure includes the following steps:

step S102, acquiring a domain name to be detected and inter-domain information corresponding to the domain name to be detected;

step S104, associating the domain name to be detected and the inter-domain information corresponding to the domain name to be detected;

and step S106, inputting the associated domain name to be detected and the inter-domain information corresponding to the domain name to be detected into a domain name classifier trained in advance to obtain a detection result of whether the domain name to be detected is the target domain name.
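The three steps above can be sketched as a single function. This is only an illustrative outline under stated assumptions: the names `detect_domain`, `fetch_inter_domain_info`, and `classifier` are hypothetical, the record layout is invented for the example, and the disclosure does not prescribe any particular data format or classifier interface.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases; the disclosure does not prescribe data formats.
InterDomainInfo = Dict[str, List[str]]
Classifier = Callable[[Dict[str, object]], bool]


def detect_domain(
    domain: str,
    fetch_inter_domain_info: Callable[[str], InterDomainInfo],
    classifier: Classifier,
) -> bool:
    """Return True if the classifier flags `domain` as a target domain."""
    # Step S102: acquire the inter-domain information for the domain.
    info = fetch_inter_domain_info(domain)
    # Step S104: associate the domain with that information in one record.
    record: Dict[str, object] = {"domain": domain, **info}
    # Step S106: feed the associated record to the pre-trained classifier.
    return classifier(record)
```

Passing the lookup and the classifier as callables keeps the sketch independent of any specific DNS data source or model.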

The above steps are described in detail below, specifically as follows:

The target domain name here may be a malicious domain name. A malicious domain name may be a domain name site that carries out malicious activities, for example by spreading malware or sending spam.

A domain name, also called a network domain, is the name of a computer or group of computers on the Internet, composed of a string of names separated by dots, and is used to locate and identify the computer during data transmission. Because IP addresses are hard to remember and cannot convey the name or nature of the organization behind the address, domain names were designed and are mapped to and from IP addresses through the Domain Name System (DNS), so that people can access the Internet more conveniently without having to remember the machine-readable numeric strings of IP addresses.

In the related art, the domain information features used when detecting domain names are independent of one another. They include data internal to the domain name, such as structural features (for example, domain name length, depth, and average sub-domain length), linguistic features (for example, whether digits are included and the proportion of digits), and statistical features (for example, N-gram distribution and average number of requests). They also include data related to the domain name itself, such as registration information obtained from the domain name and ASN information obtained from the corresponding target IP. However, the lack of inter-domain association data can confine the model to a local view, resulting in a high false negative rate.

In some embodiments, the inter-domain information corresponding to the domain name to be detected may include the first associated IP and its ASN information of the domain name to be detected, the first associated domain and the second associated domain of the domain name to be detected, registration information and sub-domain information thereof, the second associated IP address and its ASN information of the first associated domain, and the third associated IP address and its ASN information of the second associated domain.

In the embodiments of the present disclosure, in addition to conventional DNS traffic data (including domain names and their registration information, and the corresponding IP addresses and their ASN information), inter-domain data also needs to be collected, namely: the first associated IP of the target domain and its ASN information; the first associated domain and the second associated domain of the target IP address, together with their registration information and sub-domain information; the second associated IP address of the first associated domain and its ASN information; and the third associated IP address of the second associated domain and its ASN information.
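One way to picture this collection step is as a short walk over domain-to-IP and IP-to-ASN mappings. The sketch below is a simplification under loud assumptions: the disclosure does not define how the "first" and "second" associated domains differ, so the example treats every domain sharing an IP with the target as an associated domain, and the names `InterDomainRecord`, `build_record`, `domain_to_ips`, and `ip_to_asn` are hypothetical. Registration and sub-domain information are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class InterDomainRecord:
    """Hypothetical container for a domain and its inter-domain data."""
    domain: str
    associated_ips: Set[str] = field(default_factory=set)
    associated_asns: Set[str] = field(default_factory=set)
    associated_domains: Set[str] = field(default_factory=set)
    neighbor_ips: Set[str] = field(default_factory=set)
    neighbor_asns: Set[str] = field(default_factory=set)


def build_record(
    domain: str,
    domain_to_ips: Dict[str, List[str]],
    ip_to_asn: Dict[str, str],
) -> InterDomainRecord:
    record = InterDomainRecord(domain)
    # IPs the target domain resolves to, and the ASNs of those IPs.
    record.associated_ips = set(domain_to_ips.get(domain, []))
    record.associated_asns = {
        ip_to_asn[ip] for ip in record.associated_ips if ip in ip_to_asn
    }
    # Other domains that share at least one IP with the target domain.
    for other, ips in domain_to_ips.items():
        if other != domain and record.associated_ips & set(ips):
            record.associated_domains.add(other)
    # IPs (and their ASNs) of the associated domains that the target lacks.
    for other in record.associated_domains:
        record.neighbor_ips |= set(domain_to_ips[other]) - record.associated_ips
    record.neighbor_asns = {
        ip_to_asn[ip] for ip in record.neighbor_ips if ip in ip_to_asn
    }
    return record
```

The resulting record holds both the domain's own resolution data and the data reached indirectly through shared infrastructure, which is the widening of data dimensions the text describes.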

In the embodiments of the present disclosure, indirect association data between domain names is established and the data dimensions of the domain name to be detected are widened, so that the system sees more than a single domain name in isolation, thereby reducing the false negative rate for malicious domain names.

In some embodiments, before the step S106, as shown in fig. 2, the method may further include the following steps:

step S202, a training sample set is obtained, the training sample set comprises a plurality of training samples, and each training sample comprises a sample domain name and associated inter-domain information thereof;

and step S204, training the initial classifier by using the training sample set until the training stopping condition is met, and obtaining the trained domain name classifier.

Fig. 3 illustrates a process of training the initial classifier in an embodiment of the present disclosure. As shown in fig. 3, before the target vector is input into the initial classifier, that is, before the initial classifier is trained with the training sample set as described above, the sample data may also be processed by an anti-interference similarity optimization module.

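Based on this, the above method may further include the following steps:

calculating similarity values between labeled samples and unlabeled samples in the training sample set;

deleting unlabeled samples whose similarity values are larger than a preset threshold.

Accordingly, training the initial classifier using the training sample set in step S204 may include:

training the initial classifier using the training sample set from which the unlabeled samples whose similarity values are larger than the preset threshold have been deleted.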

As an example, the similarity value between the labeled samples and the unlabeled samples in the training sample set mentioned above may be a modified cosine similarity value. The modified cosine similarity takes into account the similarity in the trend of variation between the two vectors being compared on the one hand, and measures the difference in their magnitudes on the other.

In the related art, the influence of singular values is not considered when detecting domain names: the Euclidean distances between all labeled data and an unlabeled sample are computed and averaged, and the most similar unlabeled sample, that is, the one with the smallest average, is selected. In practice, however, a disguised malicious domain name may have a very small Euclidean distance to a few individual normal domain names, which pulls down the average.

In the embodiments of the present disclosure, removing extreme values avoids this influence to a certain extent, that is, the influence of potential outliers is removed while the optimal similarity is preserved.

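Provided that the screening of the truly optimal similarity is not affected, interference from singular values can be filtered out by removing extreme values. Optionally, other similarity measures may also be used. For example, the modified cosine similarity may be expressed as:

$$\mathrm{sim}(A, B) = \frac{\sum_{i}\left(A_i - \bar{x}_i\right)\left(B_i - \bar{x}_i\right)}{\sqrt{\sum_{i}\left(A_i - \bar{x}_i\right)^2}\,\sqrt{\sum_{i}\left(B_i - \bar{x}_i\right)^2}}$$

where $A_i$ and $B_i$ represent the components of vectors $A$ and $B$, respectively, and $\bar{x}_i$ represents the average of component $i$.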
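A minimal sketch of a modified (adjusted) cosine similarity between two feature vectors is given below. It assumes the standard adjusted-cosine form, in which the average of each component is subtracted before the cosine is taken; the function name `modified_cosine_similarity` and the `component_means` parameter are illustrative, and how the averages are obtained (for example, over the whole sample set) is an assumption not specified in the text.

```python
import math
from typing import Sequence


def modified_cosine_similarity(
    a: Sequence[float],
    b: Sequence[float],
    component_means: Sequence[float],
) -> float:
    """Cosine similarity of `a` and `b` after centering each component."""
    # Subtract the per-component average from both vectors.
    centered_a = [x - m for x, m in zip(a, component_means)]
    centered_b = [y - m for y, m in zip(b, component_means)]
    dot = sum(x * y for x, y in zip(centered_a, centered_b))
    norm_a = math.sqrt(sum(x * x for x in centered_a))
    norm_b = math.sqrt(sum(y * y for y in centered_b))
    # A vector equal to the mean has no direction; treat similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Centering is what lets the measure reflect differences in magnitude as well as direction, which plain cosine similarity ignores.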

Specifically, the similarity values between an unlabeled sample vector and each labeled sample vector are computed and sorted in ascending order, the first 20% of the sorted values are discarded, and the remaining values are averaged to obtain the similarity value of that unlabeled sample. The unlabeled sample with the smallest value, that is, the sample with the greatest similarity, is then selected as the target vector for priority prediction and training.
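The selection procedure just described can be sketched as a trimmed average. The sketch follows the text in treating a smaller value as a closer match (a distance-like score); the names `trimmed_mean_score` and `select_target` and the dictionary input format are hypothetical.

```python
from typing import Dict, List, Sequence


def trimmed_mean_score(values: Sequence[float], drop_fraction: float = 0.2) -> float:
    """Average `values` after discarding the smallest `drop_fraction` of them."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)  # ascending order, as in the text
    n_drop = int(len(ordered) * drop_fraction)
    kept = ordered[n_drop:]
    return sum(kept) / len(kept)


def select_target(
    values_by_sample: Dict[str, List[float]], drop_fraction: float = 0.2
) -> str:
    """Pick the unlabeled sample whose trimmed average is smallest."""
    return min(
        values_by_sample,
        key=lambda name: trimmed_mean_score(values_by_sample[name], drop_fraction),
    )
```

With values `[0.01, 1.0, 1.0, 1.0, 1.0]`, a plain average is pulled down to 0.802 by the single small value, whereas the trimmed average is 1.0, so one unusually close labeled sample no longer dominates the selection.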

In some embodiments, as shown in fig. 4, before processing by the anti-interference similarity optimization module, the sample data may also be processed by a valid domain name de-falsification module.

Based on this, before training the initial classifier using the training sample set, the method may further include:

deleting worthless unlabeled samples from the training sample set.

The valid domain name de-falsification module is described as follows.

In the data screening process, generally only valid data is selected, while worthless data is not filtered out. Valid data refers to data that is not missing or that meets requirements. However, valid data contains a large amount of worthless data that brings little positive benefit to model optimization; such data is pseudo data and needs to be removed to improve training speed.

Specifically, domain name samples that are accessed by a large number of clients or that already appear among the labeled samples are worthless domain names.

The valid domain name de-falsification module removes worthless domain names from the valid domain names, for example domain names accessed by most clients and samples that already appear in the benign or malicious domain sets, because these samples bring no positive benefit to the model.
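The filtering rule described here can be sketched as follows. The cutoff for "accessed by most clients" is not quantified in the text, so `popularity_ratio` (default 0.5) is an assumption, as are the function and parameter names.

```python
from typing import Dict, Iterable, List, Set


def remove_worthless_domains(
    unlabeled: Iterable[str],
    client_counts: Dict[str, int],
    total_clients: int,
    labeled_domains: Set[str],
    popularity_ratio: float = 0.5,  # assumed cutoff for "most clients"
) -> List[str]:
    """Keep only unlabeled domains that are neither widely accessed nor labeled."""
    if total_clients <= 0:
        raise ValueError("total_clients must be positive")
    kept = []
    for domain in unlabeled:
        # Share of clients that requested this domain.
        share = client_counts.get(domain, 0) / total_clients
        widely_accessed = share > popularity_ratio
        already_labeled = domain in labeled_domains
        if not (widely_accessed or already_labeled):
            kept.append(domain)
    return kept
```

Dropping these samples before similarity scoring means later stages spend time only on domains whose labels are still informative.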

In some embodiments, as shown in fig. 5, the initial classifier may be trained using a training sample set based on dynamic thresholds; the dynamic threshold is determined based on the number of current labeled samples and the number of labeled samples last time the optimal threshold was taken.

Malicious domain names account for a very small proportion of the overall domain name set and are unevenly distributed, so selecting an appropriate threshold is essential.

In addition, different types of malicious behavior account for different proportions of malicious domain names, and the characteristics of attacker-generated malicious domain names change constantly as attackers target the system's defenses and continually optimize their means of attack. Therefore, for the complex and flexible security problem of malicious domain names, a fixed threshold produces higher false negative and false positive rates than it does in traditional machine learning or deep learning application fields.

In the embodiments of the present disclosure, a dynamic threshold module is used so that the threshold changes dynamically. Compared with static, passive defense, a dynamic threshold can track changes in attacker behavior in a timely manner, and the threshold improves as training progresses.

As an example, in the embodiments of the present disclosure, a P-R curve is used to select the optimal threshold at the current time, and the dynamic update condition is k ≥ f × m, where k is the current number of labeled samples, m is the number of labeled samples when the optimal threshold was last selected, and f is a coefficient. When this relationship is satisfied, the threshold is updated to the current optimal value and then remains unchanged until the relationship is next satisfied.

Compared with the manually fixed threshold in the prior art, the P-R dynamic threshold module obtains a P-R curve by varying the threshold and selects the threshold corresponding to max(P + R) as the current optimal threshold. The number of labeled samples corresponding to the last optimal threshold is denoted m, and the current number of labeled samples is denoted k. When k is greater than or equal to the coefficient function f multiplied by m, the current threshold is updated and m is set to the number of labeled samples at that moment. Once updated, the threshold is held constant until the relationship is next satisfied.
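The update rule can be sketched as a small stateful helper. For simplicity the sketch treats f as a constant multiplier, which is an assumption (the text calls it a coefficient function), and the class and method names are hypothetical. The candidate optimal threshold is supplied by the caller, for example from a P-R curve.

```python
class DynamicThreshold:
    """Holds the threshold fixed until the labeled set has grown enough."""

    def __init__(self, initial_threshold: float, initial_labeled_count: int,
                 f: float = 1.5) -> None:
        self.threshold = initial_threshold
        self.m = initial_labeled_count  # labeled count at the last update
        self.f = f                      # assumed constant growth coefficient

    def maybe_update(self, k: int, current_optimal_threshold: float) -> bool:
        """Update when k >= f * m; return True if the threshold changed."""
        if k >= self.f * self.m:
            self.threshold = current_optimal_threshold
            self.m = k
            return True
        return False
```

For instance, starting from a threshold of 0.5 with m = 100 and f = 1.5, a call with k = 120 leaves the threshold unchanged, while a call with k = 150 adopts the new optimal value and sets m to 150.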

The horizontal and vertical coordinates of the P-R curve are recall (R) and precision (P), respectively, calculated as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

where TP denotes positive examples correctly classified as positive, FN denotes positive examples wrongly classified as negative, and FP denotes negative examples wrongly classified as positive. Each given threshold corresponds to one P-R point.
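As an illustration of how each threshold yields one P-R point, and how a threshold maximizing P + R could be chosen, consider the following sketch. The function names, the convention that a score at or above the threshold counts as a positive prediction, and the use of label 1 for positive examples are assumptions made for the example.

```python
from typing import Iterable, Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Guard against empty denominators when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_threshold(
    scores: Sequence[float], labels: Sequence[int], candidates: Iterable[float]
) -> float:
    """Choose the candidate threshold with the largest P + R."""
    def p_plus_r(t: float) -> float:
        p, r = precision_recall(scores, labels, t)
        return p + r
    return max(candidates, key=p_plus_r)
```

Sweeping the candidate thresholds traces out the P-R curve point by point; the selection then picks the point where the sum of precision and recall is highest.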

In the embodiments of the present disclosure, indirect association data between domain names is established; the anti-interference similarity optimization module removes the influence of potential outliers while preserving the optimal similarity; and the P-R dynamic threshold module dynamically updates the threshold and improves the generalization capability of the method. As a result, on the basis of alleviating overfitting, the method can perform real-time detection, greatly reduce the false negative rate and the false positive rate, and achieve improved robustness. Optionally, a valid domain name de-falsification module is added, which screens out worthless valid domain name requests at little extra cost and can greatly reduce training time.

Based on the same inventive concept, the embodiment of the present disclosure further provides a domain name detection apparatus, as described in the following embodiments. Because the principle of the embodiment of the apparatus for solving the problem is similar to that of the embodiment of the method, the embodiment of the apparatus can be implemented by referring to the implementation of the embodiment of the method, and repeated details are not described again.

Fig. 6 shows a domain name detection apparatus in an embodiment of the present disclosure, and as shown in fig. 6, the domain name detection apparatus 600 includes:

an information obtaining module 602, configured to obtain a domain name to be detected and inter-domain information corresponding to the domain name to be detected;

an information association module 604, configured to associate the domain name to be detected and inter-domain information corresponding to the domain name to be detected;

the detecting module 606 is configured to input the correlated domain name to be detected and inter-domain information corresponding to the domain name to be detected into a domain name classifier trained in advance, so as to obtain a detection result of whether the domain name to be detected is the target domain name.

In some embodiments, the domain name detection apparatus 600 may further include:

a sample acquisition module, configured to acquire a training sample set, wherein the training sample set comprises a plurality of training samples, and each training sample comprises a sample domain name and its associated inter-domain information;

a model training module, configured to train the initial classifier using the training sample set until a training stop condition is met, so as to obtain the trained domain name classifier.

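In some embodiments, the domain name detection apparatus 600 may further include:

a similarity calculation module, configured to calculate similarity values between labeled samples and unlabeled samples in the training sample set;

a first deleting module, configured to delete unlabeled samples whose similarity values are larger than a preset threshold.

Accordingly, training the initial classifier using the training sample set may include:

training the initial classifier using the training sample set from which the unlabeled samples whose similarity values are larger than the preset threshold have been deleted.

In some embodiments, the similarity calculation module is specifically configured to:

calculate a modified cosine similarity value between the labeled samples and the unlabeled samples in the training sample set.

In some embodiments, the domain name detection apparatus 600 may further include:

a second deleting module, configured to delete worthless unlabeled samples from the training sample set.

In some embodiments, the model training module training the initial classifier using the training sample set may include:

training the initial classifier using the training sample set based on a dynamic threshold, where the dynamic threshold is determined based on the current number of labeled samples and the number of labeled samples when the optimal threshold was last selected.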

The domain name detection apparatus provided in the embodiments of the present disclosure may be configured to implement the domain name detection method provided in the above method embodiments; the implementation principle and technical effect are similar and, for the sake of brevity, are not described again here.

As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."

An electronic device 700 according to this embodiment of the disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 7, the electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one storage unit 720, and a bus 730 that couples the various system components, including the storage unit 720 and the processing unit 710.

The storage unit stores program code that is executable by the processing unit 710 to cause the processing unit 710 to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section of this specification. For example, the processing unit 710 may perform the following steps of the above method embodiments: acquiring a domain name to be detected and inter-domain information corresponding to the domain name to be detected; associating the domain name to be detected with the inter-domain information corresponding to it; and inputting the associated domain name to be detected and inter-domain information into a domain name classifier trained in advance to obtain a detection result indicating whether the domain name to be detected is the target domain name.

The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.

The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 730 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 700 may also communicate with one or more external devices 740 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 via the bus 730. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium on which a program product capable of implementing the above-described method of the present disclosure is stored. The computer-readable storage medium may be a readable signal medium or a readable storage medium. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the "exemplary methods" section above of this specification.

More specific examples of the computer-readable storage medium in the present disclosure may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In the present disclosure, a computer readable storage medium may include a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Alternatively, program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

In particular implementations, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A method for detecting a domain name, the method comprising:

2. The method according to claim 1, wherein the inter-domain information corresponding to the domain name to be detected comprises a first associated IP and its ASN information of the domain name to be detected, a first associated domain and a second associated domain of the domain name to be detected and registration information and sub-domain information thereof, a second associated IP address of the first associated domain and its ASN information, and a third associated IP address of the second associated domain and its ASN information.

3. The method according to claim 1, wherein before the associated domain name to be detected and the inter-domain information corresponding to the domain name to be detected are input into a pre-trained classifier, the method further comprises:

4. The method of claim 3, wherein prior to training an initial classifier using the training sample set, the method further comprises:

calculating similarity values between labeled samples and unlabeled samples in the training sample set;

training an initial classifier by using the training sample set, comprising:

5. The method of claim 4, wherein calculating the similarity value between the labeled and unlabeled samples in the set of training samples comprises:

6. The method of any of claims 3-5, wherein prior to training an initial classifier using the training sample set, the method further comprises:

deleting unlabeled samples that have no value from the training sample set.

7. The method of claim 1, wherein training an initial classifier using the training sample set comprises:

training an initial classifier using the training sample set based on a dynamic threshold, wherein the dynamic threshold is determined based on the current number of labeled samples and the number of labeled samples at the time the optimal threshold was last obtained.

8. A domain name detection apparatus, characterized in that the apparatus comprises:

the information acquisition module is used for acquiring a domain name to be detected and inter-domain information corresponding to the domain name to be detected;

the information association module is used for associating the domain name to be detected with the inter-domain information corresponding to the domain name to be detected;

9. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the domain name detection method of any one of claims 1 to 7 via execution of the executable instructions.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the domain name detection method according to any one of claims 1 to 7.
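
For illustration only, the inter-domain information enumerated in claim 2 and the association performed by the information association module could be represented as in the following minimal Python sketch. It is not the claimed implementation: the class names `IpInfo`, `AssociatedDomain` and `InterDomainInfo`, the function `associate`, all field names, and the flat-record layout are hypothetical and are introduced here only to make the data relationships concrete. The sample values use reserved documentation domains, addresses and AS numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IpInfo:
    """An associated IP address together with its ASN information (hypothetical layout)."""
    address: str
    asn: str


@dataclass
class AssociatedDomain:
    """An associated domain with its registration, sub-domain and IP/ASN data."""
    name: str
    registration: Dict[str, str] = field(default_factory=dict)
    subdomains: List[str] = field(default_factory=list)
    ips: List[IpInfo] = field(default_factory=list)


@dataclass
class InterDomainInfo:
    """Inter-domain information for one domain name to be detected."""
    # First associated IPs (with ASN) of the domain name to be detected.
    first_ips: List[IpInfo] = field(default_factory=list)
    # First associated domains; their `ips` hold the second associated IPs.
    first_domains: List[AssociatedDomain] = field(default_factory=list)
    # Second associated domains; their `ips` hold the third associated IPs.
    second_domains: List[AssociatedDomain] = field(default_factory=list)


def _asns(ips: List[IpInfo]) -> List[str]:
    """Return the sorted, de-duplicated ASNs of a list of IPs."""
    return sorted({ip.asn for ip in ips})


def associate(domain: str, info: InterDomainInfo) -> Dict[str, object]:
    """Join a domain name with its inter-domain information into one record.

    The flat dictionary is an assumed input format for a downstream classifier.
    """
    return {
        "domain": domain,
        "first_ip_asns": _asns(info.first_ips),
        "first_domain_names": [d.name for d in info.first_domains],
        "second_ip_asns": _asns([ip for d in info.first_domains for ip in d.ips]),
        "second_domain_names": [d.name for d in info.second_domains],
        "third_ip_asns": _asns([ip for d in info.second_domains for ip in d.ips]),
    }


if __name__ == "__main__":
    info = InterDomainInfo(
        first_ips=[IpInfo("192.0.2.1", "AS64500")],
        first_domains=[
            AssociatedDomain("a.example", ips=[IpInfo("198.51.100.2", "AS64501")])
        ],
        second_domains=[
            AssociatedDomain("b.example", ips=[IpInfo("203.0.113.4", "AS64502")])
        ],
    )
    print(associate("target.example", info))
```

Keeping the second and third associated IPs nested under the domains they belong to preserves the relationship chain stated in claim 2, while `associate` flattens that chain into a single record per domain name to be detected.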