CN117081865A - Network security defense system based on malicious domain name detection method - Google Patents

Publication number
CN117081865A
Authority
CN
China
Prior art keywords
domain name
vector
query
unit
certificate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311337285.3A
Other languages
Chinese (zh)
Other versions
CN117081865B (en
Inventor
黄铁军
常庭懋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qitian Anxin Technology Co ltd
Original Assignee
Beijing Qitian Anxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qitian Anxin Technology Co ltd
Priority to CN202311337285.3A
Publication of CN117081865A
Application granted
Publication of CN117081865B
Current legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0823 Network architectures or network communication protocols for network security for authentication of entities using certificates
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3263 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving certificates, e.g. public key certificate [PKC] or attribute certificate [AC]; Public key infrastructure [PKI] arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3297 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving time stamps, e.g. generation of time stamps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/121 Timestamp

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a network security defense system based on a malicious domain name detection method, relating to the technical field of network security defense. The system acquires domain name parameters, randomly matches a query method list, and acquires detection rules of a search engine based on a certificate transparency protocol; acquires the corresponding domain name detailed information based on the API query function provided by the search engine Censys and the detection rules; acquires the web page content corresponding to the domain name and to the returned domain name, and performs web page content similarity analysis to obtain a web page content similarity characteristic value; performs weight setting processing on the domain name detailed information and the web page content similarity characteristic value to obtain a domain name association score; and, based on the domain name association score and a preset association threshold value, analyzes whether the malicious domain names have an association relation. Because the domain name association score combines certificate information with web page content similarity, association relations between malicious domain names are found, an alarm is sent out in time, and attacks from malicious domain names are defended against.

Description

Network security defense system based on malicious domain name detection method
Technical Field
The invention relates to the technical field of network security defense, in particular to a network security defense system based on a malicious domain name detection method.
Background
Network technology is developing ever faster, and network security problems have grown with it: network attacks are endless, and the domain names and websites used by attackers are called malicious domain names. Malicious domain name detection has evolved from regular-expression matching to deep learning detection methods and has achieved a certain effect. To evade the existing detection methods, an attacker purchases domain names from several domain name registrars using mutually independent accounts, one account per domain name, so that the domain names show no association in their characteristics, mailboxes, or contact details. However, an attacker often registers multiple malicious domain names within a short period, and the update times of those domain names are also correlated.
Therefore, the invention provides a network security defense system based on a malicious domain name detection method.
Disclosure of Invention
The invention provides a network security defense system based on a malicious domain name detection method. Filtering rules are set for one or more domain names of an attacker; certificate transparency records are queried and, through regular-expression matching, the degree of association of the certificate generation time, update time, certificate serial number, certificate signing end time, and pre-certificate timestamp is analyzed; the content carried by the domain name websites is compared and the similarity of the web page content is calculated; and an association matching module weights the certificate, domain name, timestamp, and serial number information together with the web page similarity score to generate a domain name association score. Association relations between malicious domain names are thereby found, an alarm is sent out in time, and attacks from malicious domain names are defended against.
The invention provides a network security defense system based on a malicious domain name detection method, which comprises the following modules:
a preparation module: acquiring domain name parameters, randomly matching a query method list, and acquiring detection rules of a search engine based on a certificate transparency protocol;
a detection module: acquiring corresponding domain name detailed information based on the API query function provided by the search engine Censys and the detection rules;
an analysis module: acquiring the web page content corresponding to the domain name and to the returned domain name, and performing web page content similarity analysis to obtain a web page content similarity characteristic value;
a setting module: performing weight setting processing based on the domain name detailed information and the web page content similarity characteristic value to obtain a domain name association score;
an association module: analyzing whether the malicious domain names have an association relation based on the domain name association score and a preset association threshold value.
Preferably, in the network security defense system based on a malicious domain name detection method, the preparation module includes:
a domain name type acquisition unit: acquiring and analyzing domain name parameters to obtain the domain name type;
a query rule acquisition unit: obtaining all corresponding query rules based on the domain name type and a type-query rule comparison table;
a query list acquisition unit: randomly arranging all query rules to obtain a query method list.
Preferably, in the network security defense system based on a malicious domain name detection method, the preparation module further includes:
a certificate parameter acquisition unit: obtaining corresponding certificate parameters based on the certificate transparency protocol and the domain name parameters;
a security verification unit: carrying out security verification on the certificate parameters to obtain a security verification result;
a rule construction unit: if the security verification result is a pass, constructing the detection rule of the search engine according to the query parameters corresponding to each query rule in the query method list.
Preferably, in the network security defense system based on a malicious domain name detection method, the detection module includes:
a request acquisition unit: obtaining corresponding request information based on each query parameter in the detection rule;
a search unit: inputting the request information into the API query function provided by the search engine Censys to obtain the corresponding domain name detailed information.
Preferably, in the network security defense system based on a malicious domain name detection method, the analysis module includes:
a text grabbing unit: for any two input interfaces, capturing the web page content of the returned domain names corresponding to the same input domain name, to obtain corresponding first text content and second text content;
a text screening unit: based on a structural text database, matching the first structural text in the first text content and the second structural text in the second text content, and deleting them;
a clause unit: performing preliminary clause splitting based on the punctuation marks of the processed first text content and second text content, to obtain a corresponding first clause set and second clause set;
a substring extraction unit: according to a preset substring length, acquiring first substrings of the same length from each first clause in the first clause set;
a dictionary matching unit: performing word matching between each first substring and a dictionary;
a substring processing unit: directly retaining each first substring that is matched successfully; for each first substring that is not matched, removing its last character and performing word matching with the dictionary again, repeating until the matching succeeds, and retaining the shortened substring;
a word segmentation set construction unit: constructing a first word segmentation set based on all retained substrings;
a syntax analysis unit: inputting each first word segment in the first word segmentation set, in sequence, into a shift-reduce parser for LR(k) syntax analysis to obtain a corresponding first syntax analysis result;
a word segmentation screening unit: deleting from the first word segmentation set the first word segments whose first syntax analysis results are unqualified;
a vector acquisition unit: obtaining a corresponding first vector set based on the processed first word segmentation set and the word segment-space comparison table of the correspondingly matched input interface;
a second analysis unit: acquiring, in the same way, a second vector set for the second clause set;
a vector group construction unit: based on the first vector set and the second vector set, matching each first vector with the closest second vector to construct vector groups;
a distance calculation unit: calculating the corresponding Chebyshev distance for each vector group;
a feature value calculation unit: calculating the web page content similarity characteristic value of the two web pages based on the Chebyshev distances.
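The substring extraction, dictionary matching, and substring processing units together resemble forward maximum matching segmentation. The following is a minimal sketch under stated assumptions: the function name, the toy dictionary, and the rule that an unmatched single character is kept as its own token are illustrative choices, not details given in the text.

```python
def forward_max_match(clause, dictionary, max_len):
    """Segment `clause` by taking up to `max_len` characters at a time and
    trimming the last character until the candidate is found in `dictionary`."""
    words = []
    pos = 0
    while pos < len(clause):
        candidate = clause[pos:pos + max_len]
        # Remove the last character and re-match until the dictionary matches.
        while len(candidate) > 1 and candidate not in dictionary:
            candidate = candidate[:-1]
        # Assumption: a single character that still does not match is kept as-is.
        words.append(candidate)
        pos += len(candidate)
    return words


# Hypothetical dictionary and clause, for illustration only.
toy_dictionary = {"net", "network", "work", "security", "secure"}
print(forward_max_match("networksecurity", toy_dictionary, 8))
```

Trimming from the right means the longest dictionary word starting at the current position wins, which is why "network" is chosen over "net" in this toy input.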
Preferably, in the network security defense system based on a malicious domain name detection method, the feature value calculation unit calculates the web page content similarity characteristic value according to the following formula:
[formula image not reproduced in the source text]
The terms of the formula are: the web page content similarity characteristic value; the first vector in the i-th vector group; the second vector in the i-th vector group; the total number of vector groups; the Chebyshev distance of the i-th vector group; the characteristic coefficient of the vector groups; the vector weight of the i-th vector group; the new vector obtained by adjusting the first vector in the i-th vector group with the corresponding vector weight; the new vector obtained by adjusting the second vector in the i-th vector group with the corresponding vector weight; the definition function of the Chebyshev distance; and log, the logarithm function.
Preferably, in the network security defense system based on a malicious domain name detection method, the setting module includes:
a configuration option acquisition unit: determining configuration options based on the domain name detailed information and the web page content similarity characteristic value;
a weight setting unit: performing weight setting processing on each configuration option to obtain the domain name association score V:
$$V = \sum_{i=1}^{n} w_i x_i$$
where $V$ represents the domain name association score, $n$ is the number of domain name association configuration items, $w_i$ represents the weighting value of the i-th configuration option, and $x_i$ represents the associated item value of the i-th configuration option;
the configuration options relate to the certificate serial number, the certificate signing end time, the pre-certificate timestamp, the web page similarity, the certificate update time, and the certificate generation time.
Preferably, in the network security defense system based on a malicious domain name detection method, the quantization module includes:
a threshold value judging unit: if the domain name association score is larger than the preset association threshold value, the domain name and the returned domain name have an association relation, and both are judged to be malicious domain names;
an alarm transmission unit: sending the malicious domain names to a system administrator and rejecting access to the malicious domain names.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a block diagram of a network security defense system based on a malicious domain name detection method in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Example 1:
The embodiment of the invention provides a network security defense system based on a malicious domain name detection method, which comprises the following modules:
a preparation module: acquiring domain name parameters, randomly matching a query method list, and acquiring detection rules of a search engine based on a certificate transparency protocol;
a detection module: acquiring corresponding domain name detailed information based on the API query function provided by the search engine Censys and the detection rules;
an analysis module: acquiring the web page content corresponding to the domain name and to the returned domain name, and performing web page content similarity analysis to obtain a web page content similarity characteristic value;
a setting module: performing weight setting processing based on the domain name detailed information and the web page content similarity characteristic value to obtain a domain name association score;
an association module: analyzing whether the malicious domain names have an association relation based on the domain name association score and a preset association threshold value.
In this embodiment, the domain name parameter refers to a string of characters pointing to a certain IP address, such as: taobao.com, baidu.com.
In this embodiment, the query method list refers to the list obtained by analyzing the domain name parameters to determine the domain name type, looking up the query rules corresponding to that type, and randomly arranging all of those query rules. The domain name types include top-level, second-level, and third-level domain names. Top-level domain names include, for example, .com (commercial) and .org (organizations); second-level domain names include, for example, b.org; third-level domain names include, for example, ju.taobao.com and sf.taobao.com. A query rule is a statement containing query conditions and query content, for example: City = "Chicago" AND BirthDate < DateAdd("yyyy", -40, Date()).
In this embodiment, the certificate transparency protocol provides a public, append-only data structure (a log) that records the certificates issued by a certificate authority (CA); the cryptographic properties of the log ensure that once the log has accepted a certificate, the corresponding entry can never be removed or modified.
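In Certificate Transparency as specified in RFC 6962, the append-only guarantee rests on a Merkle hash tree over the log entries. The sketch below computes the RFC 6962 Merkle tree hash (leaf prefix 0x00, interior node prefix 0x01, split at the largest power of two smaller than the number of entries). The entry byte strings are hypothetical placeholders, and this is an illustration of the log structure rather than part of the patented system.

```python
import hashlib


def merkle_tree_hash(entries):
    """Merkle tree hash over a list of byte strings, following RFC 6962."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hash: a 0x00 prefix separates leaves from interior nodes.
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    # k is the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_tree_hash(entries[:k])
    right = merkle_tree_hash(entries[k:])
    # Interior node hash: 0x01 prefix followed by both child hashes.
    return hashlib.sha256(b"\x01" + left + right).digest()


# Hypothetical log entries standing in for logged certificates.
log_entries = [b"cert-a", b"cert-b", b"cert-c"]
root = merkle_tree_hash(log_entries)
tampered_root = merkle_tree_hash([b"cert-a", b"cert-X", b"cert-c"])
print(root.hex())
print(root != tampered_root)
```

Any change to an accepted entry changes the root hash, which is what lets monitors detect removal or modification of logged certificates.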
In this embodiment, the detection rule refers to a rule set that includes each query rule in the query method list and the corresponding query parameters.
In this embodiment, the domain name detailed information includes: certificate generation time, update time, certificate serial number, certificate signing ending time and timestamp of pre-certificate in the certificate transparency.
In this embodiment, the web page content similarity analysis refers to extracting the web page text content of the domain name input by the user and of the returned domain name, performing word segmentation on the extracted text, performing syntax analysis, describing the position of the text in a semantic space by numerical vectors, measuring the spatial distance between the vectors of the two web pages, and calculating a web page content similarity characteristic value that represents the degree of similarity of the two web page contents.
In this embodiment, the web page content similarity feature value refers to a numerical value that can represent the similarity degree of two web page contents obtained by performing vector representation on the word segmentation of the web page text content and performing calculation.
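The vector comparison step can be illustrated with the Chebyshev distance, which is the largest absolute coordinate difference between two vectors. The sketch below pairs each first vector with its closest second vector and reports the distance of each pair. The function names and sample vectors are hypothetical, and the final weighted, logarithmic similarity formula of the system is not reproduced here.

```python
def chebyshev(u, v):
    """Chebyshev distance: the largest absolute difference over all coordinates."""
    return max(abs(a - b) for a, b in zip(u, v))


def pair_nearest(first_vectors, second_vectors):
    """Pair each first vector with the closest second vector (by Chebyshev distance)."""
    groups = []
    for u in first_vectors:
        v = min(second_vectors, key=lambda w: chebyshev(u, w))
        groups.append((u, v, chebyshev(u, v)))
    return groups


# Hypothetical word vectors for two web pages.
first = [(1.0, 2.0), (4.0, 0.0)]
second = [(1.5, 2.5), (3.0, 3.0)]
for u, v, d in pair_nearest(first, second):
    print(u, v, d)
```

Smaller distances across the vector groups indicate more similar page content; how the distances are combined into a single characteristic value is left to the system's own formula.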
In this embodiment, weight setting refers to assigning a weight to each item of domain name detailed information according to preset rules. The rules include: certificate serial number, weighting value 20; certificate signing end time, weighting value 15; timestamp of pre-certificate, weighting value 25; web page similarity, weighting value 50; certificate update time, weighting value 30; certificate generation time, weighting value 30. The rules further include: time within 50 seconds / serial number difference less than 10, weighting value 120; time within 100 seconds / serial number difference less than 20, weighting value 100; time within 200 seconds / serial number difference less than 40, weighting value 80; time within 300 seconds / serial number difference less than 60, weighting value 60; time within 400 seconds / serial number difference less than 80, weighting value 40; time greater than 500 seconds / serial number difference greater than 100, or no query result, weighting value 50.
In this embodiment, the domain name association score refers to a numerical value that can represent association degrees of two domain names and is obtained by performing weight setting processing on the generation time, the update time, the certificate serial number, the certificate signing end time, the timestamp of the pre-certificate and the web page similarity score, and then performing calculation.
In this embodiment, the preset association threshold value refers to a preset minimum value indicating that an association relation exists between the domain name and the returned domain name: if the domain name association score is greater than the preset association threshold value, the domain name and the returned domain name have an association relation.
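The scoring and threshold check can be sketched as follows, assuming the association score is a weighted sum of per-option item values. The weights are the six per-option weighting values listed in this embodiment; the item values in the range 0 to 1, the option key names, and the threshold of 70 are hypothetical choices for illustration.

```python
# Per-option weighting values as listed in this embodiment.
WEIGHTS = {
    "certificate_serial_number": 20,
    "certificate_signing_end_time": 15,
    "pre_certificate_timestamp": 25,
    "web_page_similarity": 50,
    "certificate_update_time": 30,
    "certificate_generation_time": 30,
}


def association_score(item_values, weights=WEIGHTS):
    """Weighted sum of item values; options without a value contribute 0."""
    return sum(w * item_values.get(name, 0.0) for name, w in weights.items())


def is_associated(score, threshold):
    """Two domain names are treated as associated when the score exceeds the threshold."""
    return score > threshold


# Hypothetical item values (assumed to lie in [0, 1]) and a hypothetical threshold.
values = {
    "certificate_serial_number": 1.0,
    "web_page_similarity": 0.5,
    "certificate_update_time": 1.0,
}
score = association_score(values)
print(score, is_associated(score, threshold=70))
```

With these sample values the score is 20 + 25 + 30 = 75, which exceeds the assumed threshold, so the pair would be flagged and reported to the administrator.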
The working principle and the beneficial effects of the technical scheme are as follows: filtering rules are set for one or more domain names of an attacker; certificate transparency records are queried and, through regular-expression matching, the degree of association of the certificate generation time, update time, certificate serial number, certificate signing end time, and pre-certificate timestamp is analyzed; the content carried by the domain name websites is compared and the similarity of the web page content is calculated; and the association matching module weights the certificate, domain name, timestamp, and serial number information together with the web page similarity score to generate a domain name association score, thereby finding the association relations among malicious domain names, sending out an alarm in time, and defending against attacks from malicious domain names.
Example 2:
based on embodiment 1, the preparation module includes:
domain name type acquisition unit: acquiring and analyzing domain name parameters to obtain domain name types;
query rule acquisition unit: obtaining all corresponding query rules based on the domain name type and a type-query rule comparison table;
query list acquisition unit: randomly arranging all query rules to obtain a query method list.
In this embodiment, the domain name types include top-level, second-level, and third-level domain names. Top-level domain names include, for example, .com (commercial) and .org (organizations); second-level domain names include, for example, b.org; third-level domain names include, for example, ju.taobao.com and sf.taobao.com.
In this embodiment, the type-query rule lookup table refers to a lookup table of domain name types and a corresponding plurality of query rules.
In this embodiment, a query rule is a statement containing query conditions and query content, for example: City = "Chicago" AND BirthDate < DateAdd("yyyy", -40, Date()).
The working principle and the beneficial effects of the technical scheme are as follows: the domain name parameters are analyzed to obtain the domain name type, the corresponding query rules are matched, and the rules are randomly arranged into a query method list, so that queries are issued in random order while still covering every rule, which facilitates the subsequent analysis of domain name association.
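The three units of this embodiment can be sketched as a type lookup followed by a shuffle. In the sketch below, the comparison table contents and rule names are hypothetical placeholders, and the domain type is judged simply by counting labels, which ignores multi-label public suffixes such as co.uk.

```python
import random

# Hypothetical type-query rule comparison table; the rule names are placeholders.
TYPE_QUERY_RULES = {
    "top-level": ["rule_tld_1"],
    "second-level": ["rule_sld_1", "rule_sld_2", "rule_sld_3"],
    "third-level": ["rule_3ld_1", "rule_3ld_2"],
}


def domain_type(domain):
    """Classify a domain name by its number of labels (a simplification)."""
    labels = [part for part in domain.strip(".").split(".") if part]
    if len(labels) <= 1:
        return "top-level"
    if len(labels) == 2:
        return "second-level"
    return "third-level"


def build_query_list(domain, rng=None):
    """Look up the query rules for the domain type and arrange them randomly."""
    rng = rng or random.Random()
    rules = list(TYPE_QUERY_RULES[domain_type(domain)])
    rng.shuffle(rules)
    return rules


print(domain_type("ju.taobao.com"))
print(build_query_list("taobao.com"))
```

Shuffling a copy of the rule list keeps the comparison table itself unchanged between calls while still randomizing the query order.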
Example 3:
based on the embodiment 1, the preparation module further includes:
certificate parameter acquisition unit: obtaining corresponding certificate parameters based on the certificate transparency protocol and the domain name parameters;
a security verification unit: carrying out security verification on the certificate parameters to obtain a security verification result;
rule construction unit: if the security verification result passes, constructing a detection rule of the search engine according to the query parameters corresponding to each query rule in the query method list.
In this embodiment, the certificate parameters refer to key parameters of a certificate found in a network with a certificate transparency protocol based on a domain name parameter, for example: a key.
In this embodiment, security verification refers to verifying a key in a certificate parameter, and if the key is successfully decoded, the access request is secure.
In this embodiment, the security verification result indicates whether the key in the certificate parameters is qualified or unqualified; a domain name whose security verification result is qualified is considered safe.
In this embodiment, the query parameters refer to the key parameters of the query in a query rule, for example City = "Chicago" and BirthDate < DateAdd("yyyy", -40, Date()).
The working principle and the beneficial effects of the technical scheme are as follows: and the security verification is carried out on the certificate parameters, the malicious domain name is primarily screened, the attack of the simple malicious domain name is prevented, and the security of the website is ensured.
Example 4:
based on embodiment 3, a rule construction unit is used for:
determining, from a formula-type database, the query compatibility type matching each query rule according to the first set formula corresponding to that query rule in the query method list;
constructing a first compatible vector according to the query compatibility type of each query rule, wherein the first compatible vector is obtained by filling the matching elements of a blank vector with the compatibility values of the matched compatibility types and filling the remaining blank elements with 0;
constructing a first compatible matrix from all first compatible vectors, wherein each row vector in the first compatible matrix corresponds to one query rule, and supplementing a query column vector for the query rules in front of the first compatible matrix to obtain a second compatible matrix;
analyzing the query types contained in the query column vector; when the query types are of the full type, setting all query rules in parallel and constructing the detection rule of the search engine according to the main query parameters corresponding to each query rule;
when the query types are not of the full type, locking the first missing types in the second compatible matrix, and locking the maximum compatible value and the second maximum compatible value in the column vector corresponding to each first missing type;
determining, based on the locking results of each row vector, a first row with the largest number of locks and a second row with the second largest number of locks;
when the locked types of the first row and the second row can fully supplement the missing types, acquiring the secondary query parameters of the query rules corresponding to the first row and the second row, and combining them with all the acquired main query parameters to construct the detection rule of the search engine;
when the locked types of the first row and the second row cannot fully supplement the missing types, screening, for each remaining second missing type, the maximum compatible value consistent with that type from the locking results of the remaining rows, extracting the parameters to be supplemented from the query rule corresponding to each such maximum compatible value, and combining all secondary query parameters and main query parameters to construct the detection rule of the search engine.
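The locking steps above can be sketched as follows. For each missing-type column, the rows holding the largest and second largest compatibility values are locked; rows are then ranked by how many locks they received, and the two top rows are checked for whether they cover every missing type. The matrix values, the column indices, and the choice not to lock zero-valued entries are assumptions made for illustration.

```python
from collections import Counter


def lock_rows(matrix, missing_columns):
    """Return the two most-locked rows and whether they cover all missing columns."""
    locks = Counter()
    locked_types = {}
    for col in missing_columns:
        # Rank rows by their compatibility value in this column, largest first.
        ranked = sorted(range(len(matrix)), key=lambda r: matrix[r][col], reverse=True)
        for row in ranked[:2]:
            # Assumption: a value of 0 means "not compatible", so it is never locked.
            if matrix[row][col] > 0:
                locks[row] += 1
                locked_types.setdefault(row, set()).add(col)
    top_rows = [row for row, _ in locks.most_common(2)]
    covered = set()
    for row in top_rows:
        covered |= locked_types[row]
    return top_rows, covered == set(missing_columns)


# Hypothetical compatibility values: one row per query rule, one column per type.
compat = [
    [0.3, 0.3, 0.5],
    [0.2, 0.1, 0.0],
    [0.0, 0.6, 0.4],
]
rows, fully_covered = lock_rows(compat, missing_columns=[1, 2])
print(sorted(rows), fully_covered)
```

When the second return value is false, the remaining rows would have to be searched for each uncovered missing type, as in the last step above.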
In this embodiment, the first set formula of each query rule is obtained by matching against a rule-formula database. Query rules are generally defined by formulas designed in advance by experts, and the database stores the mapping between rules and formulas so that the formulas can be called directly.
In this embodiment, the formula-type database includes different set formulas and the compatible types matched with each set formula. Many formulas are involved in rule construction, and the run types corresponding to the formulas must be mutually compatible to ensure reasonable operation of the engine. For example, the compatible types of formula 1 are: run type 1 with compatible value 0.3, run type 2 with compatible value 0.3, and run type 3 with compatible value 0.5;
the compatible types of formula 2 are: run type 1 with compatible value 0.2, run type 2 with compatible value 0.1, and run type 3 absent, which is filled with 0;
thus the first compatible vectors are $(0.3, 0.3, 0.5)$ and $(0.2, 0.1, 0)$, and the first compatible matrix is $\begin{pmatrix} 0.3 & 0.3 & 0.5 \\ 0.2 & 0.1 & 0 \end{pmatrix}$.
In this embodiment, the query column vector is constructed from the query types corresponding to the query rules, and supplementing the first compatible matrix with it yields the second compatible matrix.
In this embodiment, the full type means that the query types in the query column vector cover all the set types. That is, it is judged whether any set type is missing from the query types; if none is missing, the full type is satisfied; otherwise, the missing types are determined.
For example, if the set types are types 1, 2, and 3 and the query types are types 1 and 2, the missing type is type 3; if the set types are types 1 and 2 and the query types are types 1 and 2, the query types are regarded as the full type.
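The full-type judgment in the example above can be sketched as a set difference. This is a minimal illustration only; the function names `missing_types` and `is_full_type` are hypothetical and do not appear in the original text.

```python
def missing_types(set_types, query_types):
    """Return the set types that no query type covers, in sorted order."""
    return sorted(set(set_types) - set(query_types))


def is_full_type(set_types, query_types):
    """The query types are 'full' when no set type is missing."""
    return not missing_types(set_types, query_types)


# Set types 1, 2, 3 with query types 1, 2: type 3 is missing.
print(missing_types([1, 2, 3], [1, 2]))
# Set types 1, 2 with query types 1, 2: full type.
print(is_full_type([1, 2], [1, 2]))
```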
In this embodiment, parallel setting means that the rules are arranged in parallel when the detection rule is constructed. The main query parameter is a parameter that fully represents the query type of its query rule: irrelevant parameters in the query rule are rejected and only the parameters consistent with the query type are kept, which effectively ensures the integrity of rule construction.
In this embodiment, for example, when type 3 is a missing type, the largest and the second-largest compatible values in the column of the second compatible matrix corresponding to type 3 are determined and locked. Because the second compatible matrix generally contains more than two rows, the above is only illustrative.
In this embodiment, the secondary query parameter refers to a supplementary parameter in the corresponding query rule that is consistent with the missing type; because the rule is compatible with that type, a parameter consistent with the type must exist in it.
The beneficial effects of the technical scheme are as follows: type analysis and compatibility analysis are carried out on the query rules to ensure the completeness of the types involved in constructing the detection rule. Full-type analysis of the query column vector determines which parameter construction mode is adopted; when the types are not full, rules are screened by locking the large compatible values so that the parameters of the missing types are supplemented. This keeps the detection rule as complete as possible and helps prevent attacks by malicious domain names.
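The construction of the first compatible vectors and the locking of compatible values for a missing type can be sketched as follows. This is a simplified illustration under stated assumptions: the query column vector is omitted, a 0 entry is treated as "absent" and is never locked, and the names `compat_vector`, `lock_rows`, and `pick_rows` are hypothetical.

```python
RUN_TYPES = [1, 2, 3]


def compat_vector(rule_compat, run_types=RUN_TYPES):
    """First compatible vector: compatible value per run type, 0 where absent."""
    return [rule_compat.get(t, 0.0) for t in run_types]


def lock_rows(matrix, missing, run_types=RUN_TYPES):
    """For each missing type, lock the rows that hold the largest and the
    second-largest compatible values in that type's column."""
    locks = {}  # row index -> set of missing types locked in that row
    for t in missing:
        col = run_types.index(t)
        ranked = sorted(range(len(matrix)), key=lambda r: matrix[r][col], reverse=True)
        for r in ranked[:2]:
            if matrix[r][col] > 0:  # assumption: a 0-filled entry is not locked
                locks.setdefault(r, set()).add(t)
    return locks


def pick_rows(locks):
    """Rows with the largest and the second-largest number of locks."""
    return sorted(locks, key=lambda r: len(locks[r]), reverse=True)[:2]


# Compatible types of the two formulas from the example in this embodiment.
rules = [{1: 0.3, 2: 0.3, 3: 0.5}, {1: 0.2, 2: 0.1}]
matrix = [compat_vector(r) for r in rules]
locks = lock_rows(matrix, missing=[3])
chosen = pick_rows(locks)
covered = set().union(*(locks[r] for r in chosen))
print(matrix, locks, chosen, covered >= {3})
```

When the locked types of the chosen rows cover every missing type, the secondary query parameters of those rules would be taken; otherwise the remaining rows would be searched for the remaining missing types.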
Example 5:
based on embodiment 1, the detection module includes:
request acquisition unit: based on each query parameter in the detection rule, obtaining corresponding request information;
search unit: inputting the request information into the API query function provided by the search engine Censys to obtain the corresponding domain name detailed information.
In this embodiment, the request information refers to the detailed information of the request query converted from the query parameters; for example, the request City = "Chicago" AND BirthDate < DateAdd("yyyy", -40, Date()) is built from the query parameters "Chicago", "yyyy", -40, and Date().
The working principle and the beneficial effects of the technical scheme are as follows: the corresponding domain name detailed information is obtained by searching according to the detection rules and the API query function provided by the search engine censys, so that the subsequent association analysis between the domain name and the returned domain name is facilitated.
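The conversion of query parameters into request information can be sketched as below. The field names, the `field: "value"` syntax, and the function `build_request` are assumptions made for illustration only; the original text does not specify the request format, and the call to the search engine itself is not shown.

```python
def build_request(query_params):
    """Join field/value pairs into one search string of the form
    field: "value" AND field: "value" (an assumed, simplified syntax)."""
    parts = [f'{field}: "{value}"' for field, value in query_params.items()]
    return " AND ".join(parts)


# Hypothetical query parameters taken from a detection rule.
params = {"names": "example.com", "issuer": "Example CA"}
request_info = build_request(params)
print(request_info)
```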
Example 6:
based on embodiment 1, the analysis module includes:
a text grabbing unit: capturing web page contents of a web page of a returned domain name corresponding to the same input domain name of any two input interfaces to obtain corresponding first text contents and second text contents;
text screening unit: based on the structural text database, matching the first structural text and the second structural text in the first text content and the second text content, and deleting;
clause unit: based on the processed punctuation marks of the first text content and the second text content, preliminary clauses are carried out, and a corresponding first clause set and a corresponding second clause set are obtained;
a substring extraction unit: according to the preset substring length, acquiring first substrings with the same length in each first clause in the first clause set;
dictionary matching unit: performing word matching on each first substring and the dictionary;
a substring processing unit: retaining each first substring that is matched directly; for a first substring that is not matched, removing its last character and matching it against the dictionary again, repeating until the matching succeeds, and retaining the shortened substring;
a word segmentation set construction unit: constructing a first word segmentation set based on all retained substrings;
a syntax analysis unit: inputting each first word in the first word segmentation set into a shift-reduce parser in sequence to perform LR(k) parsing and obtain a corresponding first parsing result;
a word segmentation screening unit: deleting from the first word segmentation set each first word whose first parsing result is unqualified;
a vector acquisition unit: obtaining a corresponding first vector set based on the processed first word segmentation set and the word segmentation-space comparison table of the corresponding input interface;
a second analysis unit: acquiring, in the same way, a second vector set of the second clause set;
a vector group construction unit: based on the first vector set and the second vector set, matching each first vector with its closest second vector to construct vector groups;
a distance calculation unit: calculating the corresponding Chebyshev distance for each vector group;
a feature value calculation unit: calculating the similarity feature value of the two web page contents based on the Chebyshev distances.
In this embodiment, the first text content refers to the text content captured, as required by the capture target, from the first web page among the web pages of the returned domain names corresponding to the same input domain name of any two input interfaces.
In this embodiment, the second text content refers to the text content captured from the system-defined display content of the second web page among the web pages of the returned domain names corresponding to the same input domain name of any two input interfaces.
In this embodiment, the structure text database refers to a database made up of structure texts, which refer to texts having no actual meaning but for structure distinction, for example: concept, technical background, one, two, three.
In this embodiment, the first structure text refers to the same text in the first text content as the structure text in the structure text database.
In this embodiment, the second structure text refers to the same text in the second text content as the structure text in the structure text database.
In this embodiment, the preliminary clause refers to dividing the first text content and the second text content into a plurality of independent sentences according to punctuation marks of the processed first text content and the second text content.
In this embodiment, the first clause set refers to a set of dividing the first text content into a plurality of independent sentences according to the punctuation marks of the processed first text content.
In this embodiment, the second clause set refers to a set of dividing the second text content into a plurality of independent sentences according to the punctuation marks of the processed second text content.
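The deletion of structure text and the preliminary clause splitting can be sketched as follows. The contents of `STRUCTURE_TEXTS` and the punctuation set are assumptions for illustration; the original text does not list them.

```python
import re

# Hypothetical structure text database: texts used only to mark structure.
STRUCTURE_TEXTS = ["technical background", "concept"]


def remove_structure_texts(text, structure_texts=STRUCTURE_TEXTS):
    """Delete every occurrence of a structure text from the content."""
    for s in structure_texts:
        text = text.replace(s, "")
    return text


def split_clauses(text):
    """Split on common Chinese and English sentence punctuation."""
    parts = re.split(r"[。！？；.!?;]+", text)
    return [p.strip() for p in parts if p.strip()]


content = "concept. Domains are checked. Certificates are compared."
clauses = split_clauses(remove_structure_texts(content))
print(clauses)
```

Splitting on a plain period also breaks strings such as host names, so a real implementation would need a more careful punctuation rule.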
In this embodiment, the preset substring length refers to a length of a sentence string to be matched preset according to a longest length of words in a dictionary, for example: max_len=5.
In this embodiment, the first substring refers to obtaining, according to a preset substring length, a sentence string with the same length in each first clause in the first clause set.
In this embodiment, word matching refers to matching each first substring identically to words in the dictionary that are the same length as the first substring.
In this embodiment, the first word segment set refers to a set of all substring constructions that match successfully.
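The substring extraction, dictionary matching, and last-character removal described above correspond to forward maximum matching. A minimal sketch follows, assuming that a single character with no dictionary match is kept as it is (the original text does not say how this case is handled); the dictionary contents are hypothetical.

```python
def forward_max_match(clause, dictionary, max_len=5):
    """Segment a clause by repeatedly taking the longest dictionary word
    that starts at the current position."""
    words = []
    i = 0
    while i < len(clause):
        sub = clause[i:i + max_len]
        # Drop the last character until the substring is a dictionary word.
        while len(sub) > 1 and sub not in dictionary:
            sub = sub[:-1]
        words.append(sub)  # assumption: an unmatched single character is kept
        i += len(sub)
    return words


# Hypothetical dictionary: "malicious", "domain name", "detection", "malicious domain name".
dictionary = {"恶意", "域名", "检测", "恶意域名"}
print(forward_max_match("恶意域名检测", dictionary))
```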
In this embodiment, the shift-reduce parser is a form of bottom-up parser that uses a stack to hold grammar symbols and an input buffer to hold the remaining symbols to be parsed. Shift: the next input symbol is moved onto the top of the stack. Reduce: the right end of the symbol string to be reduced must be at the top of the stack; the parser determines the left end of that string within the stack and decides which non-terminal symbol replaces it.
In this embodiment, LR(k) parsing refers to constructing a rightmost derivation in reverse, starting from the leaf nodes of the parse tree and working up to the root node, thereby building the complete parse tree. In LR(k), L means that the input is scanned from left to right, R means a rightmost derivation in reverse, and k is the number of lookahead symbols.
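The shift and reduce actions can be illustrated with a small recognizer. This is a simplified sketch, not a table-driven LR(k) parser: it reduces greedily whenever a right-hand side is on top of the stack, and the toy grammar E -> E + n | n is an assumption chosen for illustration.

```python
# Toy grammar (assumed for illustration): E -> E + n | n
# The longer right-hand side is listed first so that it is tried first.
RULES = [("E", ["E", "+", "n"]), ("E", ["n"])]


def shift_reduce_accepts(tokens):
    """Return True when the token sequence reduces to the single symbol E."""
    stack, buffer = [], list(tokens)
    while True:
        # Reduce: replace a right-hand side on top of the stack by its non-terminal.
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if not buffer:
                break
            # Shift: move the next input symbol onto the top of the stack.
            stack.append(buffer.pop(0))
    return stack == ["E"]


print(shift_reduce_accepts(["n", "+", "n", "+", "n"]))
print(shift_reduce_accepts(["n", "n"]))
```

A word sequence that cannot be reduced to the start symbol would correspond to an unqualified parsing result.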
In this embodiment, the first parsing result indicates whether each first word in the first word segmentation set conforms to the syntax structure; it is either qualified or unqualified.
In this embodiment, the word segmentation-space comparison table refers to a lookup table in which words and their corresponding space vectors are in one-to-one correspondence.
In this embodiment, the first vector set refers to the set of first vectors that correspond, in the word segmentation-space comparison table, to each first word in the first word segmentation set.
In this embodiment, the second vector set refers to the set of second vectors that correspond, in the word segmentation-space comparison table, to each second word in the second word segmentation set, where the second clause set is processed in the same way as the first clause set to obtain the second word segmentation set.
In this embodiment, a vector group is obtained by matching, in order, each first vector with its nearest second vector; if the numbers of first vectors and second vectors are not equal, each first or second vector left unmatched forms a vector group with the zero vector.
In this embodiment, the Chebyshev distance refers to the L-infinity metric, a metric on a vector space in which the distance between two points is defined as the maximum absolute difference of their coordinates. It is the metric induced by the uniform norm (also called the supremum norm) and is also a hyperconvex metric:
$$D(x_i, y_i) = \max_{k} \left| x_{i,k} - y_{i,k} \right|$$
where $x_i$ refers to the first vector in the $i$-th vector group, $y_i$ refers to the second vector in the $i$-th vector group, and $k$ indexes the coordinates of the vectors.
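The vector group construction and the Chebyshev distance can be sketched as follows. The original text does not state which distance is used to find the nearest second vector; this sketch assumes the Chebyshev distance itself and a greedy one-to-one matching in the order of the first vectors. The sample vectors are hypothetical.

```python
def chebyshev(a, b):
    """Maximum absolute coordinate difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def build_vector_groups(first, second):
    """Pair each first vector with its nearest unused second vector;
    vectors left over on either side are paired with the zero vector."""
    if not first and not second:
        return []
    zero = [0.0] * len((first or second)[0])
    remaining = list(second)
    groups = []
    for a in first:
        if remaining:
            b = min(remaining, key=lambda v: chebyshev(a, v))
            remaining.remove(b)
        else:
            b = zero
        groups.append((a, b))
    groups.extend((zero, b) for b in remaining)
    return groups


first = [[1.0, 2.0], [4.0, 0.0]]
second = [[4.0, 1.0], [1.0, 2.5], [0.0, 3.0]]
groups = build_vector_groups(first, second)
distances = [chebyshev(a, b) for a, b in groups]
print(distances)
```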
The working principle and the beneficial effects of the technical scheme are as follows: the web page contents of the web page of the returned domain name corresponding to the same input domain name of any two input interfaces are grabbed to obtain corresponding first text contents and second text contents, word segmentation is carried out on the first text contents and the second text contents, grammar analysis is carried out, the positions of the texts in semantic space are described by using numerical vectors, a first vector set and a second vector set are obtained, a vector set is constructed, corresponding web page content similarity characteristic values are calculated, web page content similarity is accurately analyzed, the analysis of domain name relevance is facilitated, early warning is timely carried out, and attacks of malicious domain names are effectively resisted.
Example 7:
based on embodiment 1, the feature value calculation unit includes:
calculating the web page content similarity feature value by a formula defined over the following quantities: the first vector and the second vector in the i-th vector group; the total number of vector groups; the Chebyshev distance of the i-th vector group; the characteristic coefficient of the vector groups; the vector weight of the i-th vector group; the new vectors obtained by adjusting the first vector and the second vector in the i-th vector group by the corresponding vector weight; the defining function of the Chebyshev distance; and log, the logarithm function.
The working principle and the beneficial effects of the technical scheme are as follows: and the similarity is accurately analyzed by calculating the similarity characteristic value of the webpage content of the webpage of the returned domain name corresponding to the same input domain name of any two input interfaces, so that the analysis of the domain name relevance is facilitated, early warning is timely carried out, and the attack of the malicious domain name is effectively resisted.
Example 8:
based on embodiment 1, a setting module includes:
configuration option acquisition unit: determining configuration options based on the domain name detailed information and the webpage content similarity characteristic value;
weight setting unit: weight setting processing is carried out on each configuration option to obtain a domain name association score V;
$$V = \sum_{i=1}^{n} W_i S_i$$
where $V$ represents the domain name association value, $n$ is the number of domain name association configuration items, $W_i$ represents the weighting value of the $i$-th configuration option, and $S_i$ represents the associated item value of the $i$-th configuration option;
the configuration options are related to a certificate serial number, a certificate signing ending time, a pre-certificate time stamp, web page similarity, a certificate updating time and a certificate generating time;
wherein the reference weighting values of the configuration options are as follows: certificate serial number: 20; certificate signing end time: 15; pre-certificate timestamp: 25; web page similarity: 50; certificate update time: 30; certificate generation time: 30;
the reference values of the associated item value Si are as follows: time difference within 50 seconds, or serial number difference less than 10: 120; time difference within 100 seconds, or serial number difference less than 20: 100; time difference within 200 seconds, or serial number difference less than 40: 80; time difference within 300 seconds, or serial number difference less than 60: 60; time difference within 400 seconds, or serial number difference less than 80: 40; time difference greater than 500 seconds, serial number difference greater than 100, or nothing found by the query: 50.
In this embodiment, the configuration options refer to the corresponding values in the domain name details of the certificate serial number, the certificate signing end time, the pre-certificate timestamp, the web page similarity, the certificate update time, and the certificate generation time.
The working principle and the beneficial effects of the technical scheme are as follows: the domain name detailed information and the webpage content similarity characteristic value are analyzed, configuration options are determined, weight setting processing is conducted on each configuration option, the domain name relevance score is obtained through calculation, accurate analysis is conducted, early warning is conducted timely, and attacks of malicious domain names are effectively resisted.
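The weight setting processing can be sketched with the reference values listed in this embodiment. The sketch assumes that the association score is the weighted sum of the associated item values, that each tier bound is inclusive, and that any time difference beyond the last tier falls back to the value 50; only time-based items are shown, and all identifiers are hypothetical.

```python
# Reference weighting values of the configuration options.
WEIGHTS = {
    "certificate_serial_number": 20,
    "certificate_signing_end_time": 15,
    "pre_certificate_timestamp": 25,
    "web_page_similarity": 50,
    "certificate_update_time": 30,
    "certificate_generation_time": 30,
}

# (upper bound of the time difference in seconds, associated item value)
TIME_TIERS = [(50, 120), (100, 100), (200, 80), (300, 60), (400, 40)]
FALLBACK_VALUE = 50  # listed for large differences or when nothing is found


def item_value(time_diff_seconds):
    """Map a time difference to its associated item value."""
    if time_diff_seconds is None:  # nothing found by the query
        return FALLBACK_VALUE
    for bound, value in TIME_TIERS:
        if time_diff_seconds <= bound:
            return value
    return FALLBACK_VALUE


def association_score(item_values):
    """Assumed weighted sum over the configuration options that are present."""
    return sum(WEIGHTS[name] * s for name, s in item_values.items())


values = {
    "certificate_signing_end_time": item_value(30),
    "certificate_generation_time": item_value(250),
}
score = association_score(values)
print(score)
```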
Example 9:
based on embodiment 1, a quantization module is used for:
threshold value judging unit: if the domain name association value is larger than the preset association threshold value, the domain name and the returned domain name have association relation, and the domain name with association relation and the returned domain name are judged to be malicious domain names;
an alarm transmission unit: and sending the malicious domain name to a system administrator, and rejecting access of the malicious domain name.
The working principle and the beneficial effects of the technical scheme are as follows: and comparing the domain name association value with a preset association threshold value to obtain whether the domain name and the returned domain name have association relation, accurately judging the malicious domain name, and timely performing early warning to effectively resist the attack of the malicious domain name.
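The threshold judgment can be sketched as a single comparison. The threshold value, the returned structure, and the function name `judge` are assumptions for illustration; sending the alarm and rejecting access are represented only by an action label.

```python
def judge(domain, returned_domain, score, threshold):
    """Flag both domain names when the association value exceeds the threshold."""
    if score > threshold:
        return {"malicious": [domain, returned_domain], "action": "alert_and_block"}
    return {"malicious": [], "action": "allow"}


# Hypothetical domain names, score, and threshold.
result = judge("a.example", "b.example", score=3600, threshold=3000)
print(result)
```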
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. A network security defense system based on a malicious domain name detection method, comprising:
a preparation module: acquiring domain name parameters, randomly matching a query method list, and acquiring detection rules of a search engine based on a certificate transparency protocol;
a detection module: acquiring corresponding domain name detailed information based on the API query function provided by the search engine Censys and the detection rules;
an analysis module: acquiring the web page contents corresponding to the domain name and the returned domain name, and performing web page content similarity analysis to obtain a web page content similarity feature value;
a setting module: performing weight setting processing based on the domain name detailed information and the web page content similarity feature value to obtain a domain name association score;
an association module: analyzing whether the malicious domain names have an association relation based on the domain name association score and a preset association threshold.
2. The system of claim 1, wherein the preparation module comprises:
domain name type acquisition unit: acquiring and analyzing domain name parameters to obtain domain name types;
query rule acquisition unit: obtaining all corresponding query rules based on the domain name type and a type-query rule comparison table;
query list acquisition unit: and randomly arranging all query rules to obtain a query method list.
3. The system of claim 2, wherein the preparation module further comprises:
certificate parameter acquisition unit: obtaining corresponding certificate parameters based on the certificate transparency protocol and the domain name parameters;
a security verification unit: carrying out security verification on the certificate parameters to obtain a security verification result;
rule construction unit: if the security verification result passes, constructing a detection rule of the search engine according to the query parameters corresponding to each query rule in the query method list.
4. The system of claim 1, wherein the detection module further comprises:
request acquisition unit: based on each query parameter in the detection rule, obtaining corresponding request information;
search unit: inputting the request information into the API query function provided by the search engine Censys to obtain the corresponding domain name detailed information.
5. The system of claim 1, wherein the analysis module comprises:
a text grabbing unit: capturing web page contents of a web page of a returned domain name corresponding to the same input domain name of any two input interfaces to obtain corresponding first text contents and second text contents;
text screening unit: based on the structural text database, matching the first structural text and the second structural text in the first text content and the second text content, and deleting;
clause unit: based on the processed punctuation marks of the first text content and the second text content, preliminary clauses are carried out, and a corresponding first clause set and a corresponding second clause set are obtained;
a substring extraction unit: according to the preset substring length, acquiring first substrings with the same length in each first clause in the first clause set;
dictionary matching unit: performing word matching on each first substring and the dictionary;
a substring processing unit: retaining each first substring that is matched directly; for a first substring that is not matched, removing its last character and matching it against the dictionary again, repeating until the matching succeeds, and retaining the shortened substring;
a word segmentation set construction unit: constructing a first word segmentation set based on all retained substrings;
a syntax analysis unit: inputting each first word in the first word segmentation set into a shift-reduce parser in sequence to perform LR(k) parsing and obtain a corresponding first parsing result;
a word segmentation screening unit: deleting from the first word segmentation set each first word whose first parsing result is unqualified;
a vector acquisition unit: obtaining a corresponding first vector set based on the processed first word segmentation set and the word segmentation-space comparison table of the corresponding input interface;
a second analysis unit: acquiring a second vector set of the second clause set;
a vector group construction unit: based on the first vector set and the second vector set, matching each first vector with its closest second vector to construct vector groups;
a distance calculation unit: calculating the corresponding Chebyshev distance for each vector group;
a feature value calculation unit: calculating the similarity feature value of the two web page contents based on the Chebyshev distances.
6. The system according to claim 5, wherein the feature value calculation unit includes:
calculating the web page content similarity feature value by a formula defined over the following quantities: the first vector and the second vector in the i-th vector group; the total number of vector groups; the Chebyshev distance of the i-th vector group; the characteristic coefficient of the vector groups; the vector weight of the i-th vector group; the new vectors obtained by adjusting the first vector and the second vector in the i-th vector group by the corresponding vector weight; the defining function of the Chebyshev distance; and log, the logarithm function.
7. The system of claim 1, wherein the setup module comprises:
configuration option acquisition unit: determining configuration options based on the domain name detailed information and the webpage content similarity characteristic value;
weight setting unit: weight setting processing is carried out on each configuration option to obtain a domain name association score V;
$$V = \sum_{i=1}^{n} W_i S_i$$
where $V$ represents the domain name association value, $n$ is the number of domain name association configuration items, $W_i$ represents the weighting value of the $i$-th configuration option, and $S_i$ represents the associated item value of the $i$-th configuration option;
the configuration options are related to a certificate serial number, a certificate signing ending time, a pre-certificate timestamp, web page similarity, a certificate updating time and a certificate generating time.
8. The system of claim 1, wherein the quantization module is configured to:
threshold value judging unit: if the domain name association value is larger than the preset association threshold value, the domain name and the returned domain name have association relation, and the domain name with association relation and the returned domain name are judged to be malicious domain names;
an alarm transmission unit: and sending the malicious domain name to a system administrator, and rejecting access of the malicious domain name.
CN202311337285.3A 2023-10-17 2023-10-17 Network security defense system based on malicious domain name detection method Active CN117081865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311337285.3A CN117081865B (en) 2023-10-17 2023-10-17 Network security defense system based on malicious domain name detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311337285.3A CN117081865B (en) 2023-10-17 2023-10-17 Network security defense system based on malicious domain name detection method

Publications (2)

Publication Number Publication Date
CN117081865A true CN117081865A (en) 2023-11-17
CN117081865B CN117081865B (en) 2023-12-29

Family

ID=88704668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311337285.3A Active CN117081865B (en) 2023-10-17 2023-10-17 Network security defense system based on malicious domain name detection method

Country Status (1)

Country Link
CN (1) CN117081865B (en)

Also Published As

Publication number Publication date
CN117081865B (en) 2023-12-29

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant