CN116611076B - Domain name matching method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116611076B
Authority
CN
China
Prior art keywords
domain name
matching
matched
tree
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310890305.3A
Other languages
Chinese (zh)
Other versions
CN116611076A (en
Inventor
孙晓申
薛锋
童兆丰
樊兴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ThreatBook Technology Co Ltd
Original Assignee
Beijing ThreatBook Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ThreatBook Technology Co Ltd
Priority to CN202310890305.3A
Publication of CN116611076A
Application granted
Publication of CN116611076B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a domain name matching method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a domain name to be matched and a domain name filtering list set; constructing a prefix tree according to a first preset domain name set in the domain name filtering list set; constructing a suffix tree according to a second preset domain name set in the domain name filtering list set; and matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result. By implementing the embodiment of the application, the prefix tree and the suffix tree can be combined to match the domain names, so that the domain names can be matched efficiently, the real-time matching of a large amount of data can be accurately and completely performed, the problem of missing character strings in the matching process can be effectively avoided, and the efficiency is improved.

Description

Domain name matching method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a domain name matching method, a device, an electronic device, and a storage medium.
Background
With the rapid development of networks, a large number of sites have accumulated on the network, and some of them exist to harm the interests of users. To keep users safe, the network security field uses domain name access control technology. When a client initiates access to a website, the system matches the domain name the client is trying to access against the domain names in a domain name database to determine whether the client is allowed to access and use the resource.
Domain name matching in the prior art takes three forms: exact matching, in which the client's domain name is compared one by one with the domain names in the database; prefix tree matching, in which a prefix tree structure is built from the domain names in the database and used for matching; and regular-expression matching, in which a regular expression is created for each domain name in the database and the domain name to be matched is tested against those expressions.
However, the prior-art matching methods have many drawbacks. For example, a prefix tree implementation can efficiently match only domain names that all have a fixed prefix or all have a fixed suffix, and cannot handle the case in which both the prefix and the suffix of a domain name are fixed. Regular-expression matching, in turn, is relatively inefficient and is not suitable for real-time matching of large data volumes.
Disclosure of Invention
The embodiment of the application aims to provide a domain name matching method, a device, electronic equipment and a storage medium, which can realize that prefix trees and suffix trees are combined to match domain names, realize high-efficiency matching of domain names, accurately and completely match large-scale data in real time, effectively avoid the problem of missing character strings in the matching process, and improve the efficiency.
In a first aspect, an embodiment of the present application provides a domain name matching method, where the method includes:
acquiring a domain name to be matched and a domain name filtering list set;
constructing a prefix tree according to a first preset domain name set in the domain name filtering list set;
constructing a suffix tree according to a second preset domain name set in the domain name filtering list set;
and matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result.
In the implementation process, the domain names to be matched are matched according to the prefix tree and the suffix tree respectively, so that the domain names can be matched by combining the prefix tree and the suffix tree, the domain names can be matched efficiently, the real-time matching of a large amount of data can be accurately and completely performed, the problem of missing character strings caused in the matching process is effectively avoided, and the efficiency is improved.
Further, the first preset domain name set includes: prefix-fixed domain names, precise domain names, and the first halves of middle-fuzzy domain names;
the step of constructing a prefix tree according to a first preset domain name set in the domain name filtering list set includes:
traversing each character of the prefix fixed domain name, the accurate domain name and the first half part of the middle fuzzy domain name to construct a prefix tree node, and setting node additional attributes for the prefix tree node to obtain the prefix tree.
In the implementation process, the prefix fixed domain name, the accurate domain name and the front half part of the middle fuzzy domain name in the domain name filtering list are traversed, part of domain names associated with the prefix tree in the domain names can be completely extracted, and then the node additional attribute is set in the domain names, so that the matching is conveniently carried out through the node additional attribute.
Further, the second preset domain name set includes: suffix-fixed domain names and the second halves of middle-fuzzy domain names;
the step of constructing a suffix tree according to a second preset domain name set in the domain name filtering list set includes:
and respectively reversing the suffix fixed domain name and the rear half part of the middle fuzzy domain name, traversing each character of the reversed suffix fixed domain name and the reversed rear half part of the middle fuzzy domain name to construct suffix tree nodes, and setting node additional attributes for the suffix tree nodes to obtain the suffix tree.
In the implementation process, the suffix fixed domain name and the middle fuzzy domain name in the domain name filtering list set are traversed, so that the accuracy and the integrity of the suffix tree can be ensured, and the subsequent matching time is shortened.
Further, the step of matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result includes:
sequentially matching the domain names to be matched with nodes in the prefix tree;
if the node additional attribute of the matched node in the prefix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching result is a successful match;
if the node additional attribute of the matched node in the prefix tree contains a fuzzy matching continuation mark, or no node in the prefix tree is matched, matching continues and the domain name to be matched is matched against the suffix tree to obtain the matching result.
In the implementation process, the to-be-matched domain names are sequentially matched with the nodes comprising the fuzzy matching ending mark, the accurate matching ending mark and the fuzzy matching continuing mark in the prefix tree, so that the matching condition of the to-be-matched domain names can be rapidly and accurately judged, and the matching efficiency is effectively improved.
Further, the step of matching the domain name to be matched with the suffix tree to obtain the matching result includes:
inverting the character string of the domain name to be matched to obtain an inverted domain name to be matched;
and matching the inverted domain name to be matched with the suffix tree to obtain the matching result.
In the implementation process, the character strings of the domain name to be matched are reversed, so that the domain name to be matched can be matched with the suffix tree conveniently, each character string in the domain name to be matched can be ensured to be matched, and the accuracy of domain name matching is further improved.
Further, the step of matching the inverted domain name to be matched with the suffix tree to obtain the matching result includes:
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute matched to the node in the suffix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the domain name to be matched is not matched with the node in the suffix tree, the matching is failed.
In the implementation process, the inverted domain name to be matched is matched with the suffix tree, and the matching result is judged according to different node additional attributes, so that the matched domain name can be defined more accurately, and the error probability is reduced.
Further, if the node additional attribute of the node, to which the domain name to be matched is matched, in the prefix tree contains a fuzzy matching continuation mark, recording the ID of the current node to which the domain name to be matched is matched, and taking the ID of the current node as an ending ID.
In the implementation process, the ID of the current node is recorded according to different matching conditions, so that the complicated process of searching the node ID again is avoided, and the matching time is shortened.
Further, the step of matching the inverted domain name to be matched with the suffix tree to obtain the matching result further includes:
inverting the character string of the domain name to be matched to obtain an inverted domain name to be matched;
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute matched to the node in the suffix tree contains a fuzzy matching continuation mark, judging whether the prefix matching list of the current node contains the ending ID or not;
if yes, the matching is successful, and the matching result is that the matching is successful;
if not, the matching is failed, and the matching result is the matching failure.
In the implementation process, when the node additional attribute of the matched node contains the fuzzy matching continuation mark, judgment is further carried out according to the end ID, so that omission in the matching process is avoided.
In a second aspect, an embodiment of the present application further provides a domain name matching apparatus, where the apparatus includes:
the data acquisition module is used for acquiring the domain name to be matched and a domain name filtering list set;
the construction module is used for constructing a prefix tree according to a first preset domain name set in the domain name filtering list set, and for constructing a suffix tree according to a second preset domain name set in the domain name filtering list set;
and the matching module is used for matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result.
In the implementation process, the domain names to be matched are matched according to the prefix tree and the suffix tree, so that the domain names can be matched by combining the prefix tree and the suffix tree, the domain names can be efficiently matched, the real-time matching of a large amount of data can be accurately and completely performed, the problem of missing character strings caused in the matching process is effectively avoided, and the efficiency is improved.
In a third aspect, an electronic device provided in an embodiment of the present application includes: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of the first aspects when the computer program is executed.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium, where instructions are stored, when the instructions are executed on a computer, to cause the computer to perform the method according to any one of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer causes the computer to perform the method according to any of the first aspects.
Additional features and advantages of the disclosure will be set forth in the description that follows, will in part be obvious from the description, or may be learned by practicing the techniques of the disclosure.
The application can be implemented in accordance with the contents of the specification; preferred embodiments of the application are described in detail below with reference to the accompanying drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be construed as limiting its scope; a person skilled in the art can obtain other related drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a domain name matching method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a prefix tree according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a suffix tree according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a domain name matching device according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
The following describes in further detail the embodiments of the present application with reference to the drawings and examples. The following examples are illustrative of the application and are not intended to limit the scope of the application.
Example 1
Fig. 1 is a flow chart of a domain name matching method provided by an embodiment of the present application, as shown in fig. 1, the method includes:
S1, acquiring a domain name to be matched and a domain name filtering list set;
S2, constructing a prefix tree according to a first preset domain name set in the domain name filtering list set;
S3, constructing a suffix tree according to a second preset domain name set in the domain name filtering list set;
and S4, matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result.
In the implementation process, the domain names to be matched are matched according to the prefix tree and the suffix tree, so that the domain names can be matched by combining the prefix tree and the suffix tree, the domain names can be efficiently matched, the real-time matching of a large amount of data can be accurately and completely performed, the problem of missing character strings caused in the matching process is effectively avoided, and the efficiency is improved.
In S1, the domain name filtering list set defines the domain name matching rules, which include prefix-fixed domain names (example: abc*), suffix-fixed domain names (example: *abcd.com), precise domain names (example: bcd.com), and middle-fuzzy domain names whose prefix and suffix are fixed (example: abe*.com).
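The four rule types listed in S1 can be told apart by where the wildcard sits in a rule. The following is a minimal sketch of that classification, not the patented implementation: it assumes the wildcard is written as `*` and that a rule contains at most one wildcard, and the names `RuleKind`, `classify_rule`, and `split_rule` are hypothetical.

```python
from enum import Enum

WILDCARD = "*"  # assumption: the wildcard symbol used in the filtering rules


class RuleKind(Enum):
    PREFIX_FIXED = "prefix_fixed"   # e.g. abc*
    SUFFIX_FIXED = "suffix_fixed"   # e.g. *abcd.com
    EXACT = "exact"                 # e.g. bcd.com
    MIDDLE_FUZZY = "middle_fuzzy"   # e.g. abe*.com


def classify_rule(rule: str) -> RuleKind:
    """Classify a filtering rule by the position of its single wildcard."""
    if not rule or rule == WILDCARD:
        raise ValueError("rule must contain at least one literal character")
    if rule.count(WILDCARD) > 1:
        raise ValueError("this sketch supports at most one wildcard per rule")
    if WILDCARD not in rule:
        return RuleKind.EXACT
    if rule.endswith(WILDCARD):
        return RuleKind.PREFIX_FIXED
    if rule.startswith(WILDCARD):
        return RuleKind.SUFFIX_FIXED
    return RuleKind.MIDDLE_FUZZY


def split_rule(rule: str) -> tuple:
    """Return (first half, second half) around the wildcard; the second half
    is empty for exact and prefix-fixed rules."""
    head, _, tail = rule.partition(WILDCARD)
    return head, tail
```

With this split, the first half of a rule is what goes into the prefix tree and the second half is what goes, reversed, into the suffix tree.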
Further, S2 includes:
traversing each character of the prefix fixed domain name, the accurate domain name and the front half part of the middle fuzzy domain name to construct a prefix tree node, and setting node additional attributes for the prefix tree node to obtain a prefix tree.
The first preset domain name set includes: prefix-fixed domain name, precise domain name, and middle-fuzzy domain name first half.
In the implementation process, the prefix fixed domain name, the accurate domain name and the front half part of the middle fuzzy domain name in the domain name filtering list are traversed in sequence, so that part of domain names associated with the prefix tree in the domain names can be completely extracted, and then the node additional attribute is set in the domain names, thereby facilitating the matching through the node additional attribute.
A prefix tree is created from the prefix-fixed domain names, the precise domain names, and the first halves of the middle-fuzzy domain names. Note that the wildcard symbol * is not recorded in the prefix tree, and the root node is denoted root.
In addition to its character label, each node of the prefix tree carries the following node additional attributes: a node ID, which is unique within the prefix tree; isFuzzyEnd, the fuzzy matching ending mark; isExactEnd, the accurate matching ending mark; and isFuzzyContinue, the fuzzy matching continuation mark, which indicates that the search must continue in the suffix tree.
When the nodes of the prefix tree are built, construction for a prefix-fixed domain name stops when the wildcard * is encountered, and isFuzzyEnd of the preceding node is set to true. For a precise domain name, when the last character of the domain name is reached, isExactEnd of the node for that last character is set to true. For a middle-fuzzy domain name, construction stops when * is encountered, and isFuzzyContinue of the preceding node is set to true.
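The construction rules above can be sketched as a small character trie. This is an illustrative sketch under stated assumptions: the wildcard is assumed to be `*`, the class and attribute names (`PrefixNode`, `PrefixTree`, `is_fuzzy_end`, and so on) are hypothetical Python spellings of the attributes named in the text, and node IDs are simply handed out in creation order, so they need not coincide with the numbering used in the figures.

```python
from dataclasses import dataclass, field
from itertools import count

WILDCARD = "*"  # assumption: the wildcard symbol used in the rules


@dataclass
class PrefixNode:
    node_id: int
    children: dict = field(default_factory=dict)
    is_fuzzy_end: bool = False       # isFuzzyEnd
    is_exact_end: bool = False       # isExactEnd
    is_fuzzy_continue: bool = False  # isFuzzyContinue


class PrefixTree:
    def __init__(self):
        self.root = PrefixNode(0)
        self._ids = count(1)  # IDs are assigned in node creation order

    def add(self, rule: str) -> int:
        """Insert a prefix-fixed, precise, or middle-fuzzy rule and return
        the ID of the node on which the attribute was set."""
        head, star, tail = rule.partition(WILDCARD)
        if not head:
            raise ValueError("rule has no literal prefix for the prefix tree")
        node = self.root
        for ch in head:  # the wildcard itself is never stored in the tree
            if ch not in node.children:
                node.children[ch] = PrefixNode(next(self._ids))
            node = node.children[ch]
        if not star:
            node.is_exact_end = True        # precise domain name
        elif tail:
            node.is_fuzzy_continue = True   # middle-fuzzy: continue in suffix tree
        else:
            node.is_fuzzy_end = True        # prefix-fixed domain name
        return node.node_id
```

Returning the node ID from `add` is a design convenience of the sketch: for a middle-fuzzy rule, that ID is what the suffix tree later needs to store in its prefix ID list.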
Further, S3 includes:
and respectively reversing the suffix fixed domain name and the second half part of the middle fuzzy domain name, traversing each character of the reversed suffix fixed domain name and the reversed second half part of the middle fuzzy domain name to construct suffix tree nodes, and setting node additional attributes for the suffix tree nodes to obtain the suffix tree.
The second preset domain name set includes: suffix-fixed domain names and the second halves of middle-fuzzy domain names.
In the implementation process, the suffix fixed domain name and the middle fuzzy domain name in the domain name filtering list set are traversed in sequence, so that the accuracy and the integrity of the suffix tree can be ensured, and the subsequent matching time is shortened.
The suffix-fixed domain names and the second halves of the middle-fuzzy domain names in the domain name filtering list are reversed to build the suffix tree, whose root node is likewise denoted root.
In addition to its character label, each node of the suffix tree carries the following node additional attributes: a node ID, which is unique within the suffix tree; isFuzzyEnd, the fuzzy matching ending mark; isExactEnd, the accurate matching ending mark; and prefixIdList, the list of prefix match IDs (the end IDs in the present application), which is used together with isFuzzyContinueEnd to decide whether the match ends.
For a suffix-fixed domain name, construction stops when the wildcard * is encountered, and isFuzzyEnd of the last node built is set to true. For the second half of a middle-fuzzy domain name, construction stops when * is encountered, isFuzzyContinueEnd of the last node built is set to true, and the ID of the prefix tree node whose isFuzzyContinue is true is added to that node's prefixIdList.
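The suffix tree construction can be sketched in the same way, inserting each string in reverse. As before, this is only a sketch: `*` is an assumed wildcard symbol, the names `SuffixNode`, `SuffixTree`, `add_suffix_fixed`, and `add_fuzzy_tail` are hypothetical, and the prefix node ID passed to `add_fuzzy_tail` is assumed to come from whatever built the prefix tree.

```python
from dataclasses import dataclass, field
from itertools import count

WILDCARD = "*"  # assumption: the wildcard symbol used in the rules


@dataclass
class SuffixNode:
    node_id: int
    children: dict = field(default_factory=dict)
    is_fuzzy_end: bool = False            # isFuzzyEnd
    is_fuzzy_continue_end: bool = False   # isFuzzyContinueEnd
    prefix_id_list: list = field(default_factory=list)  # prefixIdList


class SuffixTree:
    def __init__(self):
        self.root = SuffixNode(0)
        self._ids = count(1)  # IDs are assigned in node creation order

    def _insert_reversed(self, text: str) -> SuffixNode:
        """Walk or create nodes for the characters of text, last one first."""
        node = self.root
        for ch in reversed(text):
            if ch not in node.children:
                node.children[ch] = SuffixNode(next(self._ids))
            node = node.children[ch]
        return node

    def add_suffix_fixed(self, rule: str) -> int:
        """Insert a suffix-fixed rule such as '*abcd.com'."""
        if not rule.startswith(WILDCARD) or len(rule) == len(WILDCARD):
            raise ValueError("expected a rule of the form '*suffix'")
        node = self._insert_reversed(rule[len(WILDCARD):])
        node.is_fuzzy_end = True
        return node.node_id

    def add_fuzzy_tail(self, tail: str, prefix_node_id: int) -> int:
        """Insert the second half of a middle-fuzzy rule and remember which
        prefix tree node it belongs to."""
        if not tail:
            raise ValueError("the second half must not be empty")
        node = self._insert_reversed(tail)
        node.is_fuzzy_continue_end = True
        node.prefix_id_list.append(prefix_node_id)
        return node.node_id
```

Several middle-fuzzy rules may share the same second half (for example `.com`), which is why the sketch keeps a list of prefix IDs on the node rather than a single value.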
Further, S4 includes:
sequentially matching the domain names to be matched with nodes in the prefix tree;
if the node additional attribute matched to the node in the prefix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is successful;
if the node additional attribute of the node matched to the prefix tree contains a fuzzy matching continuation mark or is not matched to the node in the prefix tree, the matching is continued, and the domain name to be matched is matched with the suffix tree, so that a matching result is obtained.
In the implementation process, the to-be-matched domain names are sequentially matched with the nodes comprising the fuzzy matching ending mark, the accurate matching ending mark and the fuzzy matching continuing mark in the prefix tree, so that the matching condition of the to-be-matched domain names can be rapidly and accurately judged, and the matching efficiency is effectively improved.
Because there is no precedence relationship between the domain name to be matched and the matching of the prefix tree and the suffix tree, S4 further includes:
sequentially matching the domain names to be matched with nodes in the suffix tree;
if the node additional attribute of the node matched to the suffix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the node additional attribute of the node matched to the suffix tree contains a fuzzy matching continuation mark or is not matched to the node in the suffix tree, the matching is continued, and the domain name to be matched is matched with the prefix tree, so that a matching result is obtained.
If the encountered node contains a fuzzy matching ending mark or an accurate matching ending mark in the matching process, the matching is successful, and the domain name to be matched is added to a hit list.
If the domain name to be matched is traversed to its last character without encountering a node of either type, or a character fails to match, the match against the current tree fails; the character string of the domain name to be matched is then reversed and matched against the other tree (the suffix tree or the prefix tree).
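The prefix tree stage of S4 can be sketched as a walk that reports one of three outcomes: a hit, a partial match that must be confirmed in the suffix tree, or a miss. The sketch below makes several assumptions that go beyond the text: the wildcard is `*`; an accurate matching ending mark only counts when the whole domain name has been consumed; and every fuzzy-continue node passed on the way is collected, rather than a single end ID. The names `Node`, `build_prefix_tree`, and `match_prefix` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    children: dict = field(default_factory=dict)
    is_fuzzy_end: bool = False
    is_exact_end: bool = False
    is_fuzzy_continue: bool = False


def build_prefix_tree(rules):
    """Small builder for the demonstration; '*' is the assumed wildcard."""
    root, next_id = Node(0), 1
    for rule in rules:
        head, star, tail = rule.partition("*")
        node = root
        for ch in head:
            if ch not in node.children:
                node.children[ch] = Node(next_id)
                next_id += 1
            node = node.children[ch]
        if not star:
            node.is_exact_end = True
        elif tail:
            node.is_fuzzy_continue = True
        else:
            node.is_fuzzy_end = True
    return root


def match_prefix(root, domain):
    """Return ("hit", []), ("continue", end_ids) or ("miss", [])."""
    node, end_ids = root, []
    for i, ch in enumerate(domain):
        node = node.children.get(ch)
        if node is None:
            break  # character mismatch: stop walking the prefix tree
        if node.is_fuzzy_end:
            return "hit", []
        # Assumption: an exact rule must cover the entire domain name.
        if node.is_exact_end and i == len(domain) - 1:
            return "hit", []
        if node.is_fuzzy_continue:
            end_ids.append(node.node_id)  # candidate end ID for the suffix stage
    return ("continue", end_ids) if end_ids else ("miss", [])
```

Both the "continue" and the "miss" outcomes lead to the suffix tree stage; the difference is only whether any end ID is carried over.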
Further, the step of matching the domain name to be matched with the suffix tree to obtain a matching result comprises the following steps:
inverting the character strings of the domain names to be matched to obtain inverted domain names to be matched;
and matching the inverted domain name to be matched with the suffix tree to obtain a matching result.
In the implementation process, the character strings of the domain name to be matched are reversed, so that the domain name to be matched can be matched with the suffix tree conveniently, each character string in the domain name to be matched can be ensured to be matched, and the accuracy of domain name matching is further improved.
Further, the step of matching the inverted domain name to be matched with the suffix tree to obtain a matching result comprises the following steps:
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute of the node matched to the suffix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the domain name to be matched is not matched with the node in the suffix tree, the matching is failed.
In the implementation process, the inverted domain name to be matched is matched with the suffix tree, and the matching result is judged according to different node additional attributes, so that the matched domain name can be defined more accurately, and the error probability is reduced.
The suffix tree matching process is similar to that of the prefix tree: encountering a node that contains a fuzzy matching ending mark or an accurate matching ending mark means the match succeeds.
Further, if the node additional attribute of the node, which is matched with the domain name to be matched in the prefix tree, contains a fuzzy matching continuation mark, the ID of the current node, which is matched with the domain name to be matched, is recorded, and the ID of the current node is used as an ending ID.
In the implementation process, the ID of the current node is recorded according to different matching conditions, so that the complicated process of searching the node ID again is avoided, and the matching time is shortened.
If a node carrying the fuzzy matching continuation mark is encountered during matching, the match is partially successful; the ID of the current node is recorded, the character string of the domain name to be matched is reversed, and matching against the suffix tree begins.
Further, the step of matching the inverted domain name to be matched with the suffix tree to obtain a matching result further comprises:
inverting the character strings of the domain names to be matched to obtain inverted domain names to be matched;
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute of the node matched to the suffix tree contains a fuzzy matching continuation mark, judging whether a prefix matching list of the current node contains an end ID or not;
if yes, the matching is successful, and the matching result is successful;
if not, the matching fails, and the matching result is the matching failure.
In the implementation process, when the node additional attribute of the matched node contains the fuzzy matching continuation mark, judgment is further carried out according to the end ID, so that omission in the matching process is avoided.
If a node carrying isFuzzyContinueEnd is encountered during matching, it must be checked whether the end ID of the node at which prefix matching ended is contained in the prefix matching list of the current node; if so, the match succeeds and the domain name to be matched is added to the hit list.
If the domain name to be matched is traversed to its last character without encountering a node of these types, or a character fails to match, the match fails.
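The suffix tree stage, including the end ID check, can be sketched as follows. This is a hedged illustration rather than the patented implementation: `*` is an assumed wildcard, the end IDs are assumed to be supplied by the prefix tree stage, and `SuffixNode`, `build_suffix_tree`, and `match_suffix` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SuffixNode:
    node_id: int
    children: dict = field(default_factory=dict)
    is_fuzzy_end: bool = False
    is_fuzzy_continue_end: bool = False
    prefix_id_list: list = field(default_factory=list)


def build_suffix_tree(fuzzy_tails, suffix_rules):
    """fuzzy_tails: (second half, prefix node ID) pairs, e.g. (".com", 8);
    suffix_rules: rules such as "*abcd.com" ('*' is the assumed wildcard)."""
    root, next_id = SuffixNode(0), 1

    def insert(text):
        nonlocal next_id
        node = root
        for ch in reversed(text):  # suffix tree stores reversed strings
            if ch not in node.children:
                node.children[ch] = SuffixNode(next_id)
                next_id += 1
            node = node.children[ch]
        return node

    for tail, prefix_id in fuzzy_tails:
        node = insert(tail)
        node.is_fuzzy_continue_end = True
        node.prefix_id_list.append(prefix_id)
    for rule in suffix_rules:
        insert(rule[1:]).is_fuzzy_end = True
    return root


def match_suffix(root, domain, end_ids=()):
    """Walk the reversed domain; end_ids come from the prefix tree stage."""
    node = root
    for ch in reversed(domain):
        node = node.children.get(ch)
        if node is None:
            return False  # character mismatch
        if node.is_fuzzy_end:
            return True
        # A fuzzy-continue-end node only counts if the prefix stage ended on
        # a node whose ID is recorded in this node's prefix ID list.
        if node.is_fuzzy_continue_end and any(
            i in node.prefix_id_list for i in end_ids
        ):
            return True
    return False
```

When no end ID was recorded, a node carrying isFuzzyContinueEnd is simply passed over and the walk continues, which is how a later fuzzy-end node can still produce a hit.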
Illustratively, the structure of the prefix tree (A) is shown in FIG. 2. Assume the prefix tree is built from abc*, ac*, bcd*, bcd.com, and abe*.com, in that order. Each node is assigned a unique ID that is incremented automatically in the order in which nodes are created. Nodes 3, 4, and 7 have the additional attribute isFuzzyEnd: true; node 11 has isExactEnd: true; node 8 has isFuzzyContinue: true. For abe*.com, nodes are created in the current prefix tree only for the first half abe; the remaining *.com string takes part in building the other tree in the same way as a suffix-fixed domain name, except that the corresponding node of the other tree additionally records the prefix ID, namely 8, in its additional attributes.
The structure of the suffix tree (B) is shown in fig. 3, and the suffix tree is generated by inverting the character string of the domain name and then regenerating the character string into tree nodes, and traversing the tree after inverting the character string of the domain name to be matched when matching is performed.
Suppose according to abeCom, & gt, of the group of com>abcd.com、/>ced.com、/>The dcd.com sequence builds a tree, each node defines a unique ID, the ID adds the additional attribute isFuzyEnd of the 8, 10, 11 nodes according to the new increasing sequence of the nodes, the additional attribute isFuzyContinueEnd of the 4 nodes, the tree, the prefixIdList, the order of the nodes is [8]]. When the domain name of cd.com needs to be matched, the head of c is not arranged in the A tree, so that the cd.com is matched with the B tree after being inverted, and finally the node 6 of the B tree can be matched, but the node 6 is not provided with an isFuzyEnd true mark, so that the matching fails.
When bcd.cn needs to be matched, the node attribute isFuzzyEnd: true on the corresponding tree when d characters are matched in the A tree, so that the matching is successful.
When the ced.com needs to be matched, the matching in the A tree fails, so the ced.com is inverted and then matched with the B tree, and when the d.com is matched with the No. 4 node, the isFuzzyContinueEnd of the No. 4 node is realized: true, prefix matching list prefix IdList [8], because the A tree matching fails, ID is not ended, the matching is continued in detail, and when the node 10 is finally matched, the node is added with attribute isFuzzyEnd: true, so that the matching is successful.
When abed.com needs to be matched, node 8, reached in the A tree at the character e, carries the attribute isFuzzyContinue: true, so the remainder d.com is reversed and matched against the B tree. When node 4 is reached, it carries isFuzzyContinueEnd: true and prefixIdList: [8]; because the A tree match reached node 8, the matching is successful.
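The four lookups above (cd.com, bcd.cn, ced.com and abed.com) can be traced with a self-contained Python sketch. It is an illustration under stated assumptions rather than the implementation of the application: `Node`, `build_trees` and `match` are hypothetical names, `*` is assumed to be the wildcard with at most one per entry, an exact entry is assumed to match only when the whole domain name is consumed, and the length check that keeps the two halves of a middle-fuzzy entry from overlapping is an addition of this sketch. Instead of passing only the remainder to the B tree, the sketch reverses the whole domain name and relies on that length check.

```python
from itertools import count


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.children = {}
        self.flags = set()        # attribute names as used in the text
        self.prefix_id_list = []  # filled only on suffix-tree nodes


def _insert(root, text, ids):
    """Walk or create one node per character and return the last node."""
    node = root
    for ch in text:
        if ch not in node.children:
            node.children[ch] = Node(next(ids))
        node = node.children[ch]
    return node


def build_trees(patterns):
    prefix_root, suffix_root = Node(0), Node(0)
    prefix_ids, suffix_ids = count(1), count(1)
    for pattern in patterns:
        head, star, tail = pattern.partition("*")
        if not star:    # exact domain name
            _insert(prefix_root, head, prefix_ids).flags.add("isExactEnd")
        elif not tail:  # prefix-fixed, e.g. "abc*"
            _insert(prefix_root, head, prefix_ids).flags.add("isFuzzyEnd")
        elif not head:  # suffix-fixed, e.g. "*ced.com"
            _insert(suffix_root, tail[::-1], suffix_ids).flags.add("isFuzzyEnd")
        else:           # middle-fuzzy, e.g. "abe*.com"
            first = _insert(prefix_root, head, prefix_ids)
            first.flags.add("isFuzzyContinue")
            second = _insert(suffix_root, tail[::-1], suffix_ids)
            second.flags.add("isFuzzyContinueEnd")
            second.prefix_id_list.append(first.node_id)
    return prefix_root, suffix_root


def match(domain, prefix_root, suffix_root):
    end_ids = {}  # prefix node ID -> number of characters consumed so far
    node = prefix_root
    for used, ch in enumerate(domain, start=1):
        node = node.children.get(ch)
        if node is None:
            break
        if "isFuzzyEnd" in node.flags:
            return True
        if "isExactEnd" in node.flags and used == len(domain):
            return True
        if "isFuzzyContinue" in node.flags:
            end_ids[node.node_id] = used  # record the end ID

    node = suffix_root
    for depth, ch in enumerate(reversed(domain), start=1):
        node = node.children.get(ch)
        if node is None:
            return False
        if "isFuzzyEnd" in node.flags:
            return True
        if "isFuzzyContinueEnd" in node.flags:
            for prefix_id in node.prefix_id_list:
                used = end_ids.get(prefix_id)
                # Assumption of this sketch: the two halves must not overlap.
                if used is not None and used + depth <= len(domain):
                    return True
    return False


tree_a, tree_b = build_trees([
    "abc*", "ac*", "bcd*", "bcd.com", "abe*.com",
    "*abcd.com", "*ced.com", "*dcd.com",
])
for name in ["cd.com", "bcd.cn", "ced.com", "abed.com"]:
    print(name, match(name, tree_a, tree_b))
# cd.com False, bcd.cn True, ced.com True, abed.com True
```

Because the link between the two trees is a node ID rather than a copy of the string, a lookup costs at most one pass over the domain name in each direction, regardless of how many entries the filter list contains.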
By combining the two trees, namely the prefix tree and the suffix tree, the embodiment of the application solves the low efficiency of regular-expression matching and achieves real-time, efficient matching across the multiple scenarios of domain name matching.
Example II
To carry out the method of the above embodiment and achieve the corresponding functions and technical effects, a domain name matching apparatus is provided below. As shown in FIG. 4, the apparatus includes:
the data acquisition module 1 is used for acquiring a domain name to be matched and a domain name filtering list set;
a construction module 2, configured to construct a prefix tree according to a first preset domain name set in the domain name filtering list set, and further configured to construct a suffix tree according to a second preset domain name set in the domain name filtering list set;
and the matching module 3 is used for matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result.
In the implementation process, the domain names to be matched are matched according to the prefix tree and the suffix tree, so that the domain names can be matched by combining the prefix tree and the suffix tree, the domain names can be efficiently matched, the real-time matching of a large amount of data can be accurately and completely performed, the problem of missing character strings caused in the matching process is effectively avoided, and the efficiency is improved.
Further, the building module 2 is further configured to:
traversing each character of the prefix fixed domain name, the accurate domain name and the front half part of the middle fuzzy domain name to construct a prefix tree node, and setting node additional attributes for the prefix tree node to obtain a prefix tree.
In the implementation process, the prefix fixed domain name, the accurate domain name and the front half part of the middle fuzzy domain name in the domain name filtering list are traversed in sequence, so that part of domain names associated with the prefix tree in the domain names can be completely extracted, and then the node additional attribute is set in the domain names, thereby facilitating the matching through the node additional attribute.
Further, the building module 2 is further configured to:
and respectively reversing the suffix fixed domain name and the second half part of the middle fuzzy domain name, traversing each character of the reversed suffix fixed domain name and the reversed second half part of the middle fuzzy domain name to construct suffix tree nodes, and setting node additional attributes for the suffix tree nodes to obtain the suffix tree.
In the implementation process, the suffix fixed domain name and the middle fuzzy domain name in the domain name filtering list set are traversed in sequence, so that the accuracy and the integrity of the suffix tree can be ensured, and the subsequent matching time is shortened.
Further, the matching module 3 is further configured to:
sequentially matching the domain name to be matched with nodes in the prefix tree;
if the node additional attribute of the node matched in the prefix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the node additional attribute of the node matched in the prefix tree contains a fuzzy matching continuation mark, or if no node in the prefix tree is matched, the matching continues, and the domain name to be matched is matched with the suffix tree to obtain the matching result.
In the implementation process, the to-be-matched domain names are sequentially matched with the nodes comprising the fuzzy matching ending mark, the accurate matching ending mark and the fuzzy matching continuing mark in the prefix tree, so that the matching condition of the to-be-matched domain names can be rapidly and accurately judged, and the matching efficiency is effectively improved.
Further, the matching module 3 is further configured to:
inverting the character strings of the domain names to be matched to obtain inverted domain names to be matched;
and matching the inverted domain name to be matched with the suffix tree to obtain a matching result.
In the implementation process, the character strings of the domain name to be matched are reversed, so that the domain name to be matched can be matched with the suffix tree conveniently, each character string in the domain name to be matched can be ensured to be matched, and the accuracy of domain name matching is further improved.
Further, the matching module 3 is further configured to:
if the node additional attribute of the node in the prefix tree to which the domain name to be matched is matched contains a fuzzy matching continuation mark, recording the ID of the current node reached by the domain name to be matched, and taking the ID of the current node as the end ID.
In the implementation process, the ID of the current node is recorded according to different matching conditions, so that the complicated process of searching the node ID again is avoided, and the matching time is shortened.
Further, the matching module 3 is further configured to:
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute of the node matched to the suffix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the domain name to be matched is not matched with the node in the suffix tree, the matching is failed.
In the implementation process, the inverted domain name to be matched is matched with the suffix tree, and the matching result is judged according to different node additional attributes, so that the matched domain name can be defined more accurately, and the error probability is reduced.
Further, the matching module 3 is further configured to:
inverting the character strings of the domain names to be matched to obtain inverted domain names to be matched;
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute of the node matched in the suffix tree contains a fuzzy matching continuation mark, judging whether the prefix matching list of the current node contains the end ID;
if yes, the matching is successful, and the matching result is successful;
if not, the matching fails, and the matching result is the matching failure.
In the implementation process, when the node additional attribute of the matched node contains the fuzzy matching continuation mark, judgment is further carried out according to the end ID, so that omission in the matching process is avoided.
The domain name matching apparatus described above may implement the method of the first embodiment described above. The options in the first embodiment described above also apply to this embodiment, and are not described in detail here.
The rest of the embodiments of the present application may refer to the content of the first embodiment, and in this embodiment, no further description is given.
Example III
An embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to execute the computer program to cause the electronic device to perform the domain name matching method of the first embodiment.
Optionally, the electronic device may be a server.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include a processor 51, a communication interface 52, a memory 53, and at least one communication bus 54. The communication bus 54 enables direct connection communication among these components. The communication interface 52 of the device in the embodiment of the present application is used for signaling or data communication with other node devices. The processor 51 may be an integrated circuit chip with signal processing capabilities.
The processor 51 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. It may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The memory 53 may be, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc. The memory 53 stores computer-readable instructions which, when executed by the processor 51, enable the device to perform the steps of the method embodiment of FIG. 1.
Optionally, the electronic device may further include a memory controller and an input/output unit. The memory 53, the memory controller, the processor 51, the peripheral interface, and the input/output unit are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, the components may be electrically coupled to each other via one or more communication buses 54. The processor 51 is configured to execute executable modules stored in the memory 53, such as software functional modules or computer programs included in the device.
The input/output unit allows the user to create a task and to set an optional start period or a preset execution time for the task, so as to realize interaction between the user and the server. The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
It will be appreciated that the configuration shown in fig. 5 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 5, or have a different configuration than shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof.
In addition, the embodiment of the present application further provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements the domain name matching method of the first embodiment.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method described in the method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based devices which perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or another medium capable of storing program code.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The above description is merely illustrative of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about variations or substitutions within the scope of the present application, and the application is intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be defined by the protection scope of the claims.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (8)

1. A method of domain name matching, the method comprising:
acquiring a domain name to be matched and a domain name filtering list set;
constructing a prefix tree according to a first preset domain name set in the domain name filtering list set;
constructing a suffix tree according to a second preset domain name set in the domain name filtering list set;
matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result;
the first preset domain name set includes: the prefix fixed domain name, the accurate domain name and the first half part of the middle fuzzy domain name;
the step of constructing a prefix tree according to a first preset domain name set in the domain name filtering list set includes:
traversing each character of the prefix fixed domain name, the accurate domain name and the first half part of the middle fuzzy domain name to construct a prefix tree node, and setting node additional attributes for the prefix tree node to obtain the prefix tree;
the second preset domain name set includes: suffix fixed domain name and middle fuzzy domain name latter half;
the step of constructing a suffix tree according to a second preset domain name set in the domain name filtering list set includes:
inverting the suffix fixed domain name and the rear half part of the middle fuzzy domain name respectively, traversing each character of the inverted suffix fixed domain name and the inverted rear half part of the middle fuzzy domain name to construct suffix tree nodes, and setting node additional attributes for the suffix tree nodes to obtain the suffix tree;
the step of matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result comprises the following steps:
sequentially matching the domain names to be matched with nodes in the prefix tree;
if the node additional attribute matched to the node in the prefix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the node additional attribute matched to the node in the prefix tree contains a fuzzy matching continuation mark or is not matched to the node in the prefix tree, matching is continued, and the domain name to be matched is matched with the suffix tree, so that the matching result is obtained.
2. The domain name matching method according to claim 1, wherein the step of matching the domain name to be matched with the suffix tree to obtain the matching result comprises:
inverting the character string of the domain name to be matched to obtain an inverted domain name to be matched;
and matching the inverted domain name to be matched with the suffix tree to obtain the matching result.
3. The domain name matching method according to claim 2, wherein the step of matching the inverted domain name to be matched with the suffix tree to obtain the matching result includes:
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute matched to the node in the suffix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the domain name to be matched is not matched with the node in the suffix tree, the matching is failed.
4. The domain name matching method according to claim 1, wherein if the node additional attribute of the node in the prefix tree to which the domain name to be matched is matched contains a fuzzy match continuation flag, the ID of the current node to which the domain name to be matched is recorded, and the ID of the current node is used as an end ID.
5. The domain name matching method according to claim 4, wherein the step of matching the domain name to be matched with the suffix tree to obtain the matching result comprises:
inverting the character string of the domain name to be matched to obtain an inverted domain name to be matched;
matching the inverted domain name to be matched with the suffix tree;
if the node additional attribute matched to the node in the suffix tree contains a fuzzy matching continuation mark, judging whether the prefix matching list of the current node contains the ending ID or not;
if yes, the matching is successful, and the matching result is that the matching is successful;
if not, the matching is failed, and the matching result is the matching failure.
6. A domain name matching apparatus, the apparatus comprising:
the data acquisition module is used for acquiring the domain name to be matched and a domain name filtering list set;
the construction module is used for constructing a prefix tree according to a first preset domain name set in the domain name filtering list set; the suffix tree is also constructed according to a second preset domain name set in the domain name filtering list set;
the matching module is used for matching the domain name to be matched with the prefix tree and/or the suffix tree to obtain a matching result;
the building module is also for:
the first preset domain name set includes: the prefix fixed domain name, the accurate domain name and the first half part of the middle fuzzy domain name;
traversing each character of the prefix fixed domain name, the accurate domain name and the first half part of the middle fuzzy domain name to construct a prefix tree node, and setting node additional attributes for the prefix tree node to obtain the prefix tree;
the second preset domain name set includes: suffix fixed domain name and middle fuzzy domain name latter half;
inverting the suffix fixed domain name and the rear half part of the middle fuzzy domain name respectively, traversing each character of the inverted suffix fixed domain name and the inverted rear half part of the middle fuzzy domain name to construct suffix tree nodes, and setting node additional attributes for the suffix tree nodes to obtain the suffix tree;
the matching module is also used for:
sequentially matching the domain names to be matched with nodes in the prefix tree;
if the node additional attribute matched to the node in the prefix tree contains a fuzzy matching ending mark or an accurate matching ending mark, the matching is successful, and the matching result is that the matching is successful;
if the node additional attribute matched to the node in the prefix tree contains a fuzzy matching continuation mark or is not matched to the node in the prefix tree, matching is continued, and the domain name to be matched is matched with the suffix tree, so that the matching result is obtained.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the domain name matching method according to any one of claims 1 to 5.
8. A computer storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the domain name matching method according to any of claims 1 to 5.
CN202310890305.3A 2023-07-20 2023-07-20 Domain name matching method and device, electronic equipment and storage medium Active CN116611076B (en)

Publications (2)

Publication Number Publication Date
CN116611076A CN116611076A (en) 2023-08-18
CN116611076B true CN116611076B (en) 2023-10-27

Family

ID=87685742

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant