CN100511229C - Domain name information storage and inquiring method and system - Google Patents


Info

Publication number
CN100511229C
Authority
CN
China
Prior art keywords
node
character string
string
stored
present node
Prior art date
Legal status
Expired - Fee Related
Application number
CNB2006100603451A
Other languages
Chinese (zh)
Other versions
CN101055574A (en)
Inventor
刘竟
郑志彬
刘廷永
孙知信
宫婧
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CNB2006100603451A
Publication of CN101055574A
Application granted
Publication of CN100511229C
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for storing and querying domain-name-like strings.

Storage works as follows. The strings to be stored are compared with one another. The common prefix of the strings is stored in a single node, which serves as a common node. The differing parts are inserted as child nodes of that common node. Two strings that share no prefix at all are inserted as sibling nodes.

A query proceeds as follows:
A. Compare the string in the current node with the string being queried.
B. If the two match completely, take the part of the query string not matched by the node string as the new query string, take the child node of the current node as the new current node, and return to step A.
C. If the two do not match completely, take the sibling node of the current node as the new current node, perform the next round of comparison against the query string, and return to step B.

The method increases the query speed for domain-name-like strings.

Description

Storage and query method and system for domain-name-like information

Technical field

The invention relates to the field of communication technology. More specifically, it relates to a method for storing and quickly querying domain-name-like information, and to a storage and query system for such information.

Background art

With the rapid development of computer technology, the amount of information is growing exponentially. This makes both storing and finding information harder, so efficient search methods are needed. Searching means locating a specific value within a body of data, and every search method aims at higher efficiency and more convenient operation.

Domain-name lookup is no exception. For example, a network application may need to know whether a given domain name has already been stored and, if so, redirect the incoming data packets. Such an application must determine very quickly whether a domain name is stored. If lookups take a long time, network performance degrades considerably and other network problems may arise.

Existing related retrieval techniques fall mainly into two kinds: the prefix tree and the BSD radix tree.

The BSD radix tree is a data structure designed for IP address lookup. Its node storage structure is shown in Figure 1. Internal nodes and leaf nodes both use the radix_node structure and differ only in the definition of a few fields. For an internal node, the structure mainly stores pointers to its child nodes and to its parent node, together with the position of the bit to be tested. A search in the BSD radix tree proceeds in three steps.

Step 1, finding a leaf node (Figure 2): suppose a route is to be looked up in the radix tree. Starting at the top of the tree, the search tests the bits designated by the internal nodes along the path. It tests the root node's test bit first. If the result is 1, the search enters the left subtree; if it is 0, it enters the right subtree. The search continues this way until it reaches a node whose test-bit field is negative, i.e. a leaf node, where the descent stops.

Step 2, duplicate-key handling: the leaf found in step 1 may not satisfy the match condition for the search key. In that case the leaf's duplicate-key list must be traversed. The leaves in this list have exactly the same key value (the IP address) as the leaf found in step 1, but their masks become progressively shorter, so a network match may still exist in the list. Duplicate-key processing is shown in Figure 3.

Step 3, backtracking: so far the search has used the IP address as the lookup key to reach a leaf via the bit positions indicated by the internal nodes, and it has processed the duplicate keys. It has still not found a matching leaf. This does not rule out other leaves in the radix tree that satisfy the network match condition, so the search backtracks toward the top of the tree along the path of internal nodes it came down (Figure 4). For each internal node passed on the way back, the search checks whether a mask list is attached; mask lists are drawn as thick solid lines in Figure 4. Internal nodes without a mask list are passed directly. If an internal node has a mask list, a network match may exist in its subtree. The search therefore stops at that node and checks before deciding whether to continue backtracking.

The BSD radix tree is a search tree over binary keys and is particularly suited to very long, variable-length keys. A string can of course also be regarded as a binary key. However, each character of a string takes only a few dozen possible values (a-z, A-Z, and so on), far fewer than the 256 values a byte of a pure binary key can take. Because the method nevertheless processes the string as a binary stream during lookup, it shows no clear speed advantage when searching strings.

The prefix tree is used mostly in data mining; its storage layout is shown in Figure 5. If the character set of its nodes is limited to the 26 English letters, a prefix tree can be regarded as a tree with at most 26 branches per node. A prefix tree also offers a relatively fast query method. However, its structure stores much redundant string information, which causes unnecessary backtracking when looking up domain names.

For example, to search for "中国人" in the prefix tree shown in Figure 5, the search starts from the root node, matches "中", and enters the "中国" subtree. It then compares "中国" again from the beginning, which causes backtracking. Next it enters the "中国人" node and starts matching over again, so much time is wasted on unnecessary backtracking.

When this method is used to store domain-name strings, it wastes storage space, its deep tree structure makes searches slow, and it cannot compress the amount of stored data.
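For comparison, a conventional character-level prefix tree can be sketched in Python. The trie uses one node per character and a `terminal` flag; the names `TrieNode`, `trie_insert`, `trie_contains` and `depth` are hypothetical. This generic trie is not necessarily the exact structure of Figure 5. It does show the property criticized above: a lookup visits one node per character, so tree depth grows with string length.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    terminal: bool = False  # True if a stored string ends at this node


def trie_insert(root: TrieNode, s: str) -> None:
    """Insert s, creating one node per character as needed."""
    node = root
    for ch in s:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True


def trie_contains(root: TrieNode, s: str) -> bool:
    """Return True if s was inserted (not merely a prefix of an inserted string)."""
    node = root
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.terminal


def depth(node: TrieNode) -> int:
    """Number of levels in the subtree rooted at node, counting node itself."""
    return 1 + max((depth(c) for c in node.children.values()), default=0)
```

Storing "www.google.com.cn" alone produces a 17-level chain below the root, one level per character.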

Summary of the invention

The technical problem to be solved by the present invention is to provide a storage method suited to fast lookup of domain-name-like strings.

A further problem is to provide a matching query method for domain-name-like strings. This query method can quickly and effectively determine whether a given domain-name-like string has already been stored.

The invention also provides a storage and query system for domain-name-like information.

The technical solution adopted by the present invention to solve the above problems is as follows.

A method for storing domain-name-like strings, comprising:
- comparing the domain-name-like strings to be stored with one another;
- storing the common prefix of the strings in a single node, which serves as a common node;
- inserting the differing parts of the strings as child nodes of that common node;
- inserting two strings that share no prefix at all as sibling nodes.

The method as described, wherein an empty node is inserted after a node that stores a complete string.

The method as described, comprising the following specific steps:

A. Compare the string to be stored with the string in the current node. If the two match completely, return true. If they do not match at all, go to step B. If they match partially, go to step C.

B. Determine whether the current node has a sibling node. If it does, take that sibling as the new current node and return to step A. If it does not, insert the node to be inserted as a sibling of the current node.

C. Split the partially matching string to be stored and/or the string in the current node into their common prefix and unmatched part. Keep the common prefix as the current node and insert the unmatched part as a child node of the current node.

The method as described, wherein in step C the string to be stored may match completely while the string in the current node matches partially. In that case:
- split the string in the current node into a first common part and a first differing part;
- keep the first common part as the current node;
- insert the first differing part as a child node of the current node;
- insert an empty node after the first-differing-part node.

The method as described, wherein in step C the string in the node may match completely while the string to be stored matches partially. In that case the following specific steps are performed:

C1. Split the string to be stored into a second common part and a second differing part.

C2. Determine whether the current node has child nodes.

C3. If it does, take the second differing part as the new string to be stored and the child node as the new current node, and repeat step A. Otherwise, keep the string in the node, insert the second differing part as a child node of the current node, and insert an empty node after the second-differing-part node.

The method as described, wherein in step C both the string in the node and the string to be stored may match partially. In that case:
- split the string to be stored into the common part and a second differing part;
- split the string in the node into the common part and a first differing part;
- keep the common part as the current node;
- insert one differing part as a child node of the current node and the other as that child's sibling (either the first differing part as the child and the second as its sibling, or the reverse).

A method for querying domain-name-like strings, comprising the following steps:

A. Compare the string in the node with the string to be queried, and determine whether the string in the node completely matches the query string.

B. If the string in the node matches completely, take the part of the query string not matched by the node string as the new query string. Take the child node of the current node as the new current node, and return to step A.

C. If the string in the node does not completely match the query string, take the sibling node of the current node as the new current node and return to step A.

The query method as described, wherein step A comprises the following specific steps:

A1. Determine whether the current node is valid. If it is invalid, return true directly; if it is valid, go to step A2.

A2. Compare the query string with the string in the node, and determine whether the string in the node completely matches the query string.

The query method as described, wherein step C comprises the following specific steps:

C1. If the string in the node does not completely match the query string, take the sibling node of the current node as the new current node.

C2. Determine whether the new current node is valid. If it is invalid, return false directly; if it is valid, return to step A2.

A storage and query system for domain-name-like information, comprising:
- an insertion module;
- a DNS-Tree storage module;
- a memory management module connected to both the insertion module and the DNS-Tree storage module. The memory management module compares domain-name-like strings and allocates and manages node memory for the DNS-Tree storage module.

The system further comprises a query processing module connected to the DNS-Tree storage module. The query processing module compares the string to be queried with the strings in the DNS-Tree storage module and determines whether the query string is already stored there.

Beneficial effects of the invention: the storage method above was designed around the characteristics of domain-name-like strings. It effectively reduces the memory requirement. In addition, the domain name service tree (DNSTree) it builds is shallow and balanced, which reduces the time and space complexity of query retrieval. The query method of the invention can therefore quickly and effectively determine whether a given domain-name-like string has already been stored, speeding up queries and improving retrieval efficiency.

Brief description of the drawings

Figure 1 is a schematic diagram of the node storage structure of the BSD radix tree;

Figure 2 is a schematic diagram of testing the root node in the BSD radix tree;

Figure 3 is a diagram of duplicate-key processing in the BSD radix tree;

Figure 4 is a schematic diagram of backtracking in the BSD radix tree;

Figure 5 is a schematic diagram of the prefix tree storage layout;

Figure 6 is a schematic diagram of the DNSTree tree structure of the present invention;

Figure 7 is a flowchart of the DNSTree storage method of the present invention;

Figure 8 is a flowchart of the DNSTree query algorithm of the present invention;

Figure 9 is a schematic diagram of the domain-name-like information storage and query system of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Domain-name strings have two particular characteristics. First, a domain name is not very long. Second, most domain names are meaningful, easy-to-remember strings, so many identical prefixes such as www and ftp can often be found. We call strings that share these characteristics with domain-name strings domain-name-like strings. Based on these two characteristics, the present invention improves the storage and query methods for domain-name-like strings in order to speed up their lookup. Domain-name strings are used as the example below. Suppose the following six domain-name strings are given:

www.baidu.com

www.google.com

www.goobersite.com

www.google.com.cn

www.yahoo.com.cn

www.yahoo.com

The storage method of the present invention works as follows. The domain-name strings to be stored are compared with one another. The common prefix of the strings is stored in a single node, the differing parts are inserted as child nodes, and strings that are completely different are inserted as sibling nodes.

For example, www.baidu.com and www.google.com share only the prefix www., so www. becomes the common node and baidu.com and google.com are inserted as its two child nodes. When www.yahoo.com is then stored and compared with www.baidu.com, again only www. is shared. So www. remains the common node, and the differing part yahoo.com is inserted as a sibling of baidu.com. Continuing in this way yields the DNSTree structure shown in Figure 6.

This storage scheme saves a large amount of memory when many strings share the same prefixes. A simplified textual representation of Figure 6 follows:

www.
    +baidu.com
    +goo
        +bersite.com
        +gle.com
            +.cn
            +""
    +yahoo.com
        +.cn
        +""
""

In the figure, "" denotes an empty node. An empty node indicates that the string before it is already complete and that no further characters follow. For example, inserting the two strings www.google.com and www.google.com.cn into the tree produces the following shape:

www.google.com
    +.cn
    +""

The empty node "" after www.google.com indicates that www.google.com is itself a complete string. Without it, the tree could not show whether the string www.google.com is stored.
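The role of the empty node can be checked with a small sketch that lists the complete strings a tree records. For illustration only, a node is modeled here as a `(label, children)` tuple; the name `stored_strings` is hypothetical. A leaf ends a stored string, and an empty-label node marks its parent's path as complete.

```python
from typing import List, Tuple

# Illustrative node model (an assumption, not the patent's memory layout):
# (label, children), where children is a list of nodes of the same form.
TreeNode = Tuple[str, list]


def stored_strings(nodes: List[TreeNode], prefix: str = "") -> List[str]:
    """Return every complete string recorded by a list of sibling nodes."""
    result = []
    for label, children in nodes:
        if label == "":
            result.append(prefix)            # empty node: the path so far is complete
        elif not children:
            result.append(prefix + label)    # leaf: the path ending here is complete
        else:
            result.extend(stored_strings(children, prefix + label))
    return result


with_marker = [("www.google.com", [(".cn", []), ("", [])])]
without_marker = [("www.google.com", [(".cn", [])])]

print(stored_strings(with_marker))     # ['www.google.com.cn', 'www.google.com']
print(stored_strings(without_marker))  # ['www.google.com.cn']
```

Without the "" child, www.google.com is lost: only the longer string can be recovered from the tree.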

The domain-name string storage method of the present invention is described in detail below with reference to the DNSTree storage flowchart in Figure 7. First, the abbreviations and definitions used in the flowchart are explained:

INN_STR: in-node string, the string stored in a node.

PPI_STR: matched part of the in-node string, i.e. the prefix the in-node string shares with the string it is compared against. For example, comparing abcde (in-node string) with abckef gives PPI_STR = abc.

BPI_STR: unmatched part of the in-node string, i.e. the part of the in-node string that differs from the string it is compared against. For example, comparing abcde (in-node string) with abckef gives BPI_STR = de.

WTI_STR: string to be inserted, i.e. a string that is to be inserted into the DNSTree.

PPW_STR: matched part of the string to be inserted, i.e. the prefix the string to be inserted shares with the string it is compared against. For example, comparing abcde with abckef (string to be inserted) gives PPW_STR = abc.

BPW_STR: unmatched part of the string to be inserted, i.e. the part of the string to be inserted that differs from the string it is compared against. For example, comparing abcde with abckef (string to be inserted) gives BPW_STR = kef.

A complete (full) match, as used above, means that one string is a prefix of the other:
- "INN_STR matches completely" means that INN_STR is a prefix of WTI_STR, e.g. abcd (INN_STR) and abcdefg (WTI_STR).
- "WTI_STR matches completely" means that WTI_STR is a prefix of INN_STR, e.g. abcd (WTI_STR) and abcdefg (INN_STR).

A partial match means that two strings share part of a common prefix but neither is a prefix of the other, e.g. abcke and abkei:
- "INN_STR matches partially" means that INN_STR and WTI_STR share a common prefix but INN_STR continues with a differing part, e.g. abcde (INN_STR) and abeiek (WTI_STR) share the prefix ab.
- "WTI_STR matches partially" means that INN_STR and WTI_STR share a common prefix but WTI_STR continues with a differing part, e.g. abcde (WTI_STR) and abeiek (INN_STR) share the prefix ab.

No match at all means that the two strings share no common prefix.
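The match categories defined above can be expressed as a small Python helper. This is a sketch under our own naming: `common_prefix_len`, `classify` and the category labels are hypothetical. The helper returns the match category together with PPI_STR, BPI_STR, PPW_STR and BPW_STR. It treats identical strings as their own category, which the storage steps handle separately.

```python
from typing import Dict, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify(inn: str, wti: str) -> Tuple[str, Dict[str, str]]:
    """Classify INN_STR against WTI_STR using the definitions above (labels hypothetical)."""
    n = common_prefix_len(inn, wti)
    parts = {
        "PPI_STR": inn[:n], "BPI_STR": inn[n:],
        "PPW_STR": wti[:n], "BPW_STR": wti[n:],
    }
    if inn == wti:
        kind = "identical"
    elif n == 0:
        kind = "no match"
    elif n == len(inn):
        kind = "INN_STR full match"   # INN_STR is a prefix of WTI_STR
    elif n == len(wti):
        kind = "WTI_STR full match"   # WTI_STR is a prefix of INN_STR
    else:
        kind = "both partial"         # shared prefix, both strings continue
    return kind, parts
```

For the document's example, `classify("abcde", "abckef")` yields the category "both partial", with BPI_STR "de" and BPW_STR "kef".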

Based on these definitions, the storage method for domain-name-like strings comprises the following steps:

1. Compare WTI_STR with INN_STR. If the two strings match each other completely (they are identical), go to step 2. If they do not match at all, go to step 3. If they match partially, go to step 4.

2. Return true.

3. Determine whether the current node has a sibling node. If it does, take the sibling as the current node and go to step 1. If it does not, insert WTI_STR as a sibling node.

4. A partial match is handled as one of the following three sub-cases:

1) WTI_STR matches completely and INN_STR matches partially:

a. Split INN_STR into PPI_STR and BPI_STR.

b. Keep PPI_STR in the current node, insert BPI_STR into the DNSTree as a child node of the current node, and insert an empty node after the BPI_STR node.

2) INN_STR matches completely and WTI_STR matches partially:

a. Determine whether the INN_STR node has child nodes.

b. If it has child nodes, take BPW_STR as the new WTI_STR and the child node of INN_STR as the new current node, then go to step 1.

c. If it has no child nodes, keep the PPI_STR node and insert BPW_STR as a child node of the current node. Then insert an empty node after the inserted child node.

3) INN_STR matches partially and WTI_STR matches partially:

a. Split WTI_STR into PPW_STR and BPW_STR, and, as in sub-case 1, split INN_STR into PPI_STR and BPI_STR.

b. Keep PPI_STR as the current node, insert BPI_STR as a child node of the current node, and insert BPW_STR as a sibling node of the BPI_STR node.
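Steps 1 to 4 can be sketched in Python using a first-child/next-sibling node layout. This is an illustrative sketch, not the patented implementation. The layout and the names `Node`, `insert`, `siblings` and `iter_strings` are our assumptions. So are two details the steps leave implicit:

- When a node is split, the new BPI_STR child takes over the node's existing children.
- An exact match on a node that already has children adds an empty end node if one is missing. Step 2 as written simply returns true.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    label: str                        # INN_STR; "" marks an empty (end) node
    child: Optional["Node"] = None    # first child
    sibling: Optional["Node"] = None  # next sibling


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def siblings(node: Optional[Node]) -> Iterator[Node]:
    while node is not None:
        yield node
        node = node.sibling


def insert(root: Optional[Node], s: str) -> Node:
    """Insert s (WTI_STR) and return the root of the top-level sibling list."""
    if not s:
        raise ValueError("empty strings are not stored")
    if root is None:
        return Node(s)
    node, rest = root, s
    while True:
        n = common_prefix_len(node.label, rest)
        if node.label == rest:                       # step 2: identical strings
            children = list(siblings(node.child))
            if children and all(c.label for c in children):
                children[-1].sibling = Node("")      # assumption: mark node complete
            return root
        if n == 0:                                   # step 3: no common prefix
            if node.sibling is None:
                node.sibling = Node(rest)
                return root
            node = node.sibling
            continue
        if n == len(node.label):                     # sub-case 2: INN_STR is a prefix
            rest = rest[n:]                          # BPW_STR becomes the new WTI_STR
            if node.child is None:
                node.child = Node(rest, sibling=Node(""))
                return root
            node = node.child
            continue
        # Sub-cases 1 and 3: keep PPI_STR here; BPI_STR becomes a new child
        # that takes over the node's existing children (assumption).
        bpi = Node(node.label[n:], child=node.child)
        node.label, node.child = node.label[:n], bpi
        # Sub-case 1 (WTI_STR is a prefix): empty node; sub-case 3: BPW_STR.
        bpi.sibling = Node("") if n == len(rest) else Node(rest[n:])
        return root


def iter_strings(node: Optional[Node], prefix: str = "") -> Iterator[str]:
    """Yield every complete string stored under a sibling list."""
    for cur in siblings(node):
        if cur.label == "":
            yield prefix
        elif cur.child is None:
            yield prefix + cur.label
        else:
            yield from iter_strings(cur.child, prefix + cur.label)
```

Inserting the six example domains in the order listed earlier gives www. the children baidu.com, goo and yahoo.com. Under goo are gle.com and bersite.com, and both gle.com and yahoo.com carry a .cn child plus an empty node, as in Figure 6. The sibling order under goo may differ from the figure.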

Because common domain-name prefixes are stored in shared nodes, the DNSTree built by this method is shallow and balanced. This reduces the time and space complexity of lookups and thus increases query speed. At the same time, the storage method effectively reduces the memory requirement.

For this DNSTree structure, the corresponding lookup process is shown in Figure 8 and described as follows.

Related concepts:

INN_STR: in-node string, the string stored in a node.

WTC_STR: string to be compared, i.e. a string that must be checked to determine whether it is already stored.

1. Determine whether the current node is valid. If it is invalid, return true directly; if it is valid, go to step 2.

2. Determine whether INN_STR completely matches WTC_STR (i.e. is a prefix of it). If it does, go to step 3; if not, go to step 4.

3. Take the unmatched part of WTC_STR as the new WTC_STR, take the child node of the current node as the new current node, and return to step 1.

4. Take the sibling node of the current node as the new current node.

5. Determine whether the new current node is valid. If it is invalid, return false. If it is valid, return to step 2 to check whether the new current node completely matches WTC_STR, and start the next iteration.
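The lookup loop can be sketched in Python over a first-child/next-sibling node layout; the `Node` class and the `contains` function are hypothetical names. Two details deviate from steps 1 to 5 as written and are our assumptions:

- Reaching an invalid (absent) child returns true only when the whole query has been consumed.
- An empty node matches only an empty remaining query.

Without these checks, a query such as www.google.com.cnx would be reported as stored in the tree of Figure 6. An empty tree also returns false here, whereas step 1 as written would return true.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                        # INN_STR; "" marks an empty (end) node
    child: Optional["Node"] = None    # first child
    sibling: Optional["Node"] = None  # next sibling


def contains(root: Optional[Node], query: str) -> bool:
    """Return True if query (WTC_STR) is stored in the tree rooted at root."""
    node, rest = root, query
    while node is not None:                       # steps 1 and 5: is the node valid?
        if node.label == "":
            if rest == "":
                return True                       # empty node: a complete string ends here
            node = node.sibling
        elif rest.startswith(node.label):         # step 2: INN_STR is a prefix of WTC_STR
            rest = rest[len(node.label):]         # step 3: keep the unmatched part
            if node.child is None:
                return rest == ""                 # assumption: whole query must be consumed
            node = node.child
        else:
            node = node.sibling                   # step 4: try the next sibling
    return False
```

The storage steps above never create two non-empty sibling nodes that start with the same character. So at most one sibling can match at each level, and the loop never needs to backtrack.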

The system of the present invention for storing and querying domain-name-like strings is shown in Figure 9. It comprises four main modules.

Insertion module: inserts strings into the appropriate nodes of the DNS-Tree according to the DNS-Tree structure. It is used mainly when the DNS-Tree is initialized, to build the tree.

Memory management module: comprises a comparator and a control part, and manages how node memory in the DNS-Tree is allocated. Because domain-name strings are short, the memory they need comes in small blocks, so this module uses a prior-art memory pool to manage them.

Query processing module: checks whether the string to be queried is already stored in the DNS-Tree. It processes the query string, compares it with the strings in the DNS-Tree, and reports whether the query string is already present.

DNS-Tree: the storage module, i.e. a memory area.

It will be understood that those of ordinary skill in the art may make equivalent substitutions or changes based on the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the protection scope of the appended claims.

Claims (10)

1. A method for storing domain-name-like strings, the method comprising: comparing the domain-name-like strings to be stored with one another; storing the common prefix of the strings to be stored in a single node, as a common node; inserting the differing parts of the strings to be stored as child nodes of the common node; and inserting two domain-name-like strings that do not match at all as sibling nodes.
2. The method according to claim 1, characterized in that an empty node is inserted after a node that stores a complete string.
3. The method according to claim 2, characterized in that the method comprises the following specific steps:
A. Compare the string to be stored with the string in the current node. If they match completely, return true; if they do not match at all, go to step B; if they match partially, go to step C;
B. Determine whether the current node has a sibling node. If it does, take the sibling node as the new current node and return to step A; if it does not, insert the node to be inserted as a sibling node of the current node;
C. Split the partially matching string to be stored and/or the string in the current node into a common prefix and an unmatched part; keep the common prefix as the current node, and insert the unmatched part as a child node of the current node.
4. The method according to claim 3, characterized in that, in step C, when the string to be stored matches completely and the string in the current node matches partially, the processing comprises: splitting the string in the current node into a first identical part and a first different part; keeping the first identical part as the current node; inserting the first different part as a child node of the current node; and inserting a node after the node holding the first different part.
5. The method according to claim 3, characterized in that, in step C, when the string in the current node matches completely and the string to be stored matches partially, the processing comprises the following specific steps:
C1. Split the string to be stored into a second identical part and a second different part;
C2. Determine whether the current node has a child node;
C3. If it has a child node, take the second different part as the new string to be stored and the child node as the new current node, and repeat step A; otherwise, keep the string in the current node, insert the second different part as a child node of the current node, and insert a node after the node holding the second different part.
6. The method according to claim 3, characterized in that, in step C, when the string in the current node and the string to be stored both match partially, the processing comprises: splitting the string to be stored into an identical part and a second different part; splitting the string in the current node into the identical part and a first different part; keeping the identical part as the current node; and either inserting the first different part as a child node of the current node and the second different part as its sibling node, or inserting the second different part as a child node of the current node and the first different part as its sibling node.
7. A method for querying domain-name-like strings, the method comprising the following steps:
A. Compare the string in the current node with the string to be queried, and determine whether the string in the node completely matches the string to be queried;
B. If the string in the node matches completely, take the part of the string to be queried that does not match the string in the node as the new string to be queried, take the child node of the current node as the new current node, and return to step A;
C. If the string in the node and the string to be queried do not match completely, take the sibling node of the current node as the new current node, and return to step A.
8. The query method according to claim 7, characterized in that step A comprises the following specific steps:
A1. Determine whether the current node is valid; if the current node is invalid, return true directly; if it is valid, go to step A2;
A2. Compare the string to be queried with the string in the current node, and determine whether the string in the node completely matches the string to be queried.
9. The query method according to claim 8, characterized in that step C comprises the following specific steps:
C1. If the string in the node and the string to be queried do not match completely, take the sibling node of the current node as the new current node;
C2. Determine whether the new current node is valid; if it is invalid, return false directly; if it is valid, return to step A2.
10. A domain-name-like information storage and query system, characterized in that it comprises an insertion module, a DNS-Tree storage module, and a memory management module connected to both the insertion module and the DNS-Tree storage module, the memory management module being used to compare domain-name-like strings and to allocate and manage node memory of the DNS-Tree storage module;
the system further comprises a query processing module connected to the DNS-Tree storage module, the query processing module being used to compare the string to be queried with the strings in the DNS-Tree storage module and to determine whether the string to be queried is present in the DNS-Tree storage module.
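The storage procedure of claims 2 to 6 can be sketched in Python as follows. This is one interpretation, not the patentee's code. The `Node` class, the helper names, and the choice of the first ordering allowed by claim 6 are all assumptions. So is the placement of the claim 2 empty node: it becomes a child of a node that ends a complete string, or a sibling of the split-off tail in the claim 4 case. The sketch also assumes that a complete match in step A adds an empty node if one is missing, so that a shared prefix stored later as a full string is recorded. The helper `stored` lists every complete string so that the result can be checked.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node layout: string fragment, first child, next sibling.
    text: str
    child: Optional[Node] = None
    sibling: Optional[Node] = None


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def end_node() -> Node:
    # Empty node marking the end of a complete stored string (claim 2).
    return Node("")


def insert(root: Optional[Node], s: str) -> Node:
    """Store s and return the root (sketch of claims 3-6)."""
    if not s:
        raise ValueError("empty strings are not stored in this sketch")
    if root is None:
        return Node(s, child=end_node())
    current = root
    while True:
        p = common_prefix_len(s, current.text)
        if p == 0:
            # Step B: no match at all; try the next sibling or append a new one.
            if current.sibling is None:
                current.sibling = Node(s, child=end_node())
                return root
            current = current.sibling
        elif p == len(s) == len(current.text):
            # Step A, complete match: ensure an end node exists (assumption).
            c = current.child
            while c is not None and c.text != "":
                c = c.sibling
            if c is None:
                current.child = Node("", sibling=current.child)
            return root
        elif p == len(current.text):
            # Claim 5: node string is a prefix of s; continue with the remainder.
            s = s[p:]
            if current.child is None:
                current.child = Node(s, child=end_node())
                return root
            current = current.child
        elif p == len(s):
            # Claim 4: s is a prefix of the node string; split the node.
            tail = Node(current.text[p:], child=current.child)
            current.text = s
            current.child = tail
            tail.sibling = end_node()  # s itself is now a complete string
            return root
        else:
            # Claim 6: both match partially; split and add the new tail as sibling.
            tail = Node(current.text[p:], child=current.child)
            tail.sibling = Node(s[p:], child=end_node())
            current.text = current.text[:p]
            current.child = tail
            return root


def stored(node: Optional[Node], prefix: str = "") -> List[str]:
    """List every complete string, i.e. every path ending at an empty node."""
    out: List[str] = []
    while node is not None:
        if node.text == "":
            out.append(prefix)
        else:
            out.extend(stored(node.child, prefix + node.text))
        node = node.sibling
    return out
```

For example, inserting `com.abc` and then `com.abd` leaves one common node `com.ab` with children `c` and `d` (claim 6). Inserting `com` afterwards splits that node into `com` with child `.ab` (claim 4).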
CNB2006100603451A 2006-04-13 2006-04-13 Domain name information storage and inquiring method and system Expired - Fee Related CN100511229C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100603451A CN100511229C (en) 2006-04-13 2006-04-13 Domain name information storage and inquiring method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100603451A CN100511229C (en) 2006-04-13 2006-04-13 Domain name information storage and inquiring method and system

Publications (2)

Publication Number Publication Date
CN101055574A CN101055574A (en) 2007-10-17
CN100511229C true CN100511229C (en) 2009-07-08

Family

ID=38795413

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100603451A Expired - Fee Related CN100511229C (en) 2006-04-13 2006-04-13 Domain name information storage and inquiring method and system

Country Status (1)

Country Link
CN (1) CN100511229C (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102306161A (en) * 2011-07-22 2012-01-04 浙江百世技术有限公司 Method for multi-region repeated detection and equipment
CN106776657B (en) * 2015-11-25 2021-05-04 阿里巴巴集团控股有限公司 Domain name retrieval method and device
CN107153647B (en) * 2016-03-02 2021-12-07 北京字节跳动网络技术有限公司 Method, apparatus, system and computer program product for data compression
CN105871726A (en) * 2016-03-21 2016-08-17 哈尔滨工程大学 Mode matching method for dynamically adding tree node and unit based on common prefix
CN107870925B (en) * 2016-09-26 2021-08-20 华为技术有限公司 A string filtering method and related device
CN108984780B (en) * 2018-07-25 2021-10-22 郑州云海信息技术有限公司 Method and apparatus for managing disk data based on data structure supporting duplicate key-value tree
CN110120942B (en) * 2019-04-17 2022-01-25 新华三信息安全技术有限公司 Security policy rule matching method and device, firewall equipment and medium
CN111523783A (en) * 2020-04-14 2020-08-11 西云图科技(北京)有限公司 A data storage method for a water system
CN112380324B (en) * 2020-12-02 2022-02-01 北京微步在线科技有限公司 Method, system and medium for determining domain name and its father domain name

Also Published As

Publication number Publication date
CN101055574A (en) 2007-10-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090708

Termination date: 20180413