JP2015018497A - Multidimensional range retrieval device and multidimensional range retrieval method - Google Patents


Publication number: JP2015018497A
Authority: JP (Japan)
Application number: JP2013146656A
Other versions: JP6155920B2
Other languages: Japanese (ja)
Prior art keywords: prefix, bit, index key, section, bit string
Inventor: 祥治 西村 (Yoshiharu Nishimura)
Original assignee: 日本電気株式会社 (NEC Corp)
Legal status (as listed by Google Patents; not a legal conclusion): Granted; application status Active

Events:
Application filed by 日本電気株式会社 (NEC Corp)
Priority to JP2013146656A (patent/JP6155920B2)
Publication of JP2015018497A
Application granted
Publication of JP6155920B2


Abstract

PROBLEM TO BE SOLVED: To speed up the detection of hit pages in multidimensional range retrieval.
SOLUTION: A multidimensional range retrieval device includes: an acquisition unit that acquires a start index key and an end index key representing the minimum point and the maximum point of a specific section on a space-filling curve; an extraction unit that, on the basis of the bit strings of the start index key and the end index key, extracts prefix data that can represent the bit strings of the index keys included in the specific section; and a determination unit that determines whether a query section of the multidimensional range retrieval overlaps a prefix section, i.e. the section on the space-filling curve formed by the set of index keys sharing the prefix of the prefix data extracted by the extraction unit.

Description

  The present invention relates to an index scanning technique in multidimensional range search.

  Range search, which extracts the records in a database that fall within a given range, is one of the basic query operations, and many index structures have been proposed to perform it efficiently. The B-tree is a well-known index structure for one-dimensional range search. A B-tree partitions a record set (data set) by ranges of index values, distributes the partitions over pages, and manages the start or end of the record range covered by each page in a tree structure. With a B-tree, the pages that store the records within a given range can be located efficiently.
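A minimal sketch of the idea above, assuming a flat sorted list of page start keys as a stand-in for a B-tree's internal nodes (a real B-tree keeps these boundaries in a multi-level tree). The function name `pages_for_range` and the page layout are hypothetical; the sketch only shows that sorted range boundaries let a binary search locate every page that may hold records in [lo, hi].

```python
from bisect import bisect_right


def pages_for_range(page_starts: list[int], lo: int, hi: int) -> list[int]:
    """Return indices of pages whose key range can intersect [lo, hi].

    page_starts must be sorted; page i covers [page_starts[i], page_starts[i + 1]),
    and the last page is open-ended above.
    """
    if lo > hi:
        return []
    # Page that would contain lo (clamped to the first page if lo is below all starts).
    first = max(bisect_right(page_starts, lo) - 1, 0)
    # Last page whose start key does not exceed hi.
    last = bisect_right(page_starts, hi) - 1
    return list(range(first, last + 1))


# Four hypothetical pages covering [0, 10), [10, 20), [20, 30), [30, ...).
print(pages_for_range([0, 10, 20, 30], 12, 25))  # [1, 2]
```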

  The UB-tree (Universal B-tree) is a known multidimensional index structure that extends the B-tree to execute multidimensional range searches efficiently. A UB-tree maps each point of a multidimensional space to one dimension along the Z curve (Z-order curve), a space-filling curve, and stores the mapped value in a B-tree as the index key. Because points are mapped to a one-dimensional space in this way, a rectangular region that is contiguous in the multidimensional space corresponds to a set of sections (intervals) in one dimension. A multidimensional range search is therefore expressed as a set of one-dimensional range searches on the B-tree.
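To make the Z-curve mapping and its consequence concrete, the following sketch interleaves attribute bits most significant first (a standard Morton encoding) and groups the keys of a small rectangle into contiguous runs. The helper names are illustrative, and the interleaving order x1, y1, x2, y2, ... is one common convention, assumed here.

```python
def z_order_key(coords: list[int], bits: int) -> int:
    """Interleave coordinate bits, most significant level first (Morton order)."""
    key = 0
    for level in range(bits - 1, -1, -1):  # from the MSB down to the LSB
        for c in coords:  # one bit from each attribute per level
            key = (key << 1) | ((c >> level) & 1)
    return key


def to_intervals(keys: list[int]) -> list[tuple[int, int]]:
    """Merge keys into maximal runs of consecutive values."""
    runs: list[list[int]] = []
    for k in sorted(keys):
        if runs and runs[-1][1] + 1 == k:
            runs[-1][1] = k
        else:
            runs.append([k, k])
    return [(a, b) for a, b in runs]


# x = 0b101, y = 0b011 -> key bits x1 y1 x2 y2 x3 y3 = 1 0 0 1 1 1
print(format(z_order_key([0b101, 0b011], 3), "06b"))  # 100111

# The 2 x 2 rectangle x in [1, 2], y in [0, 1] (2-bit attributes) becomes two intervals.
rect = [z_order_key([x, y], 2) for x in (1, 2) for y in (0, 1)]
print(to_intervals(rect))  # [(2, 3), (8, 9)]
```

Even this tiny rectangle needs two one-dimensional range searches, which illustrates why decomposing larger boxes can produce many of them.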

  However, it is known that the number of one-dimensional range searches grows enormously when a multidimensional range search is simply decomposed into them. Non-Patent Document 1 below therefore proposes an algorithm that, instead of decomposing the multidimensional range search, efficiently finds the pages that contain records within a given multidimensional range (hereinafter referred to as hit pages).

  To check efficiently whether the region covered by a page overlaps the search range, the method of Non-Patent Document 1 exploits the relationship between the bit patterns in the binary representation of Z-curve keys and the fractal shape that the Z curve traces in the multidimensional space (see FIG. 11 of Non-Patent Document 1). The Z curve generates an index key by interleaving the bit strings of the attributes. When the number of attributes is n, the method divides the bit string generated by the Z curve into n-bit groups. Each n-bit partial bit string corresponds to a hypercube in the multidimensional space: the position of the partial bit string (which delimiter it follows) represents the scale, and its bit pattern represents the position in the multidimensional space.

  Using this property of the Z curve, the method of Non-Patent Document 1 decomposes the region covered by a page into a set of hyperrectangles as follows. Let α be the start point and β the end point of the page section. First, the method finds the smallest hypercube containing α and β: it splits α and β into n-bit groups and compares the groups in order from the most significant side to the least significant side until it finds a pair that does not match. Suppose the partial bit strings at the j-th delimiter position are the first that do not match. Next, within the hypercube found in this way, the method finds a set of hyperrectangles that exactly covers the region corresponding to the page section. Let α[i] and β[i] denote the n-bit strings in the i-th group of α and β. For each of α and β, the method examines the bit patterns from the j-th group to the last group and finds, at each i-th level, the largest hypercube contained in the page section.
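The first step described above, finding the group index j at which α and β first differ, can be sketched as follows for Z keys held as integers. This is an illustrative reading of the paragraph under the assumption that each key has exactly `groups` n-bit groups; the function name is hypothetical, and the sketch does not reproduce Algorithms 5 and 6 of Non-Patent Document 1.

```python
from typing import Optional


def first_mismatch_group(alpha: int, beta: int, n: int, groups: int) -> Optional[int]:
    """Index j (0-based, from the most significant side) of the first n-bit group
    in which Z keys alpha and beta differ; None if the keys are equal."""
    mask = (1 << n) - 1
    for j in range(groups):
        shift = (groups - 1 - j) * n
        if ((alpha >> shift) & mask) != ((beta >> shift) & mask):
            return j
    return None


# Two attributes (n = 2), three groups: alpha = 00 10 11, beta = 00 11 01.
j = first_mismatch_group(0b001011, 0b001101, 2, 3)
print(j)  # 1: both keys lie in the hypercube whose key prefix is the shared group "00"
```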

Non-Patent Document 1: Tomas Skopal et al., "A new range query algorithm for Universal B-trees", September 2006, Figs. 3 and 11, Algorithms 5 and 6, http://www.sciencedirect.com/science/article/pii/S0306437905000116

  The method described above can be applied only to the Z curve, a simple space-filling curve, and this restricts index design. For example, the method cannot use a space-filling curve that mixes the bit strings of attributes with different bit lengths without zero-padding the shorter ones, nor a space-filling curve whose bit-mixing order can be chosen freely. This is because the algorithm is built on the correspondence between the partial bit strings obtained by dividing the index key into groups whose length equals the number of attributes and the simple fractal shape of the Z curve.

  As a result, the index cannot be designed to suit the conditions specified in range searches or the distribution of the data set being indexed, and the method may reduce the efficiency of range search processing.

  The present invention has been made in view of the above-described problems, and provides a technique for speeding up the detection of hit pages during multidimensional range search.

  Each aspect of the present invention employs the following configurations in order to solve the above-described problems.

  The first aspect relates to a multidimensional range search device. The multidimensional range search device according to the first aspect includes: an acquisition unit that acquires a start index key and an end index key representing the minimum point and the maximum point of a specific section on a space-filling curve used to map, to one dimension, a multidimensional space onto which a data set having a plurality of attributes is mapped; an extraction unit that, on the basis of the bit strings of the start index key and the end index key acquired by the acquisition unit, extracts prefix data that can represent the bit strings of the index keys included in the specific section; and a determination unit that determines the overlap between a query section of the multidimensional range search and a prefix section, i.e. the section on the space-filling curve formed by the set of index keys sharing the prefix of the prefix data extracted by the extraction unit. The space-filling curve used in the first aspect has two properties: prefix data correspond one-to-one to prefix sections, each consisting of the set of points sharing the prefix indicated by the prefix data, and the prefix length of the prefix data is negatively correlated with the size of the prefix section.

  The second aspect relates to a multidimensional range search method that uses a space-filling curve with the same properties as in the first aspect. In the multidimensional range search method according to the second aspect, a start index key and an end index key representing the minimum point and the maximum point of a specific section on a space-filling curve used to map, to one dimension, a multidimensional space onto which a data set having a plurality of attributes is mapped are acquired; prefix data that can represent the bit strings of the index keys included in the specific section are extracted on the basis of the bit strings of the acquired start index key and end index key; and it is determined whether a query section of the multidimensional range search overlaps the prefix section on the space-filling curve formed by the set of index keys sharing the prefix of the extracted prefix data.

  Another aspect of the present invention may be a program that causes at least one computer to execute the method of the second aspect, or a computer-readable recording medium on which such a program is recorded. The recording medium includes a non-transitory tangible medium.

  According to each aspect described above, it is possible to provide a technique for speeding up the detection of hit pages during multidimensional range search.

FIG. 1 is a diagram conceptually showing an example hardware configuration of the multidimensional range search device (search device) in an embodiment of the present invention.
FIG. 2 is a diagram conceptually showing an example processing configuration of the multidimensional range search device (search device) in the first embodiment.
FIG. 3 is a diagram showing an example of the index storage unit.
FIG. 4 is a flowchart showing an example operation of the multidimensional range search device (search device) in the embodiment.
FIG. 5 is a diagram showing the space-filling curve and prefix sections used in the example.
FIG. 6 is a diagram showing the space-filling curve and index keys used in the example.
FIG. 7 is a diagram conceptually showing the extraction of the prefix data contained in the specific section in the example.
FIG. 8 is a diagram conceptually showing the overlap determination between a query section and a prefix section.

  Embodiments of the present invention are described below. The embodiments are illustrative, and the present invention is not limited to their configurations.

  The multidimensional range search device according to the embodiments described below acquires a multidimensional range search query and determines whether the query section (search space) of the query overlaps a given section (specific section). Here, a multidimensional range search is a range search whose conditions specify a plurality of attributes; the number of specified attributes is not particularly limited.

[Device configuration]
FIG. 1 is a diagram conceptually illustrating an example hardware configuration of the multidimensional range search device (hereinafter simply referred to as the search device) 10 according to an embodiment of the present invention. The search device 10 according to the present embodiment is a so-called computer and includes, for example, a CPU (Central Processing Unit) 2, a memory 3, and an input/output interface (I/F) 4, which are connected to one another via a bus 5. The memory 3 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, or the like.

  The input/output I/F 4 can be connected to, for example, a communication device that communicates with other computers over a network (not shown), a device that controls access to a portable recording medium, input devices such as a keyboard and a mouse that accept user operations, and output devices such as a display or a printer that present information to the user. The hardware configuration of the search device 10 is not limited to this example.

[Processing configuration]
FIG. 2 is a diagram conceptually illustrating an example processing configuration of the search device 10 in the first embodiment. The search device 10 according to the first embodiment includes an acquisition unit 11, an extraction unit 12, a determination unit 13, and the like. Each of these processing units is realized, for example, by the CPU 2 executing a program stored in the memory 3. The program may also be installed via the input/output I/F 4 from a portable recording medium such as a CD (Compact Disc) or a memory card, or from another computer on the network, and stored in the memory 3.

  The data set to be searched by the search device 10, which has a plurality of attributes, is repeatedly divided by a space-based space division method until the number of data items in each subspace is at or below a threshold, and the divided data are distributed over the data pages associated with the subspaces. Each data point in the multidimensional space onto which the data set is mapped is represented by a multidimensional index key (hereinafter simply referred to as an index key) generated by mapping the multidimensional space to one dimension with a space-filling curve.

Such mapping information between the index key and the data page is stored in an index storage unit (not shown).
FIG. 3 is a diagram illustrating an example of the index storage unit. As shown in FIG. 3, the index storage unit stores a plurality of entries each including index data and page information. The index storage unit may be realized by the search device 10 or may be realized by another device.

  The index data stored in the index storage unit has a format that can specify the position and extent of a subspace of the multidimensional space: it combines the longest common prefix of the index keys included in that subspace with wildcard characters. When the subspace contains only one index key, the index data for the subspace is that index key itself. The page information stored in the index storage unit indicates the location of the data page associated with the subspace indicated by the index data in the same entry. The data pages themselves may be held in the search device or in another device.
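As a sketch of how one index data entry could be formed, assuming index keys are equal-length bit strings and "*" is the wildcard character, the longest common prefix of a subspace's keys is padded with wildcards to the full key length. The function name `index_data` is hypothetical.

```python
def index_data(keys: list[str], wildcard: str = "*") -> str:
    """Longest common prefix of equal-length key bit strings, padded with wildcards.

    A subspace holding a single key yields that key unchanged."""
    first = keys[0]
    p = 0
    while p < len(first) and all(k[p] == first[p] for k in keys):
        p += 1
    return first[:p] + wildcard * (len(first) - p)


print(index_data(["000100", "000101", "000110"]))  # 0001**
print(index_data(["000011"]))  # 000011
```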

  As shown in FIG. 3, the entries stored in the index storage unit are sorted in order of their index data, so the page information is likewise ordered by index data. The index storage unit may store the entries in a well-known tree structure such as a B-tree.

  In the present embodiment, the space-filling curve used to map the multidimensional space to one dimension may be a well-known fixed curve such as the Z curve or the Hilbert curve, or a more generalized curve. For example, a generalized space-filling curve may map the multidimensional space to one dimension by mixing the bit strings of the attributes in an arbitrary order while preserving the bit order within each attribute. The space-filling curve used in the present embodiment need only have two properties: prefix data correspond one-to-one to prefix sections, each formed by the set of index keys sharing the prefix of the prefix data, and the prefix length of the prefix data is negatively correlated with the size of the prefix section. Prefix data are described later.
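One way to realize the generalized curve described above is sketched below, assuming the mixing order is given as a list of attribute names in which each occurrence consumes that attribute's next bit from the most significant end. Attributes may have different bit widths, and no zero padding is needed. The names `mix_bits`, `values`, `widths`, and `order` are illustrative, not taken from the patent.

```python
def mix_bits(values: dict[str, int], widths: dict[str, int], order: list[str]) -> str:
    """Build an index key by taking attribute bits in the sequence given by order.

    Each occurrence of an attribute name in order consumes that attribute's next
    bit, most significant first, so the bit order within each attribute is kept
    while the interleaving between attributes is arbitrary."""
    used = {a: 0 for a in widths}
    out = []
    for a in order:
        if used[a] >= widths[a]:
            raise ValueError(f"attribute {a!r} appears more than {widths[a]} times")
        pos = widths[a] - 1 - used[a]  # bit position counted from the LSB
        out.append(str((values[a] >> pos) & 1))
        used[a] += 1
    if used != widths:
        raise ValueError("order must use every bit of every attribute exactly once")
    return "".join(out)


v = {"x": 0b101, "y": 0b011}
print(mix_bits(v, {"x": 3, "y": 3}, ["x", "y"] * 3))  # 100111 (Z curve)
print(mix_bits(v, {"x": 3, "y": 3}, ["y", "x", "y", "y", "x", "x"]))  # 011101
# Unequal widths without padding: x has 2 bits, y has 4 bits.
print(mix_bits({"x": 0b10, "y": 0b0110}, {"x": 2, "y": 4}, ["y", "y", "x", "y", "x", "y"]))  # 011100
```

The Z curve is the special case in which the order simply alternates the attributes.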

  The acquisition unit 11 acquires a start index key and an end index key representing the minimum and maximum points of a specific section on the space-filling curve used to map the multidimensional space to one dimension. The specific section is set, for example, to the section on the space-filling curve corresponding to a data page indicated by page information stored in the index storage unit. In that case, the start index key is obtained by replacing the wildcard characters of the index data for that data page with 0 and is the minimum value of the specific section; the end index key is obtained by replacing the wildcard characters with 1 and is the maximum value of the specific section. The specific section is not limited to a data page, however, and can be set to any section.
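The wildcard replacement described above amounts to one string substitution per key. A sketch, assuming string keys and "*" as the wildcard character; the name `key_range` is hypothetical:

```python
def key_range(index_data: str, wildcard: str = "*") -> tuple[str, str]:
    """Start and end index keys of the section described by index data."""
    start = index_data.replace(wildcard, "0")  # minimum point of the section
    end = index_data.replace(wildcard, "1")  # maximum point of the section
    return start, end


print(key_range("0001**"))  # ('000100', '000111')
```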

  The acquisition unit 11 may acquire the start index key and the end index key from a portable recording medium, another computer, or the like via the input/output I/F 4, or as information entered by a user who operates an input device on an input screen or the like. The method of acquiring the start index key and the end index key is not limited.

  The extraction unit 12 extracts prefix data that can represent the bit strings of the index keys included in the specific section, on the basis of the bit strings of the start index key and the end index key acquired by the acquisition unit 11. Prefix data is a bit string of fixed length consisting either of a prefix alone or of a prefix followed by at least one wildcard character. The prefix is the partial bit string, starting from the first bit, that is common to all index keys in the section on the space-filling curve represented by the prefix data. A single wildcard character is treated as 1 bit of data. For example, when the start index key is "000011" and the end index key is "000111", the extraction unit 12 extracts "000011" and "0001**" as prefix data; together they cover exactly the keys 000011, 000100, 000101, 000110, and 000111. In this example "*" is used as the wildcard character, but the wildcard character is not limited to it.
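The extraction unit 12 uses replacement rules, described next. As a simpler stand-in that produces the same kind of output, the sketch below uses the well-known decomposition of an integer interval into aligned power-of-two blocks (the same idea as splitting an address range into CIDR prefixes); each block is one prefix data string. It reproduces the example above but is not claimed to be the patent's procedure, and on other inputs it may merge blocks that a boundary-based method keeps separate. The function name is hypothetical.

```python
def extract_prefix_data(start: str, end: str, wildcard: str = "*") -> list[str]:
    """Cover [start, end] (equal-length bit strings) with maximal aligned prefix
    blocks, returned in ascending key order."""
    n = len(start)
    lo, hi = int(start, 2), int(end, 2)
    result = []
    while lo <= hi:
        k = 0  # number of trailing wildcards in the current block
        # Grow the block while it stays aligned at lo and does not pass hi.
        while k < n and lo % (1 << (k + 1)) == 0 and lo + (1 << (k + 1)) - 1 <= hi:
            k += 1
        bits = format(lo, f"0{n}b")
        result.append(bits[: n - k] + wildcard * k)
        lo += 1 << k  # continue right after the block just emitted
    return result


print(extract_prefix_data("000011", "000111"))  # ['000011', '0001**']
print(extract_prefix_data("0010", "1101"))  # ['001*', '01**', '10**', '110*']
```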

  Each item of prefix data extracted by the extraction unit 12 thus corresponds to a subspace containing at least one index key in the multidimensional space onto which the data set is mapped. The present embodiment uses a space-filling curve on which prefix data correspond one-to-one to prefix sections (subspaces), each formed by the set of index keys sharing the prefix of the prefix data, and on which the prefix length of the prefix data is negatively correlated with the size of the prefix section. This negative correlation means that the longer the prefix length, the smaller the prefix section corresponding to the prefix data, and the shorter the prefix length, the larger that prefix section.

  For example, the extraction unit 12 can extract prefix data from the bit strings of the start index key and the end index key as follows. The extraction unit 12 holds a replacement rule for each bit pattern that a partial bit string of a predetermined bit length can take; the predetermined bit length does not depend on the number of attributes of the data set, and each rule includes bit inversion, replacement with wildcard characters, or both. The extraction unit 12 calculates the common prefix length of the start index key and the end index key, divides the bits of each key below the common prefix length into partial bit strings of the predetermined bit length, and extracts prefix data by applying the replacement rule corresponding to the bit pattern of each partial bit string. Here, the bits of the start index key and of the end index key below the common prefix length are each called a target bit string.

  The replacement rules held by the extraction unit 12 are set to suit the space-filling curve and the predetermined bit length used in the present embodiment. The predetermined bit length is not particularly limited as long as it is at least 1 bit; for example, it can be set to a length that a computer processes efficiently (for example, 64 or 128 bits). This makes the application of the replacement rules more efficient and, as a result, further speeds up the extraction of prefix data.
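The first two steps of the rule-based extraction, computing the common prefix length and splitting the target bit strings into fixed-length partial bit strings, can be sketched as follows. The replacement rules themselves are not specified in the text, so they are not reproduced; the chunk length `b` stands for the predetermined bit length and is chosen independently of the number of attributes. On fixed-width machine words, the common prefix length equals the number of leading zeros of the XOR of the two keys. The function names are hypothetical.

```python
def common_prefix_len(start: str, end: str) -> int:
    """Number of leading bits shared by two equal-length key bit strings."""
    return len(start) - (int(start, 2) ^ int(end, 2)).bit_length()


def target_chunks(key: str, common: int, b: int) -> list[str]:
    """Split the bits below the common prefix into b-bit chunks (the last may be shorter)."""
    target = key[common:]
    return [target[i:i + b] for i in range(0, len(target), b)]


c = common_prefix_len("000011", "000111")
print(c)  # 3
print(target_chunks("000011", c, 2), target_chunks("000111", c, 2))  # ['01', '1'] ['11', '1']
```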

  The determination unit 13 determines whether the query section of the multidimensional range search overlaps the prefix section on the space-filling curve formed by the set of index keys sharing the prefix of the prefix data extracted by the extraction unit 12. The minimum and maximum values of a prefix section are easy to obtain from its prefix data: replacing every wildcard character with 1 gives the maximum value, and replacing every wildcard character with 0 gives the minimum value. Information on the query section of the multidimensional range search may be held in advance by the determination unit 13 or acquired by the acquisition unit 11 in the same way as the start index key and the end index key. The determination unit 13 can easily determine whether the two sections overlap, for example by comparing the maximum and minimum values of the prefix section with those of the query section. Here, overlap between the prefix section and the query section means that the two sections overlap at least partly.
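A sketch of the min/max comparison, assuming the query section is given as a single closed key interval. If the query section is a multidimensional box on a curve whose key grows monotonically in each attribute (as with the bit-mixing curves above), the box's minimum and maximum keys bound all of its keys, so a False result safely rules a prefix section out, while a True result does not by itself prove that the box and the section share a point. The function names are hypothetical.

```python
def prefix_section(prefix_data: str, wildcard: str = "*") -> tuple[int, int]:
    """Minimum and maximum keys (as integers) of the prefix section of prefix_data."""
    lo = int(prefix_data.replace(wildcard, "0"), 2)
    hi = int(prefix_data.replace(wildcard, "1"), 2)
    return lo, hi


def intervals_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


print(prefix_section("0001**"))  # (4, 7)
print(intervals_overlap(prefix_section("0001**"), (6, 20)))  # True
print(intervals_overlap(prefix_section("0001**"), (8, 9)))  # False
```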

  The determination unit 13 can also perform the overlap determination as follows. From the index keys representing the minimum and maximum points of the prefix section, it extracts, for each attribute, the range of that attribute's bit string within the prefix section. Likewise, from the index keys representing the minimum and maximum points of the query section, it extracts the range of each attribute's bit string within the query section. The determination unit 13 then determines whether the prefix section and the query section overlap by checking, attribute by attribute, whether their bit string ranges overlap (at least partly). If at least one attribute is found whose range in the prefix section does not overlap its range in the query section, the two sections do not overlap; otherwise, that is, if the ranges overlap for every attribute, the two sections overlap.
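The per-attribute test can be sketched as below for bit-mixing curves that preserve the bit order within each attribute. Because the wildcards of prefix data sit at the end of the key, they cover each attribute's lowest bits, so de-interleaving the minimum and maximum keys gives each attribute's exact range and the prefix section is a hyperrectangle. The query is passed directly as per-attribute ranges, which for such curves is what de-interleaving its minimum and maximum corner keys yields. The mixing order y1, x1, y2, y3, x2, x3 and all names are illustrative assumptions.

```python
def split_bits(key: str, order: list[str]) -> dict[str, int]:
    """Undo the bit mixing: rebuild each attribute's value from an index key."""
    vals: dict[str, int] = {}
    for bit, a in zip(key, order):
        vals[a] = (vals.get(a, 0) << 1) | int(bit)
    return vals


def overlaps(prefix_data: str, query: dict[str, tuple[int, int]], order: list[str]) -> bool:
    """Per-attribute overlap test between a prefix section and a query box."""
    lo = split_bits(prefix_data.replace("*", "0"), order)  # minimum point
    hi = split_bits(prefix_data.replace("*", "1"), order)  # maximum point
    # The sections are disjoint as soon as one attribute's ranges are disjoint.
    return all(lo[a] <= q_hi and q_lo <= hi[a] for a, (q_lo, q_hi) in query.items())


ORDER = ["y", "x", "y", "y", "x", "x"]
print(overlaps("0*****", {"x": (2, 3), "y": (1, 6)}, ORDER))  # True
print(overlaps("0000**", {"x": (2, 3), "y": (1, 6)}, ORDER))  # False: y is fixed at 0
```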

[Operation example]
The multidimensional range search method according to the embodiment of the present invention is described below with reference to FIG. 4. FIG. 4 is a flowchart showing an example operation of the search device 10 in the embodiment described above. In the following description, the search device 10 performs each step, but the processing units included in the search device 10 or other devices may perform them instead.

  The multidimensional range search method according to the present embodiment assumes that a data set having a plurality of attributes has been divided by a space-based space division method and distributed over a plurality of data pages.

  First, the search device 10 acquires a start index key and an end index key representing the minimum and maximum points of a specific section on the space-filling curve used to map, to one dimension, the multidimensional space onto which the data set having a plurality of attributes is mapped (S41). The space-filling curve used in this method and the specific section are as described above.

  Next, on the basis of the bit strings of the start index key and the end index key acquired in (S41), the search device 10 extracts prefix data that can represent the bit strings of the index keys included in the specific section (S42). The prefix data are as described above, and they can be extracted by the method described for the extraction unit 12.

  The search device 10 then determines whether the query section of the multidimensional range search overlaps the prefix section on the space-filling curve formed by the set of index keys sharing the prefix of the prefix data extracted in (S42) (S43). This overlap determination can be performed by the method described for the determination unit 13.
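Steps S41 to S43 can be strung together as in the sketch below. It assumes string keys, the illustrative mixing order y1, x1, y2, y3, x2, x3, a query given as per-attribute ranges, and, for S42, the aligned-block decomposition as a stand-in for the replacement rules (which the text does not specify). All function names are hypothetical.

```python
def extract_prefix_data(start: str, end: str) -> list[str]:
    """S42 stand-in: cover [start, end] with maximal aligned prefix blocks, ascending."""
    n, lo, hi = len(start), int(start, 2), int(end, 2)
    out = []
    while lo <= hi:
        k = 0
        while k < n and lo % (1 << (k + 1)) == 0 and lo + (1 << (k + 1)) - 1 <= hi:
            k += 1
        out.append(format(lo, f"0{n}b")[: n - k] + "*" * k)
        lo += 1 << k
    return out


def split_bits(key: str, order: list[str]) -> dict[str, int]:
    """Rebuild each attribute's value from a mixed index key."""
    vals: dict[str, int] = {}
    for bit, a in zip(key, order):
        vals[a] = (vals.get(a, 0) << 1) | int(bit)
    return vals


def overlaps(prefix_data: str, query: dict[str, tuple[int, int]], order: list[str]) -> bool:
    """S43: per-attribute overlap between a prefix section and the query box."""
    lo = split_bits(prefix_data.replace("*", "0"), order)
    hi = split_bits(prefix_data.replace("*", "1"), order)
    return all(lo[a] <= q_hi and q_lo <= hi[a] for a, (q_lo, q_hi) in query.items())


def hit_prefixes(start: str, end: str, query: dict[str, tuple[int, int]],
                 order: list[str]) -> list[str]:
    """S41 supplies start and end; return the prefix data that overlap the query."""
    return [p for p in extract_prefix_data(start, end) if overlaps(p, query, order)]


ORDER = ["y", "x", "y", "y", "x", "x"]
hits = hit_prefixes("000011", "000111", {"x": (2, 3), "y": (1, 1)}, ORDER)
print(hits)  # ['0001**']
print(bool(hits))  # True: the specific section (for example, a data page) is a hit
```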

[Operation and effect of this embodiment]
As described above, in the present embodiment, prefix data that can represent the bit strings of the index keys included in a specific section are extracted from the bit strings of the start index key and the end index key representing the minimum and maximum points of that section on the space-filling curve, and the overlap between each prefix section represented by the prefix data and the query section of the multidimensional range search is determined. From these determinations, it can be decided whether the specific section overlaps the query section.

  Moreover, the space-filling curve itself is not restricted, provided that prefix data and the prefix sections they represent correspond one-to-one and that the prefix length of the prefix data is negatively correlated with the size of the prefix section. That is, a well-known space-filling curve with a fixed bit-mixing rule may be used, as may a space-filling curve that maps the space to one dimension by mixing the bit strings of the attributes in an arbitrary order while preserving the bit order within each attribute.

  Thus, according to the present embodiment, using prefix data (fixed-length bit strings formed from a prefix alone, or from a prefix and at least one wildcard character) makes it possible to check efficiently whether a query section overlaps a specific section, and as a result to speed up hit-page detection in multidimensional range search.

  Furthermore, because the present embodiment uses a space-filling curve with the properties described above, the restrictions on index design and on the page division method for the data set can be relaxed considerably. That is, the present embodiment can use not only space-filling curves with a fixed bit-mixing order, such as the Z curve and the Hilbert curve, but also generalized space-filling curves. Therefore, according to the present embodiment, the index can be designed optimally for the conditions specified in range searches and for the distribution of the data set being indexed.

  In addition, in the present embodiment, prefix data are extracted by dividing each target bit string (the bits of the start index key and of the end index key below their common prefix length) into partial bit strings of a predetermined bit length and applying the replacement rule corresponding to the bit pattern of each partial bit string. The replacement rule for each bit pattern that the predetermined bit length can take includes bit inversion, replacement with wildcard characters, or both. Thus, according to the present embodiment, prefix data can be extracted efficiently by bit operations on the start index key and the end index key.

  Furthermore, the predetermined bit length used in this extraction of prefix data can be set independently of the number of attributes of the data set. The present embodiment therefore removes the restriction that the processing unit must be a bit string whose length equals the number of attributes; a processing unit that computers handle efficiently, such as 8, 16, or 32 bits, can be chosen freely, which improves the efficiency of range search processing.

[Extended example]
In the above-described embodiment, the prefix data can be extracted more efficiently as follows.

  The extraction unit 12 extracts prefix data sequentially by scanning the target bit string of the start index key from the low-order bits toward the high-order bits and scanning the target bit string of the end index key from the high-order bits toward the low-order bits. As described above, the target bit string is the part of each key below the common prefix of the start index key and the end index key. The determination unit 13 checks the prefix sections corresponding to the prefix data for overlap with the query section in the order in which the extraction unit 12 extracts them, and a search unit 14 retrieves the corresponding page information in the order in which the determination unit 13 finds overlaps.

  The common prefix of the start index key and the end index key defines a boundary on the space-filling curve that splits the specific section into two parts, one on the start index key side and one on the end index key side. Extracting prefix data by scanning the target bit string of the start index key from the low-order bits toward the high-order bits is equivalent to cutting prefix sections out of the part between the start index key and the boundary, proceeding from the start index key toward the boundary. Because this scan direction yields prefix data with the longest prefixes first, the cut-out proceeds from narrow prefix sections to wider ones.

  Conversely, extracting prefix data by scanning the target bit string of the end index key from the high-order bits toward the low-order bits is equivalent to cutting prefix sections out of the part between the boundary and the end index key, proceeding from the boundary toward the end index key. Because this scan direction yields prefix data with the shortest prefixes first, the cut-out proceeds from wide prefix sections to narrower ones.
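The two scan directions described above can be sketched as follows. This is an illustrative reading, not the patent's replacement-rule implementation: it assumes start <= end as equal-length bit strings and "*" as the wildcard. The start side walks the start key's target bits upward, emitting the narrowest block first; the end side walks the end key's target bits downward, emitting the widest block first, so the output is already in index order. The function name is hypothetical.

```python
def extract_by_scan(start: str, end: str, w: str = "*") -> list[str]:
    """Prefix data for [start, end] in ascending order, built from both scan directions."""
    n = len(start)
    if start == end:
        return [start]
    c = n - (int(start, 2) ^ int(end, 2)).bit_length()  # common prefix length
    free = n - 1 - c  # target bits below the first differing bit

    # Start side [start, boundary - 1]: narrowest block first, then wider ones.
    tz = min(len(start) - len(start.rstrip("0")), free)  # trailing zeros of start
    low = [start[: n - tz] + w * tz]
    for i in range(n - 1 - tz, c, -1):  # scan from low-order to high-order bits
        if start[i] == "0":
            low.append(start[:i] + "1" + w * (n - 1 - i))

    # End side [boundary, end]: widest block first, then narrower ones.
    to = min(len(end) - len(end.rstrip("1")), free)  # trailing ones of end
    high = []
    for i in range(c + 1, n - to):  # scan from high-order to low-order bits
        if end[i] == "1":
            high.append(end[:i] + "0" + w * (n - 1 - i))
    high.append(end[: n - to] + w * to)
    return low + high


print(extract_by_scan("000011", "000111"))  # ['000011', '0001**']
print(extract_by_scan("0010", "1101"))  # ['001*', '01**', '10**', '110*']
```

Unlike a plain aligned-block cover, this version always splits at the boundary, so [0000, 1111] yields "0***" and "1***" rather than a single "****".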

  In this way, the present embodiment decomposes the specific section within the multidimensional space not only into hypercubes, the fractal shapes traced by curves such as the Z curve, but also into hyperrectangles (prefix sections) whose sides need not all have the same length. With the extended example, the prefix data are therefore extracted in ascending order along the space-filling curve, that is, in index order, and the page information can be obtained already sorted in index order. As a result, when the data pages are stored in index order, access to the data pages becomes more efficient.

  Examples will be given below to describe the above-described embodiment in more detail. The present invention is not limited in any way by the following examples.

In this embodiment, a space filling curve obtained by generalizing the Z curve is used.
FIG. 5 is a diagram illustrating the space filling curve and the prefix sections used in this embodiment. As shown in FIG. 5, this embodiment uses a space filling curve in which the order of interleaving the bits of the attributes can be chosen freely. The Z curve is one curve that can be realized in this way. In this embodiment, the data set is mapped to a two-dimensional space, and the space filling curve makes that space one-dimensional. Specifically, the curve mixes the two 3-bit attribute bit strings “x1, x2, x3” and “y1, y2, y3” into the order “y1, x1, y2, y3, x2, x3”. The arrows in FIG. 5 show how the space filling curve fills the two-dimensional space.
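  The bit mixing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this embodiment. The functions `interleave` and `deinterleave` are hypothetical, and so is the pattern-string convention: one character per output bit, naming the attribute whose next unused bit is taken. The pattern “yxyyxx” reproduces the FIG. 5 order “y1, x1, y2, y3, x2, x3”. The alternating pattern “xyxyxy” gives an ordinary Z curve.

```python
from __future__ import annotations


def interleave(attrs: dict[str, str], pattern: str) -> str:
    """Mix per-attribute bit strings into one index key (most significant bit first).

    pattern[i] names the attribute whose next unused bit becomes output bit i,
    so each attribute keeps its own bit order (hypothetical convention).
    """
    cursors = {name: 0 for name in attrs}
    out = []
    for name in pattern:
        out.append(attrs[name][cursors[name]])
        cursors[name] += 1
    if any(cursors[name] != len(bits) for name, bits in attrs.items()):
        raise ValueError("pattern must use every attribute bit exactly once")
    return "".join(out)


def deinterleave(key: str, pattern: str) -> dict[str, str]:
    """Inverse of interleave: regroup the bits of an index key by attribute."""
    parts: dict[str, str] = {}
    for bit, name in zip(key, pattern):
        parts[name] = parts.get(name, "") + bit
    return parts


# FIG. 5 order "y1, x1, y2, y3, x2, x3" versus a plain Z curve.
print(interleave({"x": "101", "y": "011"}, "yxyyxx"))  # 011101
print(interleave({"x": "101", "y": "011"}, "xyxyxy"))  # 100111
```

Because each attribute's bit order is preserved, every attribute value grows monotonically with its own bits inside the key. This is the property the per-attribute range checks later in this description depend on.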

  FIG. 5 shows four pieces of prefix data and the prefix section corresponding to each, hatched (gray background):
  - upper-left panel: the prefix section of the prefix data “0*****”;
  - upper-right panel: that of “00****”;
  - lower-left panel: that of “000***”;
  - lower-right panel: that of “0000**”.
  Thus, on the space filling curve of this embodiment, prefix data and the prefix sections they indicate correspond one-to-one. The figure also shows that the prefix length of the prefix data (here, the number of leading 0s) is negatively correlated with the size of the prefix section. The longer the prefix, the smaller the prefix section; the shorter the prefix, the larger the prefix section.

  In this embodiment, “*” is used as the wildcard character in prefix data. Replacing every “*” with 0 gives the minimum value of the prefix section corresponding to the prefix data. Replacing every “*” with 1 gives its maximum value.
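  The minimum and maximum of a prefix section follow directly from this substitution. The short sketch below assumes that prefix data are held as strings over “0”, “1”, and “*”; the helper names are hypothetical. The section size, 2 raised to the number of wildcards, makes the negative correlation between prefix length and section size explicit.

```python
from __future__ import annotations


def prefix_bounds(prefix: str) -> tuple[str, str]:
    """(minimum, maximum) index key of the prefix section of `prefix`."""
    return prefix.replace("*", "0"), prefix.replace("*", "1")


def prefix_section_size(prefix: str) -> int:
    """Number of index keys in the prefix section: each wildcard doubles it."""
    return 2 ** prefix.count("*")


print(prefix_bounds("001***"))  # ('001000', '001111')
print([prefix_section_size(p) for p in ("0*****", "00****", "000***", "0000**")])  # [32, 16, 8, 4]
```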

  In this embodiment, the section covered by a given data page is treated as the specific section. The search device 10 acquires the start index key α and the end index key β of the specific section. It then determines whether the specific section overlaps the inquiry section of the multidimensional range search. Hereinafter, the value of the i-th bit of α and β, counted from the most significant bit, is written α[i] and β[i]; the most significant bit is the 0th bit.

  The search device 10 according to this embodiment detects the common prefix of the start index key α and the end index key β and calculates its length. One way to compute the common prefix length is as follows: take the exclusive OR (XOR) of the two bit strings, then count the consecutive 0s from the most significant bit of the result (a bit operation known as number of leading zeros). The common prefix length is denoted k below.
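  A minimal Python sketch of this calculation is shown below. It assumes index keys are given as fixed-length bit strings with the most significant bit first; the function name is hypothetical. Python integers have no fixed width, so the leading-zero count of an n-bit word is derived from `int.bit_length()`. In C, a compiler builtin such as GCC's `__builtin_clz` plays the same role (its result is undefined for 0).

```python
def common_prefix_length(alpha: str, beta: str) -> int:
    """Count the leading bits shared by two equal-length index keys.

    XOR leaves 0s exactly where the keys agree, so the common prefix length is the
    number of leading zeros of the n-bit XOR result, i.e. n - bit_length().
    Identical keys give n.
    """
    if len(alpha) != len(beta):
        raise ValueError("index keys must have the same bit length")
    diff = int(alpha, 2) ^ int(beta, 2)
    return len(alpha) - diff.bit_length()


print(common_prefix_length("000110", "101000"))  # 0
print(common_prefix_length("001000", "001111"))  # 3
```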

  The search device 10 takes as the target bit string the bits from bit k + 1 onward of α and β. It divides this string into units of a predetermined bit length (1 bit in this embodiment) and extracts prefix data by applying the replacement rule corresponding to each unit. Because the most significant bit is numbered 0, the common prefix occupies bits 0 to k − 1. Bit k is therefore the first bit at which α and β differ (α[k] = 0 and β[k] = 1), and the target bit string starts at bit k + 1, two bits below the last bit of the common prefix. Alternatively, the bit string starting one bit below the common prefix (at the kth bit) may be used as the target bit string. In that case, the replacement rule specifies that the kth bit is left unchanged.

  In this embodiment, the search device 10 (extraction unit 12) uses two replacement rules, each applied to a reference bit in the target bit string of the start index key or the end index key.
  - First replacement rule: if the reference bit is 0, it changes the reference bit to 1 and every bit below it to the wildcard character. If the reference bit is 1, it performs no replacement.
  - Second replacement rule: if the reference bit is 1, it changes the reference bit to 0 and every bit below it to the wildcard character. If the reference bit is 0, it performs no replacement.
  The search device 10 (extraction unit 12) applies the first replacement rule to the target bit string of the start index key and the second replacement rule to the target bit string of the end index key.

  Let n be the bit length of the start index key and the end index key, and let m be an integer with k + 1 ≤ m ≤ n − 1. The search device 10 extracts prefix data as follows.
  - If α[m] is 0, it replaces α[m] with 1 in the bit string of α and replaces α[m + 1] onward with the wildcard character “*”. This yields one piece of prefix data.
  - If β[m] is 1, it replaces β[m] with 0 in the bit string of β and replaces β[m + 1] onward with “*”. This likewise yields one piece of prefix data.
  As in the extended example described above, the search device 10 scans the target bit string of α one bit at a time from low-order to high-order, and the target bit string of β one bit at a time from high-order to low-order. Scanning in these directions extracts the prefix data one after another, already sorted in index order.
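  The two replacement rules and the scanning order can be combined into the following Python sketch. It illustrates the procedure under stated assumptions and is not the patent's implementation. Keys are equal-length strings over “0” and “1” with bit 0 as the most significant bit, and the start index key is not greater than the end index key. The name `extract_prefixes` and its `lsb_wildcard` flag are hypothetical. The flag implements the variation in which the least significant bit is replaced with “*” instead of emitting the start or end index key as separate prefix data.

```python
from __future__ import annotations


def extract_prefixes(alpha: str, beta: str, lsb_wildcard: bool = False) -> list[str]:
    """Decompose the specific section [alpha, beta] into prefix data in index order.

    alpha/beta: equal-length bit strings, bit 0 = most significant bit, alpha <= beta.
    """
    n = len(alpha)
    if len(beta) != n or alpha > beta:
        raise ValueError("need equal-length keys with alpha <= beta")
    if alpha == beta:
        return [alpha]
    k = n - (int(alpha, 2) ^ int(beta, 2)).bit_length()  # common prefix length
    last = n - 1                       # index of the least significant bit
    out: list[str] = []

    # Start index key, first replacement rule, scanning bits last .. k+1 (low -> high).
    if lsb_wildcard and last > k and alpha[last] == "0":
        out.append(alpha[:last] + "*")        # start key and its successor together
    else:
        out.append(alpha)                     # the start key itself
        if last > k and alpha[last] == "0":
            out.append(alpha[:last] + "1")
    for m in range(last - 1, k, -1):
        if alpha[m] == "0":                   # 0 -> 1, lower bits -> wildcard
            out.append(alpha[:m] + "1" + "*" * (last - m))

    # End index key, second replacement rule, scanning bits k+1 .. last (high -> low).
    for m in range(k + 1, last):
        if beta[m] == "1":                    # 1 -> 0, lower bits -> wildcard
            out.append(beta[:m] + "0" + "*" * (last - m))
    if lsb_wildcard and last > k and beta[last] == "1":
        out.append(beta[:last] + "*")         # end key and its predecessor together
    else:
        if last > k and beta[last] == "1":
            out.append(beta[:last] + "0")
        out.append(beta)                      # the end key itself
    return out


print(extract_prefixes("000110", "101000"))
print(extract_prefixes("000110", "101000", lsb_wildcard=True))
```

Take the FIG. 6 example described below (start index key “000110”, end index key “101000”). The sketch returns “000110”, “000111”, “001***”, “01****”, “100***”, “101000” in index order. With `lsb_wildcard=True` it returns “00011*”, “001***”, “01****”, “100***”, “101000”. The sections abut without gaps or overlaps, as the guarantees of the replacement rules require.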

  Each bit in the target bit string of the start index key relates to the specific section as follows.
  - If a bit in the target bit string is 0, consider any index key that agrees with α above that bit and has that bit replaced by 1. Such a key is guaranteed to lie in the specific section. It is greater than the start index key, yet it still agrees with α at bit k, where α[k] = 0 and β[k] = 1, so it is smaller than the end index key. The replacement rule therefore changes that bit from 0 to 1 and replaces all lower bits with the wildcard character “*”.
  - If the bit is 1, the only guarantee is that an index key obtained by replacing the bit with 0 lies outside the specific section, because it is smaller than the start index key. The replacement rule therefore performs no replacement for that bit.

  Each bit in the target bit string of the end index key relates to the specific section in the reverse way.
  - If a bit in the target bit string is 1, any index key obtained by replacing that bit with 0, with the higher bits unchanged, is guaranteed to lie in the specific section. The replacement rule therefore changes that bit from 1 to 0 and replaces all lower bits with the wildcard character “*”.
  - If the bit is 0, the only guarantee is that an index key obtained by replacing the bit with 1 lies outside the specific section. The replacement rule therefore performs no replacement for that bit.

  The search device 10 determines whether the prefix section of each piece of prefix data extracted in this way overlaps the inquiry section. If at least one of these prefix sections overlaps the inquiry section, the search device 10 can determine that the section covered by the data page (the specific section) overlaps the inquiry section.

Hereinafter, the present embodiment will be described in more detail with reference to FIGS.
FIG. 6 is a diagram showing the space filling curve and index keys used in this embodiment. The space filling curve in FIG. 6 mixes two 3-bit attribute bit strings, “a, b, c” and “x, y, z”, into the order “x, a, b, y, c, z”. In FIG. 6, the specific section (the section covered by a given data page) is drawn as a solid line along the space filling curve, and the start index key and end index key are marked with circles. The start index key is “000110” and the end index key is “101000”.

FIG. 7 is a diagram conceptually illustrating how the prefix data contained in the specific section are extracted in this embodiment. In FIG. 7, the specific section, the start index key, the end index key, and the prefix sections are marked on the multidimensional index structure of FIG. The specific section is hatched (gray background), and each prefix section extracted from it is outlined with a thick line inside the specific section.
FIG. 8 is a diagram conceptually illustrating overlap determination between an inquiry section and a prefix section. In FIG. 8, an inquiry section indicated by a broken line is further added to the contents of FIG.

  The search device 10 acquires the start index key “000110” and the end index key “101000”. The common prefix length is 0 (zero), so the search device 10 takes bits 1 through 5 of the start index key and the end index key as the target bit strings “00110” and “01000”. It divides the target bit string “00110” of the start index key α into single bits from the low-order end to the high-order end. It divides the target bit string “01000” of the end index key β into single bits from the high-order end to the low-order end. It then extracts prefix data by applying the replacement rule to each bit.

  Specifically, the search device 10 extracts prefix data from the target bit string “00110” of the start index key as follows.
  - α[5] is 0, so replacing α[5] with 1 yields the prefix data “000111”. The start index key “000110” itself is also extracted as prefix data.
  - α[4] and α[3] are 1, so nothing is done for them.
  - α[2] is 0, so replacing α[2] with 1 and α[3] onward with the wildcard character “*” yields the prefix data “001***”.
  - α[1] is 0, so replacing α[1] with 1 and α[2] onward with “*” yields the prefix data “01****”.

  Next, the search device 10 extracts prefix data from the target bit string “01000” of the end index key as follows.
  - β[1] is 0, so nothing is done.
  - β[2] is 1, so replacing β[2] with 0 and β[3] onward with the wildcard character “*” yields the prefix data “100***”.
  - β[3] onward are 0, so nothing further is done.
  Finally, the end index key “101000” itself is also extracted as prefix data.

  Note that when the replacement rule is applied to the least significant bit of the start index key or the end index key, that key itself also belongs to the resulting prefix section. The least significant bit may therefore be replaced with the wildcard character “*” instead. In that case, the start index key itself and the end index key itself need not be added as separate prefix data. For example, “000110” and “000111” above are then represented together by the prefix data “00011*”.

  In this way, by scanning the bit strings of the start index key and the end index key, the search device 10 can extract the prefix data “00011*”, “001***”, “01****”, “100***”, and “101000”. The prefix section indicated by each piece of extracted prefix data is as shown in FIG.
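  The prefix sections are produced in index order, so the guarantees stated for the replacement rules can be checked mechanically. The first section must begin at the start index key, consecutive sections must abut with no gap or overlap, and the last section must end at the end index key. The following Python check uses a hypothetical helper written only for this illustration. It confirms that the five prefix data above tile the specific section from “000110” to “101000” exactly.

```python
from __future__ import annotations


def tiles_section(prefixes: list[str], start: str, end: str) -> bool:
    """True if the prefix sections, in the given order, cover [start, end] exactly once."""
    expected = int(start, 2)                 # next index key that must be covered
    for p in prefixes:
        lo = int(p.replace("*", "0"), 2)     # minimum of this prefix section
        hi = int(p.replace("*", "1"), 2)     # maximum of this prefix section
        if lo != expected:                   # a gap or an overlap with the previous section
            return False
        expected = hi + 1
    return expected == int(end, 2) + 1       # the last section must end at `end`


print(tiles_section(["00011*", "001***", "01****", "100***", "101000"], "000110", "101000"))  # True
```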

  The search device 10 determines whether each prefix section indicated by the extracted prefix data overlaps the inquiry section shown in FIG. The maximum value “000111” of the prefix data “00011*” is smaller than the minimum value “001110” of the inquiry section. This prefix section therefore does not overlap the inquiry section.

  Next, the search device 10 determines whether the prefix section of the prefix data “001***” overlaps the inquiry section, as follows. It obtains the minimum value “001000” and the maximum value “001111” of the prefix data “001***”. It also obtains the minimum value “001110” and the maximum value “111001” of the inquiry section. It then extracts the bit string of each attribute from each of these values.
  - From the prefix data: for the attribute “a, b, c”, the minimum bit string “010” and the maximum bit string “011”; for the attribute “x, y, z”, the minimum bit string “000” and the maximum bit string “011”.
  - From the inquiry section: for the attribute “a, b, c”, the minimum bit string “011” and the maximum bit string “110”; for the attribute “x, y, z”, the minimum bit string “010” and the maximum bit string “101”.
  The prefix section therefore spans “010” to “011” in the attribute “a, b, c” and “000” to “011” in the attribute “x, y, z”. The inquiry section spans “011” to “110” and “010” to “101”, respectively. The bit string ranges partially overlap in both attributes, so the search device 10 can determine that the prefix section of the prefix data “001***” overlaps the inquiry section.

  Overlap is determined in the same way for the prefix section of the prefix data “01****”. The search device 10 obtains the minimum value “010000” and the maximum value “011111” of the prefix data “01****”. It also obtains the minimum value “001110” and the maximum value “111001” of the inquiry section. It then extracts the bit string of each attribute from each of these values.
  - From the prefix data: for the attribute “a, b, c”, the minimum bit string “100” and the maximum bit string “111”; for the attribute “x, y, z”, the minimum bit string “000” and the maximum bit string “011”.
  - From the inquiry section: for the attribute “a, b, c”, the minimum bit string “011” and the maximum bit string “110”; for the attribute “x, y, z”, the minimum bit string “010” and the maximum bit string “101”.
  The prefix section therefore spans “100” to “111” in the attribute “a, b, c” and “000” to “011” in the attribute “x, y, z”. The inquiry section spans “011” to “110” and “010” to “101”, respectively. The bit string ranges partially overlap in both attributes, so the search device 10 can determine that the prefix section of the prefix data “01****” overlaps the inquiry section.

  Overlap is determined in the same way for the prefix section of the prefix data “100***”. The search device 10 obtains the minimum value “100000” and the maximum value “100111” of the prefix data “100***”. It also obtains the minimum value “001110” and the maximum value “111001” of the inquiry section. It then extracts the bit string of each attribute from each of these values.
  - From the prefix data: for the attribute “a, b, c”, the minimum bit string “000” and the maximum bit string “001”; for the attribute “x, y, z”, the minimum bit string “100” and the maximum bit string “111”.
  - From the inquiry section: for the attribute “a, b, c”, the minimum bit string “011” and the maximum bit string “110”; for the attribute “x, y, z”, the minimum bit string “010” and the maximum bit string “101”.
  The prefix section therefore spans “000” to “001” in the attribute “a, b, c” and “100” to “111” in the attribute “x, y, z”. The inquiry section spans “011” to “110” and “010” to “101”, respectively. The ranges of the attribute “a, b, c” do not overlap, so the search device 10 can determine that the prefix section of the prefix data “100***” does not overlap the inquiry section.
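  The per-attribute determination in the three cases above can be sketched in Python as follows. The sketch assumes, as in the worked example, that both the prefix section and the inquiry section are axis-aligned boxes given by their minimum and maximum index keys. It encodes the FIG. 6 order “x, a, b, y, c, z” as the hypothetical pattern string “XAAXAX”, where “A” stands for the attribute “a, b, c” and “X” for the attribute “x, y, z”. Two boxes overlap exactly when their ranges overlap in every attribute. The function names are illustrative only.

```python
from __future__ import annotations


def attribute_ranges(min_key: str, max_key: str, pattern: str) -> dict[str, tuple[int, int]]:
    """Per-attribute (min, max) values of a box, read from its min and max index keys."""
    def split(key: str) -> dict[str, str]:
        parts: dict[str, str] = {}
        for bit, name in zip(key, pattern):   # pattern[i] = attribute owning bit i
            parts[name] = parts.get(name, "") + bit
        return parts

    lo, hi = split(min_key), split(max_key)
    return {name: (int(lo[name], 2), int(hi[name], 2)) for name in lo}


def prefix_overlaps_inquiry(prefix: str, q_min: str, q_max: str, pattern: str) -> bool:
    """True if the prefix section of `prefix` overlaps the inquiry section [q_min, q_max]."""
    p = attribute_ranges(prefix.replace("*", "0"), prefix.replace("*", "1"), pattern)
    q = attribute_ranges(q_min, q_max, pattern)
    # Boxes overlap iff, for every attribute, neither range lies entirely beyond the other.
    return all(p[a][0] <= q[a][1] and q[a][0] <= p[a][1] for a in p)


for prefix in ("001***", "01****", "100***"):
    print(prefix, prefix_overlaps_inquiry(prefix, "001110", "111001", "XAAXAX"))
```

For the inquiry section from “001110” to “111001”, this reports an overlap for “001***” and “01****” and none for “100***”, matching the determinations above. For example, `attribute_ranges("001000", "001111", "XAAXAX")` gives the attribute “a, b, c” the range 2 to 3 (“010” to “011”) and the attribute “x, y, z” the range 0 to 3 (“000” to “011”).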

  The above example uses replacement rules that scan one bit at a time. Replacement rules that scan several bits at a time extract the same prefix data. For example, to make processing on a computer more efficient, replacement rules that scan s bits at a time can be used. In that case, a replacement rule is defined for every possible s-bit pattern, using bit inversion, replacement with the wildcard character, or both, so that each pattern produces the same result as the 1-bit replacement rules above. For example, when scanning two bits at a time, replacement rules are defined for all reference bit string patterns “00”, “01”, “10”, and “11”. For the start index key (the first replacement rule), they are as follows.
  - “00”: two pieces of prefix data are produced. In one, the reference bit string is replaced with “01” and every lower bit with “*”. In the other, the reference bit string is replaced with “1*” and every lower bit with “*”.
  - “01”: the reference bit string is replaced with “1*” and every lower bit with “*”.
  - “10”: the reference bit string is replaced with “11” and every lower bit with “*”.
  - “11”: no replacement is performed.
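  The two-bit rules above correspond to the first replacement rule applied to the start index key, and they can be written as a lookup table. The Python sketch below is a hypothetical illustration. The names `START_RULE_2BIT`, `start_prefixes_2bit`, and `start_prefixes_1bit` are not from the source, and the target bit string is assumed to have even length. Comparing the table-driven version against the bit-by-bit rule shows that both produce the same prefix data in the same order.

```python
from __future__ import annotations

# 2-bit reference string -> replacement strings written over it, in the order a
# low-to-high scan produces them; every bit below the chunk then becomes "*".
START_RULE_2BIT: dict[str, list[str]] = {
    "00": ["01", "1*"],
    "01": ["1*"],
    "10": ["11"],
    "11": [],
}


def start_prefixes_2bit(alpha: str, first: int) -> list[str]:
    """First replacement rule applied two bits at a time to alpha[first:], low -> high."""
    n = len(alpha)
    if (n - first) % 2:
        raise ValueError("this sketch assumes an even-length target bit string")
    out = [alpha]                                   # the start key itself
    for p in range(n - 2, first - 1, -2):           # chunk = bits p and p+1
        for repl in START_RULE_2BIT[alpha[p:p + 2]]:
            out.append(alpha[:p] + repl + "*" * (n - p - 2))
    return out


def start_prefixes_1bit(alpha: str, first: int) -> list[str]:
    """Reference: the same rule applied one bit at a time, low -> high."""
    n = len(alpha)
    out = [alpha]
    for m in range(n - 1, first - 1, -1):
        if alpha[m] == "0":
            out.append(alpha[:m] + "1" + "*" * (n - m - 1))
    return out


# Check that both versions agree on every 6-bit start key.
print(all(start_prefixes_2bit(format(i, "06b"), 0) == start_prefixes_1bit(format(i, "06b"), 0)
          for i in range(64)))
```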

  As described above, this embodiment efficiently extracts the prefix sections contained in the specific section that covers a given data page. It does so by simple bit operations on the bit strings of the start index key and the end index key of that section. Furthermore, whether the prefix section represented by each piece of prefix data overlaps the inquiry section can also be determined efficiently by simple bit operations.

  The above embodiment uses, as an example, a space filling curve obtained by generalizing the Z curve. However, the same operation and effects can be obtained with the Hilbert curve or a curve derived from it, as long as the curve has the correspondence between prefix data and prefix sections described above. In that case, replacement rules corresponding to the Hilbert curve or the derived curve may be provided.

  The flowcharts used in the above description list several steps (processes) in order, but the order in which the steps are executed in this embodiment is not limited to the order described. In this embodiment, the order of the illustrated steps can be changed as long as doing so does not affect the content.

DESCRIPTION OF SYMBOLS
2 CPU
3 Memory
4 Input/output I/F
10 Multidimensional range search device (search device)
11 Acquisition unit
12 Extraction unit
13 Determination unit
14 Search unit

Claims (11)

  1. A multidimensional range search device comprising:
    an acquisition unit that acquires a start index key and an end index key representing the minimum point and the maximum point of a specific section on a space filling curve, the curve being used to make one-dimensional a multidimensional space to which a data set having a plurality of attributes is mapped;
    an extraction unit that extracts, based on the bit strings of the start index key and the end index key acquired by the acquisition unit, prefix data capable of representing the bit strings of the index keys included in the specific section; and
    a determination unit that determines the overlap between a prefix section on the space filling curve and an inquiry section of a multidimensional range search, the prefix section being formed by the set of index keys sharing the prefix of the prefix data extracted by the extraction unit,
    wherein the space filling curve is such that the prefix data correspond one-to-one to prefix sections, each consisting of the set of points sharing the prefix indicated by the prefix data, and the prefix length of the prefix data is negatively correlated with the size of the prefix section.
  2. The multidimensional range search device according to claim 1, wherein
    the space filling curve makes the multidimensional space one-dimensional by mixing the bit strings of the attributes in an arbitrary order without changing the order of the bits within each attribute's bit string, and
    the extraction unit has, for each bit pattern that a predetermined bit length independent of the number of attributes can take, a replacement rule including at least one of bit inversion and replacement with a wildcard character; calculates a common prefix length of the start index key and the end index key; divides the target bit strings of the start index key and the end index key lying below the common prefix length into partial bit strings of the predetermined bit length; and extracts the prefix data using the replacement rule corresponding to the bit pattern of each partial bit string.
  3. The multidimensional range search device according to claim 2, wherein the extraction unit extracts the prefix data by scanning the target bit string of the start index key from the low-order to the high-order bits and scanning the target bit string of the end index key from the high-order to the low-order bits.
  4. The multidimensional range search device according to claim 2 or 3, wherein the extraction unit
    uses a first replacement rule that, when a reference bit of the target bit string of the start index key or the end index key is 0, changes the reference bit to 1 and changes the bits lower than the reference bit to the wildcard character, and performs no bit replacement when the reference bit is 1;
    uses a second replacement rule that, when the reference bit is 1, changes the reference bit to 0 and changes the bits lower than the reference bit to the wildcard character, and performs no bit replacement when the reference bit is 0; and
    applies the first replacement rule to the target bit string of the start index key and the second replacement rule to the target bit string of the end index key.
  5. The multidimensional range search device according to claim 1, wherein the determination unit:
    extracts, for each attribute, the range of bit strings of the prefix section from the index keys representing the minimum point and the maximum point of the prefix section;
    extracts, for each attribute, the range of bit strings of the inquiry section from the index keys representing the minimum point and the maximum point of the inquiry section; and
    determines the overlap between the prefix section and the inquiry section by determining, for each attribute, whether the bit string ranges of the prefix section and the inquiry section overlap.
  6. A multidimensional range search method including:
    acquiring a start index key and an end index key representing the minimum point and the maximum point of a specific section on a space filling curve, the curve being used to make one-dimensional a multidimensional space to which a data set having a plurality of attributes is mapped;
    extracting, based on the bit strings of the acquired start index key and end index key, prefix data capable of representing the bit strings of the index keys included in the specific section; and
    determining the overlap between a prefix section on the space filling curve and an inquiry section of a multidimensional range search, the prefix section being formed by the set of index keys sharing the prefix of the extracted prefix data,
    wherein the space filling curve is such that the prefix data correspond one-to-one to prefix sections, each consisting of the set of points sharing the prefix indicated by the prefix data, and the prefix length of the prefix data is negatively correlated with the size of the prefix section.
  7. The multidimensional range search method according to claim 6, wherein
    the space filling curve makes the multidimensional space one-dimensional by mixing the bit strings of the attributes in an arbitrary order without changing the order of the bits within each attribute's bit string, and
    the extracting of the prefix data calculates a common prefix length of the start index key and the end index key, divides the target bit strings of the start index key and the end index key lying below the common prefix length into partial bit strings of a predetermined bit length independent of the number of attributes, and extracts the prefix data using the replacement rule corresponding to the bit pattern of each partial bit string, selected from a plurality of replacement rules that are provided for each bit pattern the predetermined bit length can take and that each include at least one of bit inversion and replacement with a wildcard character.
  8. The multidimensional range search method according to claim 7, wherein the extracting of the prefix data extracts the prefix data by scanning the target bit string of the start index key from the low-order to the high-order bits and scanning the target bit string of the end index key from the high-order to the low-order bits.
  9. The multidimensional range search method according to claim 6 or 7, wherein
    the plurality of replacement rules include a first replacement rule that, when a reference bit of the target bit string of the start index key or the end index key is 0, changes the reference bit to 1 and changes the bits lower than the reference bit to the wildcard character, and performs no bit replacement when the reference bit is 1, and a second replacement rule that, when the reference bit is 1, changes the reference bit to 0 and changes the bits lower than the reference bit to the wildcard character, and performs no bit replacement when the reference bit is 0, and
    the extracting of the prefix data applies the first replacement rule to the target bit string of the start index key and the second replacement rule to the target bit string of the end index key.
  10. The multidimensional range search method according to any one of claims 6 to 9, further including:
    extracting, for each attribute, the range of bit strings of the prefix section from the index keys representing the minimum point and the maximum point of the prefix section;
    extracting, for each attribute, the range of bit strings of the inquiry section from the index keys representing the minimum point and the maximum point of the inquiry section; and
    determining, for each attribute, whether the bit string ranges of the prefix section and the inquiry section overlap,
    wherein the determining of the overlap determines the overlap between the prefix section and the inquiry section based on the result of determining whether the bit string ranges overlap.
  11.   A program that causes at least one computer to execute the multidimensional range search method according to any one of claims 6 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2013146656A JP6155920B2 (en) 2013-07-12 2013-07-12 Multidimensional range search apparatus and multidimensional range search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2013146656A JP6155920B2 (en) 2013-07-12 2013-07-12 Multidimensional range search apparatus and multidimensional range search method

Publications (2)

Publication Number Publication Date
JP2015018497A true JP2015018497A (en) 2015-01-29
JP6155920B2 JP6155920B2 (en) 2017-07-05

Family

ID=52439412

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2013146656A Active JP6155920B2 (en) 2013-07-12 2013-07-12 Multidimensional range search apparatus and multidimensional range search method

Country Status (1)

Country Link
JP (1) JP6155920B2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001209651A (en) * 2000-01-24 2001-08-03 Hitachi Ltd Method and device for retrieving multi-dimensional vector and recording medium having multi-dimensional vector retrieval program recorded thereon
US20030004938A1 (en) * 2001-05-15 2003-01-02 Lawder Jonathan Keir Method of storing and retrieving multi-dimensional data using the hilbert curve

Non-Patent Citations (3)

Title
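SHOJI NISHIMURA and 3 others: "MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services", MOBILE DATA MANAGEMENT (MDM), JPN6017006571, 9 June 2011 (2011-06-09), pages 7-16 *
Shoji Nishimura (西村 祥治) and 1 other: "A Proposal of a Multidimensional Index Using Generalized Z-Order" (一般化されたZ−Orderを用いた多次元索引の提案), 5th Forum on Data Engineering and Information Management (11th Annual Meeting of the Database Society of Japan), JPN6017006574, 31 May 2013 (2013-05-31), JP, pages 1-8 *
Shoji Nishimura (西村 祥治): "Advanced Technologies Supporting Big Data Processing" (ビッグデータ処理を支える先進技術), NEC技報 (NEC technical journal), vol. 65, no. 2, JPN6017006573, 1 September 2012 (2012-09-01), JP, pages 69-72 *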

Also Published As

Publication number Publication date
JP6155920B2 (en) 2017-07-05

Similar Documents

Publication Publication Date Title
Zhang et al. Keyword search in spatial databases: Towards searching by document
Cao et al. Collective spatial keyword querying
Kumar et al. Time-series bitmaps: a practical visualization tool for working with large time series databases
Agarwal et al. Computing the discrete Fréchet distance in subquadratic time
US9053120B2 (en) Grouping and differentiating files based on content
JP2007538343A (en) Geographic text indexing system and method
Yuan et al. TripleBit: a fast and compact system for large scale RDF data
US7945569B2 (en) Method and apparatus for querying spatial data
Ferragina et al. Lightweight data indexing and compression in external memory
KR20130062889A (en) Method and system for data compression
US7730316B1 (en) Method for document fingerprinting
JP2014534486A (en) Method, system, and computer program for scalable data duplication
US9922102B2 (en) Templates for defining fields in machine data
Chien et al. Geometric Burrows-Wheeler transform: Linking range searching and text indexing
Eldawy SpatialHadoop: towards flexible and scalable spatial processing using mapreduce
US10102253B2 (en) Minimizing index maintenance costs for database storage regions using hybrid zone maps and indices
US9448999B2 (en) Method and device to detect similar documents
JP2004341940A (en) Similar image retrieval device, similar image retrieval method, and similar image retrieval program
JP4114600B2 (en) Variable length character string search device, variable length character string search method and program
US7818303B2 (en) Web graph compression through scalable pattern mining
Hernández et al. Compressed representations for web and social graphs
US10496624B2 (en) Index key generating device, index key generating method, and search method
Lee et al. Efficient spatial query processing for big data
US7853598B2 (en) Compressed storage of documents using inverted indexes
Belazzougui et al. Improved compressed indexes for full-text document retrieval

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20160603

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20170222

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20170228

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20170419

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20170509

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20170522

R150 Certificate of patent or registration of utility model

Ref document number: 6155920

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150