WO2023152965A1

WO2023152965A1 - Data providing device, data providing method, and program

Info

Publication number: WO2023152965A1
Application number: PCT/JP2022/005660
Authority: WO
Inventors: 晋二古庄
Original assignee: 晋二古庄
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2023-08-17

Abstract

The data providing device according to the embodiment is arranged hierarchically and includes: a request receiving unit configured to receive, from a layer one level above, a data acquisition request specifying the location of a concatenated series, that is, a series obtained by concatenating a plurality of series managed by other data providing devices or database servers existing on a layer one level below; a search unit configured to search for at least a value corresponding to the location from among values contained in the concatenated series and a series number of a series containing the value corresponding to the location from among the plurality of series that are concatenated to obtain the concatenated series; and a response unit configured to transmit a search result returned by the search unit to the request source of the data acquisition request as a response to the data acquisition request.

Description

Data providing device, data providing method and program

The present invention relates to a data providing device, a data providing method and a program.

In recent years, advances in sensor devices and observation devices have made it possible to obtain large amounts of data representing sensing and observation results (so-called big data). There is therefore a need to extract desired data by performing various operations (for example, retrieval, combination, and aggregation) on big data.

Patent Document 1 discloses a technique that creates virtual tabular data (a view) by freely combining the tabular data of multiple sets of big data, and then searches the virtual tabular data for desired data and displays it. This technique realizes the search and display by creating internal data called SVL (Sorted Value List), INV (Inverted Record Index), and IND (Indirect Record Index) from the tabular data.

Patent Document 1: WO2019/163610

In general, tabular data is distributed across various database servers, and there is a need to freely combine and use it. However, when tabular data on one database server is moved to another database server, the user must access the destination database server to use it, so the tabular data may become unavailable to the user. Conversely, this means that the administrator of the tabular data cannot freely change its placement (in other words, the tabular data cannot be arranged flexibly). Moreover, even when the data should be presented to the user as a single piece of tabular data, it may be divided into multiple pieces of tabular data within the data presenter's internal system. For example, the data may be held internally as separate monthly tables for January through December, while the presenter wants to present one year's data to the user as a whole.

An embodiment of the present invention has been made in view of the above points, and aims to hierarchize tabular data and enable flexible arrangement thereof.

In order to achieve the above object, a data providing device according to one embodiment is a data providing device arranged hierarchically, and includes: a request receiving unit configured to receive, from the layer one level above, a data acquisition request specifying a position in a combined sequence, which is a sequence obtained by combining a plurality of sequences managed by other data providing devices or database servers in the layer one level below; a search unit configured to search for at least the value corresponding to the position and the sequence number of the sequence, among the plurality of sequences from which the combined sequence is formed, that contains the value corresponding to the position; and a response unit configured to transmit the search result retrieved by the search unit to the source of the data acquisition request as a response to the data acquisition request.

By layering tabular data, it becomes possible to arrange them flexibly.

FIG. 1 is a diagram showing an example of bisection search.
FIG. 2 is a diagram showing an example of multi-section search.
FIG. 3 is a diagram showing an example of the overall configuration of the data providing system according to this embodiment.
FIG. 4 is a diagram showing an example of the hardware configuration of the data providing device according to this embodiment.
FIG. 5 is a diagram showing an example of the functional configuration of the data providing device according to this embodiment.
FIG. 6 is a flowchart showing an example of the data providing process according to this embodiment.
FIG. 7 is a diagram showing the sequences in Example 1.
FIG. 8 is a diagram showing an example of multiple sequence multi-section search in Example 1.
FIG. 9 is a diagram showing the key information in Example 1, the maximum data smaller than the key information in each sequence, and the sum of the cumulative numbers of their weights.
FIG. 10 is a diagram showing the result of sorting the key information and the maximum data smaller than the key information in each sequence by the sum of the cumulative numbers of weights in Example 1.
FIG. 11 is a diagram showing the search range of multiple sequence multi-section search when i=1 in Example 1.
FIG. 12 is a diagram showing the confidence interval when i=1 in Example 1.
FIG. 13 is a diagram showing the search range of multiple sequence multi-section search when i=3 in Example 1.
FIG. 14 is a diagram showing the confidence interval when i=3 in Example 1.
FIG. 15 is a diagram showing the search range of multiple sequence multi-section search when i=5 to 8 in Example 1.
FIG. 16 is a diagram showing the confidence intervals when i=5 to 8 in Example 1.
FIG. 17 is a diagram showing the search range of multiple sequence multi-section search when i=12 to 13 in Example 1.
FIG. 18 is a diagram showing the confidence intervals when i=12 to 13 in Example 1.
FIG. 19 is a diagram showing the search range of multiple sequence multi-section search when i=15 to 16 in Example 1.
FIG. 20 is a diagram showing the confidence intervals when i=15 to 16 in Example 1.
FIG. 21 is a diagram showing the search range of multiple sequence multi-section search when i=18 to 21 in Example 1.
FIG. 22 is a diagram showing the confidence intervals when i=18 to 21 in Example 1.
FIG. 23 is a diagram showing the search range of multiple sequence multi-section search when i=24 to 27 in Example 1.
FIG. 24 is a diagram showing the confidence intervals when i=24 to 27 in Example 1.
FIG. 25 is a diagram showing the D5A data in Example 2.
FIG. 26 is a diagram showing an example of weighted multiple sequence multi-section search in Example 2.
FIG. 27 is a diagram showing the result of sorting the key information and the maximum data smaller than the key information in each sequence by the sum of the cumulative numbers of weights in Example 2.
FIG. 28 is a diagram showing the search range of weighted multiple sequence multi-section search when i=12 to 14 in Example 2.
FIG. 29 is a diagram showing the confidence intervals when i=12 to 14 in Example 2.

An embodiment of the present invention is described below. The following embodiment describes a data providing system 1 that hierarchizes tabular data representing a group of data such as big data, allows that tabular data to be arranged flexibly, and provides the data contained in it to the user. Tabular data is also called a "table" or "table data", and each piece of data constituting tabular data is also called a "record". Each record consists of one or more items, which are also called columns, fields, attributes, and the like.

Here, it is assumed that tabular data is represented by internal data called SVL, ACM (Accumulation Array), INV and NNC (Natural Numbered Column). Hereinafter, the tabular data represented by SVL, ACM and INV will be referred to as "D5A data".

SVL is a list-format data structure whose elements are the distinct values appearing in a certain item of tabular data, in ascending order. ACM is a list-format data structure whose elements, one per SVL element of that item, point to the position immediately after that element's last position on INV. Each element of ACM therefore gives the number of occurrences of values less than or equal to the corresponding SVL element. INV is a list-format data structure whose elements are record numbers in transposed form, that is, grouped by value in SVL order. NNC is a list-format data structure whose elements are natural numbers obtained by replacing the values appearing in a column with their storage positions on the SVL. NNC is needed to read the stored value of a column given a record number.

Since SVL, ACM, and INV are all known data structures, their detailed description is omitted. For example, SVL and INV are described in Patent Document 1 above, and ACM is similar to the IND described there. Whereas the IND of Patent Document 1 uses, as its elements, values indicating the start position on INV of each SVL element, ACM, as described above, uses values indicating the position immediately after the end position on INV. In other words, ACM is obtained by deleting the leading element of the IND described in Patent Document 1.
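The relationships among SVL, ACM, INV, and NNC can be illustrated with a minimal sketch for a single column. It assumes that records sharing a value are kept in record-number order within INV, and the function names `build_d5a_column` and `records_with_value` are hypothetical, not taken from Patent Document 1.

```python
from bisect import bisect_left


def build_d5a_column(column):
    """Build SVL, ACM, INV and NNC for one item (column) of tabular data."""
    # SVL: the distinct values of the column, ascending.
    svl = sorted(set(column))
    # NNC: for each record number, the position of its value on the SVL.
    nnc = [bisect_left(svl, v) for v in column]
    # ACM: running count of occurrences, so acm[k] is the INV position
    # just past the last record whose value is svl[k].
    counts = [0] * len(svl)
    for k in nnc:
        counts[k] += 1
    acm, total = [], 0
    for c in counts:
        total += c
        acm.append(total)
    # INV: record numbers grouped by their value's SVL position
    # (ties kept in record-number order, an assumption of this sketch).
    inv = sorted(range(len(column)), key=lambda r: (nnc[r], r))
    return svl, acm, inv, nnc


def records_with_value(svl_pos, acm, inv):
    """Record numbers whose value is svl[svl_pos], read as a slice of INV."""
    start = acm[svl_pos - 1] if svl_pos > 0 else 0
    return inv[start:acm[svl_pos]]


column = ["Tom", "Alice", "Kate", "Alice"]  # hypothetical example column
svl, acm, inv, nnc = build_d5a_column(column)
ind = [0] + acm  # IND-style start positions: ACM with a leading 0 prepended
```

Reading the value of record 2 is `svl[nnc[2]]`, which gives "Kate". The records holding "Alice" are `records_with_value(0, acm, inv)`, which gives [1, 3]. Prepending 0 to ACM reproduces the start-position form attributed above to IND.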

Hereinafter, a combination of multiple pieces of D5A data (that is, D5A data combined vertically, horizontally, or both by UNION, JOIN, etc.) is also called "virtual D5A data". D5A data that is explicitly not virtual D5A data is called "real D5A data", and "D5A data" alone means either real D5A data or virtual D5A data.

In the data providing system 1 according to the present embodiment, functions called D5A hubs are arranged hierarchically in one or more layers. Each D5A hub responds to a data acquisition request from the layer above by returning, to the request source, data acquired from the layer below. The D5A hub is a function that implements the following (1) to (3).

(1) Receive a data acquisition request specifying the position of the SVL or INV of a certain item of virtual D5A data from the D5A hub or user terminal located one level higher.

(2) Acquire data such as values corresponding to the position from the D5A hub or database server one level below.

(3) Send the data acquired in (2) above as a response to the requester of the data acquisition request.

In (2) above, a search technique called weighted multiple series multi-section search is used: data acquisition requests, each specifying a position in the series of the relevant item of the D5A data from which the virtual D5A data was combined, are sent to the D5A hub or database server one level below, and the value or other data corresponding to each position is obtained as the response. The weighted multiple series multi-section search extends multi-section search to multiple series and uses the weights given to the values of those series. A series (also called a sequence in this description) is a list-format data structure; SVL and INV, for example, are series. A weighted multiple sequence multi-section search in which all weights are "1" may simply be called a "multiple sequence multi-section search".

As a result, even if real D5A data placed on one database server is moved to another database server, the user can still acquire the desired data (records, or the value of a certain item contained in a record). The administrator of the real D5A data can therefore place it freely on any database server, which realizes flexible arrangement of the real D5A data.

It is assumed that the values in the SVL of virtual D5A data are not made unique across the D5A data (real or virtual) from which the virtual D5A data is combined. That is, the SVL of virtual D5A data is a data structure whose elements are the values appearing in a certain item, listed in ascending order of value and, for equal values, in order of the identification number of the source D5A data.
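The ordering rule above (ascending value, ties broken by the source's identification number, no de-duplication) amounts to a k-way merge of the source SVLs. The following is a minimal sketch assuming each source SVL is an already-sorted Python list and that a source's identification number is its index in the input list; `virtual_svl` is a hypothetical name.

```python
import heapq


def virtual_svl(source_svls):
    """Merge source SVLs into the SVL of the virtual D5A data.

    Values are NOT de-duplicated across sources: each element is tagged with
    its source's identification number (its index here), and equal values are
    ordered by that number.
    """
    tagged = [
        [(value, source_id) for value in svl]
        for source_id, svl in enumerate(source_svls)
    ]
    # Every tagged list is already sorted, so a k-way merge yields the global order.
    return list(heapq.merge(*tagged))


# Hypothetical source SVLs; "Alice" and "Elza" each appear in two sources.
merged = virtual_svl([["Elza", "Frank"], ["Alice", "Elza"], ["Alice", "Dolly"]])
```

Here `merged` keeps both copies of "Alice" and of "Elza", each ordered by source number. The merge only reorders elements that are already known. Locating an element by its position without materializing the merge is the task addressed by the search described below.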

<Bisection search and multisection search>
Here, an outline of the multisection search that is the premise of the weighted multiple sequence multisection search and the bisection search that is the premise of the multisection search will be described. In the following, it is assumed that the size of the search target is known, and the values are arranged in ascending or descending order (duplicate values may exist).

・Bisection search
Bisection search narrows down the search range according to the magnitude relationship between the key value and the value at the center position of the search range (or, when the search range has an even number of elements and so has no single center position, the value at one of the two central elements).

For example, as shown in FIG. 1, suppose a sequence with the elements (Alice, Bob, Dolly, Helen, Kate, Louis, Peter, Tom) is searched with the key value "Kate". First, with the entire sequence as the search range, "Helen" at the center position "3" is compared with the key value "Kate". Since "Helen" is smaller than "Kate", the search range becomes positions "4" to "7", and "Louis" at the center position "5" is compared with "Kate". Since "Louis" is greater than "Kate", the search range becomes position "4" alone, and "Kate" at position "4" matches the key value, so the key value "Kate" has been found.
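The walk-through above can be reproduced with a standard bisection search over the FIG. 1 sequence. The sketch below counts probes (value reads) because, in this setting, each probe would be a request to a lower layer. It is an illustration under that assumption, not the device's implementation.

```python
NAMES = ["Alice", "Bob", "Dolly", "Helen", "Kate", "Louis", "Peter", "Tom"]


def bisection_search(seq, key):
    """Return (position of key or None, number of probes)."""
    lo, hi = 0, len(seq) - 1  # inclusive search range
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2  # lower of the two centers when the range is even
        probes += 1
        if seq[mid] == key:
            return mid, probes
        if seq[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes
```

`bisection_search(NAMES, "Kate")` probes positions 3, 5, and 4 in that order, matching the text, and returns (4, 3).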

In this way, bisection search finds a position from a value. However, it needs many successive comparisons, so the number of communications with the database server also increases, and it is not necessarily an efficient search method.

・Multi-section search
Multi-section search is an extension of bisection search. It divides the search range into multiple sections and narrows down the search range according to the magnitude relationship between the leading element of each section and the key value.

For example, as shown in FIG. 2, suppose the sequence (Alice, Bob, Dolly, Helen, Kate, Louis, Peter, Tom) is searched with the key value "Kate". First, with the entire sequence as the search range, the search range is divided into several sections (four in the example of FIG. 2). The leading element of each section is then compared with the key value to identify the last section whose leading element is less than or equal to the key value. In FIG. 2, a black circle means that the element is included and a white circle means that it is not, so each section is a left-closed, right-open half-open interval. The first section must include the first element of the search range, and the last section must include the last element of the search range.

In the example shown in FIG. 2, the first section is (Alice, Bob), the second is (Dolly, Helen, Kate), the third is (Louis, Peter), and the fourth is (Tom). The last section whose leading element is less than or equal to the key value is therefore the second section.

Next, with the second section as the search range, the search range is divided into several sections (three in the example of FIG. 2), the leading elements of these sections are compared with the key value, and the last section whose leading element is less than or equal to the key value is identified.

In the example shown in FIG. 2, the first section is (Dolly), the second is (Helen), and the third is (Kate). The identified section is the third, and its leading element matches the key value, so the key value "Kate" has been found.
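A minimal multi-section search sketch follows. Unlike FIG. 2, which uses uneven sections (2, 3, 2, 1), this sketch assumes the range is cut into at most `k` sections of roughly equal size. It counts rounds, assuming that all leading elements of one round can be requested in a single batch.

```python
NAMES = ["Alice", "Bob", "Dolly", "Helen", "Kate", "Louis", "Peter", "Tom"]


def multisection_search(seq, key, k=4):
    """Return (position of key or None, number of sampling rounds)."""
    if not seq:
        return None, 0
    lo, hi = 0, len(seq)  # half-open search range [lo, hi)
    rounds = 0
    while hi - lo > 1:
        m = min(k, hi - lo)  # number of sections this round
        starts = [lo + (hi - lo) * j // m for j in range(m)]
        leading = [seq[s] for s in starts]  # one batched sampling round
        rounds += 1
        # Last section whose leading element is <= key.
        chosen = None
        for j, v in enumerate(leading):
            if v <= key:
                chosen = j
        if chosen is None:
            return None, rounds  # key is smaller than every element
        lo = starts[chosen]
        hi = starts[chosen + 1] if chosen + 1 < m else hi
    return (lo if seq[lo] == key else None), rounds
```

With `k=4`, `multisection_search(NAMES, "Kate")` returns (4, 2). The bisection search over the same sequence needs three sequential probes, which reflects the reduction in communication rounds described next.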

In this way, multi-section search is an efficient search method because it needs fewer search rounds than bisection search, and as a result the number of communications with the database server is also reduced. On the other hand, since multi-section search finds positions from values, it cannot find the value corresponding to a given position.

The weighted multiple sequence multi-section search described in this embodiment extends multi-section search to multiple sequences and uses the weights given to the values of those sequences. It also differs in what it finds: given a position, it finds the value at that position. In general, when there are multiple sequences, finding the value at the i-th position is not trivial, even if their total size (that is, the size of all the sequences together) is known. The details of the weighted multiple sequence multi-section search are described later.
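To pin down what "the value at the i-th position" of multiple sequences means, the following brute-force reference defines it. It assumes that elements are ordered by (value, sequence number, position in sequence), which is consistent with the comparison keys used in Example 1. This function reads every element, which is exactly what the search avoids. It defines the expected answer only; `value_at` is a hypothetical name.

```python
import heapq
from itertools import islice


def value_at(sequences, i):
    """Brute-force reference: (value, sequence number, position) at position i
    of the combined sequence, ordering by (value, sequence number, position)."""
    tagged = [
        [(value, seq_no, pos) for pos, value in enumerate(seq)]
        for seq_no, seq in enumerate(sequences)
    ]
    merged = heapq.merge(*tagged)  # lazy k-way merge of the sorted inputs
    return next(islice(merged, i, None))  # StopIteration if i is out of range
```

For the hypothetical sequences `[["Elza", "Frank", "Frank"], ["Alice", "Alice", "Elza"], ["Alice", "Dolly"]]`, position 4 is ("Elza", 0, 0). Locating it requires knowing how many elements of every other sequence sort before it.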

<Overall Configuration of Data Providing System 1>
FIG. 3 shows the overall configuration of the data providing system 1 according to this embodiment. As shown in FIG. 3, the data providing system 1 according to the present embodiment includes one or more data providing devices 10 (data providing devices 10-1 to 10-5 in the example of FIG. 3), one or more database servers 20 (database servers 20-1 to 20-3 in the example of FIG. 3), and one or more user terminals 30. The data providing devices 10, database servers 20, and user terminals 30 are communicably connected via a communication network such as the Internet.

In the data providing system 1 according to the present embodiment, the data providing devices 10 are arranged hierarchically, with the user terminal 30 as the highest layer (layer 0) and each database server 20 as the lowest layer. For example, in FIG. 3, the user terminal 30 is in layer 0, the data providing device 10-1 is in layer 1, the data providing devices 10-2 to 10-4 are in layer 2, and the database servers 20-1 to 20-3 are in layer 3. The data providing device 10-5 is also in layer 3, and although not shown, database servers 20 exist in the layer below it.

The data providing device 10 is a device that functions as a D5A hub. Each data providing device 10 is considered to have virtual D5A data that is a combination of D5A data existing one level below itself.

For example, the data providing device 10-2 is regarded as having D5A data 1200-1, which combines the D5A data 1000-1 and 1000-2 placed on the database server 20-1 with the D5A data 1000-3 placed on the database server 20-2. Similarly, the data providing device 10-3 is regarded as having D5A data 1200-2, which combines the D5A data 1000-4 and 1000-5 placed on the database server 20-3. Similarly, the data providing device 10-1 is regarded as having virtual D5A data that combines the D5A data 1200-1 of the data providing device 10-2, the D5A data 1200-2 of the data providing device 10-3, and the D5A data 1200-3 of the data providing device 10-4.

However, a data providing device 10 need not hold (all of) the virtual D5A data, which combines the D5A data one level below it, in memory. Being "regarded as having" that virtual D5A data means the following: when the device receives a data acquisition request for a certain item of the virtual D5A data from the layer one level above, it can return a value in response by issuing appropriate data acquisition requests to the other data providing devices 10 or database servers 20 in the layer one level below.

The database server 20 is a server on which real D5A data is placed. For example, in FIG. 3, the D5A data 1000-1 and 1000-2 are placed on the database server 20-1 as real D5A data. Similarly, the D5A data 1000-3 is placed on the database server 20-2, and the D5A data 1000-4 and 1000-5 are placed on the database server 20-3, as real D5A data.

In addition, the database server 20 transmits data obtained from its own real D5A data as a response to a data obtaining request from the data providing device 10 or the user terminal 30 one level higher.

The user terminal 30 is a terminal that sends a data acquisition request specifying a position in a certain series (for example, the SVL or INV of a certain item of virtual D5A data) to the data providing device 10 or database server 20 one level below it. As the user terminal 30, for example, a PC (personal computer), a smartphone, a tablet terminal, or a wearable device can be used.

Note that the overall configuration of the data providing system 1 shown in FIG. 3 is an example, and is not limited to this, and other configurations may be used.

<Hardware Configuration of Data Providing Device 10>
FIG. 4 shows the hardware configuration of the data providing device 10 according to this embodiment. As shown in FIG. 4, the data providing device 10 according to the present embodiment is implemented with the same hardware configuration as a general computer or computer system, and includes an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. These pieces of hardware are communicably connected via a bus 108.

The input device 101 is, for example, a keyboard, a mouse, a touch panel, or various physical buttons. The display device 102 is, for example, a display or a display panel. The data providing device 10 may omit the input device 101, the display device 102, or both.

The external I/F 103 is an interface with an external device such as a recording medium 103a. The data providing device 10 can read from and write to the recording medium 103a via the external I/F 103. Examples of the recording medium 103a include flexible disks, CDs (Compact Discs), DVDs (Digital Versatile Disks), SD memory cards (Secure Digital memory cards), and USB (Universal Serial Bus) memory cards.

The communication I/F 104 is an interface for connecting the data providing device 10 to a communication network. The processor 105 is, for example, various arithmetic units such as a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). The memory device 106 is, for example, various storage devices such as RAM (Random Access Memory), ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), and flash memory.

With the hardware configuration shown in FIG. 4, the data providing device 10 according to the present embodiment can realize the data providing process described later. The hardware configuration shown in FIG. 4 is an example. The data providing device 10 may have, for example, a plurality of processors 105 or a plurality of memory devices 106, and may include various hardware other than that illustrated.

<Functional Configuration of Data Providing Device 10>
FIG. 5 shows the functional configuration of the data providing device 10 according to this embodiment. As shown in FIG. 5, the data providing device 10 according to this embodiment has a request receiving unit 201, a search unit 202, and a result response unit 203. These units are implemented, for example, by processing that one or more programs installed in the data providing device 10 cause the processor 105 or the like to execute. The data providing device 10 according to this embodiment also has a storage unit 204, which can be implemented by a memory device 106 such as a RAM, HDD, SSD, or flash memory.

The request receiving unit 201 receives a data acquisition request from another data providing device 10 or a user terminal 30 one level above. This request specifies a position i in the sequence for a certain item of the virtual D5A data that the device is regarded as having.

The search unit 202 uses a weighted multiple sequence multi-section search to retrieve the value corresponding to the position i specified in the data acquisition request, the sequence number of the sequence containing that value, the cumulative number of weights of that value within that sequence, the weight of that value within that sequence, and the position of that value within that sequence. If all weights are "1", the weights need not be retrieved.

The result response unit 203 transmits, as a response to the data acquisition request, the value corresponding to the position i specified in the request, the sequence number of the sequence containing that value, the cumulative number of weights of that value within that sequence, the weight of that value within that sequence, and the position of that value within that sequence. If all weights are "1", the weights need not be transmitted.

The storage unit 204 stores various data. For example, the storage unit 204 stores data representing intermediate calculation results and search results of the weighted multiple sequence multi-section search.

<Data provision processing>
The data providing process according to the present embodiment is described with reference to FIG. 6. The description below covers the data providing process performed by one data providing device 10.

The request receiving unit 201 receives a data acquisition request from another data providing device 10 or user terminal 30 one level higher (step S101). Here, in this data acquisition request, the sequence position i corresponding to a certain item of the virtual D5A data considered to be possessed by itself is specified.

Next, the search unit 202 performs a weighted multiple sequence multi-section search on a certain sequence of the D5A data held by the database servers 20, or of the virtual D5A data regarded as held by the other data providing devices 10, in the layer one level below itself. The search retrieves the value corresponding to position i, the sequence number of the sequence containing that value, the cumulative number of weights of that value within that sequence, the weight of that value within that sequence, and the position of that value within that sequence (step S102). When all weights are "1" (Example 1 described later), the weights need not be retrieved. The processing of this step is described in detail in Examples 1 and 2 below.

Then, the result response unit 203 transmits the value, sequence number, cumulative number of weights, weight, and position in the sequence retrieved in step S102 as a response to the data acquisition request (step S103). When all weights are "1" (Example 1 described later), the weights and the positions in the sequence need not be transmitted.
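Steps S101 to S103 across a small hierarchy can be illustrated with a minimal in-process sketch, assuming all weights are 1. The class names `DatabaseServer` and `Hub` and their `size`/`get` methods are hypothetical. `Hub._search` is deliberately a naive stand-in that requests every position from the layer below and merges the results. It is not the weighted multiple sequence multi-section search, and its request count is exactly the cost that search is designed to avoid.

```python
import heapq


class DatabaseServer:
    """Lowest layer: holds one real sequence (e.g. an SVL) and answers by position."""

    def __init__(self, values):
        self.values = list(values)

    def size(self):
        return len(self.values)

    def get(self, i):
        # Response: (value, sequence number, cumulative weight, weight, position).
        # A server holds one sequence (number 0); all weights are 1.
        return self.values[i], 0, i + 1, 1, i


class Hub:
    """Data providing device: answers for the combination of the layer below."""

    def __init__(self, children):
        self.children = children  # hubs or database servers one level below

    def size(self):
        return sum(child.size() for child in self.children)

    def get(self, i):
        # S101: a request for position i arrives from the layer above.
        # S102: search the layer below (naive stand-in).
        value, seq_no, pos = self._search(i)
        # S103: respond; with all weights 1 the cumulative weight is pos + 1.
        return value, seq_no, pos + 1, 1, pos

    def _search(self, i):
        # Naive stand-in: request EVERY position of every child, then merge.
        tagged = [
            [(child.get(p)[0], n, p) for p in range(child.size())]
            for n, child in enumerate(self.children)
        ]
        return list(heapq.merge(*tagged))[i]


lower = Hub([DatabaseServer(["Alice", "Kate"]), DatabaseServer(["Bob", "Tom"])])
top = Hub([lower, DatabaseServer(["Dolly"])])
```

`top.get(3)` returns ("Kate", 0, 3, 1, 2): "Kate" is at position 2 of child 0 (the lower hub). The lower hub, in turn, resolves each position by querying its own database servers.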

<Example 1>
The weighted multiple sequence multi-section search in Example 1 is described below. In this example, the sequences to be searched are SVLs, and the position i specified in the data acquisition request is a position in the sequence obtained by vertically combining a plurality of SVLs (sequences), hereinafter also called the "combined sequence". It is also assumed that there are three data providing devices 10 in the layer one level below the data providing device 10, and that they are regarded as having the SVLs (sequences) of the three pieces of virtual D5A data shown in FIG. 7. Here, a weight is a value given to each value contained in the sequences to be searched. When the sequences to be searched are SVLs and the position i specified in the data acquisition request is a position on their combined SVL, all weights are "1".

The sizes of the sequences to be searched are assumed to be known. In this example, sequences #0, #1, and #2 each have size "10", and the combined sequence obtained by vertically combining them has size "30". It is also assumed to be known which data providing device 10 (or database server 20) holds the sequence with which sequence number. Since the sequences to be searched are SVLs, their values are in ascending order (duplicate values may exist).

The search unit 202 first performs a multiple sequence multi-section search on sequences #0, #1, and #2, sampling for each sampled element its value, the sequence number of the sequence containing it, the cumulative number of weights, and its position within the sequence. For example, as shown in FIG. 8, each of sequences #0, #1, and #2 is divided into four sections (sections 1 to 4), and the leading element of each section is sampled. An element is sampled simply by sending a data acquisition request specifying the element's position in its sequence to the data providing device 10 (or database server 20) one level below that holds the sequence. That data providing device 10 (or database server 20) then returns the value of the element, the sequence number, and the cumulative number of weights as the response.

The multiple sequence multi-section search shown in FIG. 8 samples the following data, hereinafter called "key information". In this example, key information is written as (value, sequence number, cumulative number of weights of the value within the sequence, position of the value within the sequence), and a position n is written as "@n".

(Elza, #0, 1, @0)
(Frank, #0, 4, @3)
(Louis, #0, 7, @6)
(Roy, #0, 10, @9)
(Alice, #1, 1, @0)
(Bob, #1, 4, @3)
(Elza, #1, 7, @6)
(Genny, #1, 10, @9)
(Alice, #2, 1, @0)
(Dolly, #2, 4, @3)
(Kate, #2, 7, @6)
(Tom, #2, 10, @9)
Hereinafter, (value, sequence number, cumulative number of weights of the value within the sequence) is called the "comparison key".
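The sampling step can be sketched as follows. Splitting a size-10 sequence with a step of ceil(10/4) = 3 gives the leading positions @0, @3, @6, @9, which match the listing above. `fetch` is a hypothetical stand-in for one data acquisition request to the layer below, assumed to return (value, cumulative number of weights). The fake table is built only from the twelve key-information entries listed above.

```python
import math


def section_starts(lo, hi, k):
    """Leading positions when [lo, hi) is cut into at most k equal-step sections."""
    step = math.ceil((hi - lo) / k)
    return list(range(lo, hi, step))


def sample_key_information(fetch, sizes, k=4):
    """Key information (value, sequence number, cumulative weight, position)."""
    infos = []
    for seq_no, size in enumerate(sizes):
        for pos in section_starts(0, size, k):
            value, cum_weight = fetch(seq_no, pos)  # one request to the layer below
            infos.append((value, seq_no, cum_weight, pos))
    return infos


def comparison_key(info):
    value, seq_no, cum_weight, _pos = info
    return (value, seq_no, cum_weight)


# Values at the sampled positions, as listed in the text (FIG. 8).
SAMPLED = {
    (0, 0): "Elza", (0, 3): "Frank", (0, 6): "Louis", (0, 9): "Roy",
    (1, 0): "Alice", (1, 3): "Bob", (1, 6): "Elza", (1, 9): "Genny",
    (2, 0): "Alice", (2, 3): "Dolly", (2, 6): "Kate", (2, 9): "Tom",
}
# With all weights 1, the cumulative number of weights is position + 1.
infos = sample_key_information(lambda s, p: (SAMPLED[(s, p)], p + 1), [10, 10, 10])
```

Comparing by `comparison_key` places (Elza, #0, ...) before (Elza, #1, ...). Equal values in different sequences are therefore ordered by sequence number, as required for the SVL of virtual D5A data.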

Next, for each piece of key information, the search unit 202 uses the comparison key to find, in each of sequences #0, #1, and #2, the maximum data smaller than that key information, and calculates the sum Σ of the cumulative numbers of weights of those data. In the sequence containing the key information itself, the maximum data smaller than it is obtained by sampling the position one before it. In the other sequences, the maximum data smaller than the key information is sampled by multi-section search using (value, sequence number) for comparison.

For example, regarding the key information (Elza, #0, 1, @0), there is no data smaller than the key information in series #0. On the other hand, in series #1, the largest data smaller than the key information is (Dolly, #1, 6, @5), and in series #2, the largest data smaller than the key information is (Dolly, #2 , 4, @3). Therefore, the sum Σ of the accumulated numbers of weights is Σ=0+6+4=10.

Similarly, for example, with respect to key information (Frank, #0, 4, @3), the maximum data smaller than the key information in sequence #0 is (Frank, #0, 3, @2), and in sequence #1 The largest data smaller than the key information is (Elza, #1, 7, @6), and the largest data smaller than the key information in sequence #2 is (Dolly, #2, 4, @3). Therefore, the sum Σ of the accumulated numbers of weights is Σ=3+7+4=14.

Similarly, with respect to other key information, in each of series #0, #1, and #2, the maximum data smaller than the key information is obtained (including the case where the maximum data smaller than the key information does not exist). ), the sum Σ of the accumulated weights is calculated.
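The Σ computation can be sketched in-process as below. Each sequence is assumed to be a local list of (value, cumulative weight) pairs, and `bisect` on comparison keys stands in for the per-sequence sampling and multi-section search that the device would perform over the network. `sigma` is a hypothetical name.

```python
from bisect import bisect_left


def sigma(info, sequences):
    """Sum over all sequences of the cumulative weight of the largest element
    whose comparison key is smaller than that of `info` (0 when none exists)."""
    value, seq_no, cum_weight, _pos = info
    target = (value, seq_no, cum_weight)
    total = 0
    for n, seq in enumerate(sequences):
        keys = [(v, n, c) for v, c in seq]  # comparison keys of sequence n
        idx = bisect_left(keys, target)  # count of elements smaller than target
        if idx > 0:
            total += keys[idx - 1][2]  # cumulative weight of the maximum smaller element
    return total


# Hypothetical sequences of (value, cumulative weight) with all weights 1.
seqs = [
    [("Elza", 1), ("Frank", 2), ("Frank", 3)],
    [("Alice", 1), ("Alice", 2), ("Elza", 3)],
    [("Alice", 1), ("Dolly", 2)],
]
```

`sigma(("Elza", 0, 1, 0), seqs)` evaluates 0 + 2 + 2 = 4, the same kind of sum as Σ = 0 + 6 + 4 = 10 in the text. Here too it equals the element's 0-based position in the combined sequence.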

FIG. 9 summarizes each piece of key information, the largest data smaller than it in each of series #0, #1 and #2, and the sum Σ of the cumulative weights of those data. In FIG. 9, "-" means that no data smaller than the key information exists, so nothing is sampled.

Next, the search unit 202 sorts the records (key information, the largest data smaller than the key information in each series, and the sum Σ of the cumulative weights of those data) by Σ and holds the result in the storage unit 204. For example, sorting FIG. 9 in ascending order of Σ yields FIG. 10, which is held in the storage unit 204. Hereinafter, a table such as FIG. 10 is referred to as a "key information table".

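Here, Σ is the position, in the combined series, of the value included in the corresponding key information. Therefore, when some Σ equals the position i specified in the data acquisition request, the search unit 202 takes the key information corresponding to that Σ as the search result. In the example shown in FIG. 10, this is the case for i = 0, 2, 4, 9, 10, 11, 14, 17, 22, 23, 28 and 29. For these values of i, the search unit 202 takes the key information corresponding to Σ = i as the search result.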
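Building the key information table and handling the case where i equals some Σ can be sketched as follows. The assumptions are the same as before: local lists, `bisect` in place of the multi-section search, and weights of 1. The sampled key information at positions @0, @3, @6 and @9 of each series is taken from this example. The entry (Louis, #0, 7, @6) is inferred from the Σ=23 record and that sampling pattern rather than quoted directly. `KEY_TABLE` and `exact_hit` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

# Example 1 series, reconstructed from this walkthrough (all weights are 1).
SERIES = [
    ["Elza", "Frank", "Frank", "Frank", "Kate", "Kate", "Louis", "Louis", "Peter", "Roy"],
    ["Alice", "Alice", "Bob", "Bob", "Dolly", "Dolly", "Elza", "Frank", "Frank", "Genny"],
    ["Alice", "Bob", "Dolly", "Dolly", "Helen", "Helen", "Kate", "Louis", "Peter", "Tom"],
]

# Sampled key information as (value, series number, position).
KEYS = [
    ("Elza", 0, 0), ("Frank", 0, 3), ("Louis", 0, 6), ("Roy", 0, 9),
    ("Alice", 1, 0), ("Bob", 1, 3), ("Elza", 1, 6), ("Genny", 1, 9),
    ("Alice", 2, 0), ("Dolly", 2, 3), ("Kate", 2, 6), ("Tom", 2, 9),
]


def sigma(value, series_no, pos):
    """Sum of the cumulative weights of the largest elements smaller than
    the key information, taken over every series."""
    total = 0
    for s, values in enumerate(SERIES):
        if s == series_no:
            p = pos - 1                          # one position below, same series
        elif s < series_no:
            p = bisect_right(values, value) - 1  # equal values sort before the key
        else:
            p = bisect_left(values, value) - 1   # equal values sort after the key
        total += p + 1                           # cumulative weight (0 if p == -1)
    return total


# Key information table: (Σ, key information) sorted in ascending order of Σ.
KEY_TABLE = sorted((sigma(*k), k) for k in KEYS)
BY_SIGMA = dict(KEY_TABLE)


def exact_hit(i):
    """Key information whose Σ equals i, or None if a narrower search is needed."""
    return BY_SIGMA.get(i)


print([s for s, _ in KEY_TABLE])
# [0, 2, 4, 9, 10, 11, 14, 17, 22, 23, 28, 29]
print(exact_hit(14))  # ('Frank', 0, 3)
print(exact_hit(1))   # None
```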

If, on the other hand, no Σ equals i, the search unit 202 obtains the search result by recursively performing the multi-series multi-section search on a narrower interval. This is described below for each value of i.

・When i=1: The record with Σ=0 at position "0" and the record with Σ=2 at position "1" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "0" and less than that of the key information in the record at position "1". As a result, for i=1, the range shown in FIG. 11 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 11 samples (Alice, #1, 1, @0) and (Alice, #1, 2, @1), as shown in the corresponding figure. Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=2. In this example every element in this interval has been sampled, so the interval is a confidence interval (that is, an interval whose positions in the combined series can be determined). The search result for i=1 is therefore (Alice, #1, 2, @1). If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.
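Locating the bracketing records and the per-series half-open search ranges for a position i that matches no Σ can be sketched as follows. The assumptions are unchanged. The series are held locally, comparison keys are built as (value, series number, cumulative weight) with weights of 1, and `bisect` stands in for the multi-section search. `SIGMAS` is the Σ list given for FIG. 10. `bracket` and `search_range` are hypothetical names. Each range is a half-open position pair [start, end), so (0, 0) means the series contributes nothing.

```python
from bisect import bisect_left, bisect_right

# Example 1 series, reconstructed from this walkthrough (all weights are 1).
SERIES = [
    ["Elza", "Frank", "Frank", "Frank", "Kate", "Kate", "Louis", "Louis", "Peter", "Roy"],
    ["Alice", "Alice", "Bob", "Bob", "Dolly", "Dolly", "Elza", "Frank", "Frank", "Genny"],
    ["Alice", "Bob", "Dolly", "Dolly", "Helen", "Helen", "Kate", "Louis", "Peter", "Tom"],
]

# Σ values of the key information table, in table order.
SIGMAS = [0, 2, 4, 9, 10, 11, 14, 17, 22, 23, 28, 29]


def bracket(i: int) -> tuple[int, int]:
    """Table positions of the two records whose Σ values surround i
    (assumes i is not itself a Σ and lies inside the table)."""
    k = bisect_right(SIGMAS, i) - 1
    return k, k + 1


def comparison_keys(s: int) -> list[tuple[str, int, int]]:
    """Comparison key (value, series number, cumulative weight) of each element."""
    return [(v, s, p + 1) for p, v in enumerate(SERIES[s])]


def search_range(lo_key, hi_key) -> list[tuple[int, int]]:
    """Per-series half-open range [start, end) of the elements whose
    comparison key k satisfies lo_key <= k < hi_key."""
    ranges = []
    for s in range(len(SERIES)):
        keys = comparison_keys(s)  # already sorted within a series
        ranges.append((bisect_left(keys, lo_key), bisect_left(keys, hi_key)))
    return ranges


print(bracket(1))   # (0, 1)
# i = 1: bounded by the records (Alice, #1, 1) and (Alice, #2, 1).
print(search_range(("Alice", 1, 1), ("Alice", 2, 1)))
# [(0, 0), (0, 2), (0, 0)]
```

For i = 18, `bracket(18)` returns (7, 8), that is, the records with Σ=17 and Σ=22.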

・When i=3: The record with Σ=2 at position "1" and the record with Σ=4 at position "2" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "1" and less than that of the key information in the record at position "2". As a result, for i=3, the range shown in FIG. 13 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 13 samples (Alice, #1, 2, @1), (Bob, #1, 3, @2) and (Alice, #2, 1, @0), as shown in the corresponding figure. Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=4. In this example every element in this interval has been sampled, so the interval is a confidence interval. The search result for i=3 is therefore (Bob, #1, 3, @2). If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.

・When i=5 to 8: The record with Σ=4 at position "2" and the record with Σ=9 at position "3" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "2" and less than that of the key information in the record at position "3". As a result, for i=5 to 8, the range shown in FIG. 15 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 15 samples the following seven data, as shown in the corresponding figure: (Bob, #1, 3, @2), (Bob, #1, 4, @3), (Dolly, #1, 5, @4), (Dolly, #1, 6, @5), (Alice, #2, 1, @0), (Bob, #2, 2, @1) and (Dolly, #2, 3, @2). Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=9. In this example every element in this interval has been sampled, so the interval is a confidence interval. The search results are therefore (Bob, #2, 2, @1) for i=5, (Dolly, #1, 5, @4) for i=6, (Dolly, #1, 6, @5) for i=7 and (Dolly, #2, 3, @2) for i=8. If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.
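The step that turns a confidence interval into search results can be sketched as follows. The sketch assumes, as in the i=5 to 8 case, that the samples already cover every element of the interval and that every weight is 1. The confidence-interval check and the recursive search are omitted. Python's `sorted` is stable, so sorting by value alone keeps equal values in series-number order, provided the samples are listed series by series. `resolve` is a hypothetical name.

```python
# Each sample is (value, series number, cumulative weight, position), listed
# series by series and, within a series, in position order.


def resolve(samples, lo_key, hi_key, sigma_lo):
    """Map combined-series positions to the samples lying in [lo_key, hi_key).
    Assumes the samples cover every element of that interval (a confidence
    interval) and that every weight is 1."""
    ordered = sorted(samples, key=lambda x: x[0])  # stable sort by value
    # Keep only samples whose comparison key lies inside the interval; the
    # lower-bound record itself sits at position sigma_lo.
    inside = [x for x in ordered if lo_key <= (x[0], x[1], x[2]) < hi_key]
    return {sigma_lo + k: x for k, x in enumerate(inside)}


# i = 5 to 8: samples listed in the text, bounded by the records
# (Bob, #1, 4) with Σ = 4 and (Dolly, #2, 4) with Σ = 9.
samples = [
    ("Bob", 1, 3, 2), ("Bob", 1, 4, 3), ("Dolly", 1, 5, 4), ("Dolly", 1, 6, 5),
    ("Alice", 2, 1, 0), ("Bob", 2, 2, 1), ("Dolly", 2, 3, 2),
]
positions = resolve(samples, ("Bob", 1, 4), ("Dolly", 2, 4), 4)
for i in range(5, 9):
    print(i, positions[i])
```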

・When i=12 to 13: The record with Σ=11 at position "5" and the record with Σ=14 at position "6" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "5" and less than that of the key information in the record at position "6". As a result, for i=12 to 13, the range shown in FIG. 17 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 17 samples the following six data, as shown in the corresponding figure: (Elza, #0, 1, @0), (Frank, #0, 2, @1), (Frank, #0, 3, @2), (Dolly, #1, 6, @5), (Elza, #1, 7, @6) and (Dolly, #2, 4, @3). Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=14. In this example every element in this interval has been sampled, so the interval is a confidence interval. The search results are therefore (Frank, #0, 2, @1) for i=12 and (Frank, #0, 3, @2) for i=13. If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.

・When i=15 to 16: The record with Σ=14 at position "6" and the record with Σ=17 at position "7" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "6" and less than that of the key information in the record at position "7". As a result, for i=15 to 16, the range shown in FIG. 19 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 19 samples the following six data, as shown in the corresponding figure: (Frank, #0, 3, @2), (Frank, #0, 4, @3), (Elza, #1, 7, @6), (Frank, #1, 8, @7), (Frank, #1, 9, @8) and (Dolly, #2, 4, @3). Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=17. In this example every element in this interval has been sampled, so the interval is a confidence interval. The search results are therefore (Frank, #1, 8, @7) for i=15 and (Frank, #1, 9, @8) for i=16. If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.

・When i=18 to 21: The record with Σ=17 at position "7" and the record with Σ=22 at position "8" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "7" and less than that of the key information in the record at position "8". As a result, for i=18 to 21, the range shown in FIG. 21 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 21 samples the following nine data, as shown in the corresponding figure: (Frank, #0, 4, @3), (Kate, #0, 5, @4), (Kate, #0, 6, @5), (Frank, #1, 8, @7), (Frank, #1, 9, @8), (Genny, #1, 10, @9), (Dolly, #2, 4, @3), (Helen, #2, 5, @4) and (Helen, #2, 6, @5). Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=22. In this example every element in this interval has been sampled, so the interval is a confidence interval. The search results are therefore (Helen, #2, 5, @4) for i=18, (Helen, #2, 6, @5) for i=19, (Kate, #0, 5, @4) for i=20 and (Kate, #0, 6, @5) for i=21. If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.

・When i=24 to 27: The record with Σ=23 at position "9" and the record with Σ=28 at position "10" of the key information table shown in FIG. 10 are used to determine the search range of the multi-series multi-section search. Specifically, in each of series #0, #1 and #2, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "9" and less than that of the key information in the record at position "10". As a result, for i=24 to 27, the range shown in FIG. 23 is the search range of the multi-series multi-section search.

Assume that performing the multi-series multi-section search on the search range shown in FIG. 23 samples the following eight data, as shown in the corresponding figure: (Kate, #0, 6, @5), (Louis, #0, 7, @6), (Louis, #0, 8, @7), (Peter, #0, 9, @8), (Genny, #1, 10, @9), (Kate, #2, 7, @6), (Louis, #2, 8, @7) and (Peter, #2, 9, @8). Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=28. In this example every element in this interval has been sampled, so the interval is a confidence interval. The search results are therefore (Louis, #0, 8, @7) for i=24, (Louis, #2, 8, @7) for i=25, (Peter, #0, 9, @8) for i=26 and (Peter, #2, 9, @8) for i=27. If the interval were not a confidence interval, the multi-series multi-section search would be performed recursively on that interval.
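As a cross-check of the walkthrough, not of the distributed method itself, the three series can simply be merged in full and indexed directly. This brute-force merge assumes the same reconstructed series and weights of 1. In the device, the whole point of the multi-series multi-section search is to avoid materializing this merge.

```python
# Example 1 series, reconstructed from this walkthrough (all weights are 1).
SERIES = [
    ["Elza", "Frank", "Frank", "Frank", "Kate", "Kate", "Louis", "Louis", "Peter", "Roy"],
    ["Alice", "Alice", "Bob", "Bob", "Dolly", "Dolly", "Elza", "Frank", "Frank", "Genny"],
    ["Alice", "Bob", "Dolly", "Dolly", "Helen", "Helen", "Kate", "Louis", "Peter", "Tom"],
]

# Every element as (value, series number, cumulative weight, position),
# ordered by (value, series number, position): the combined series.
COMBINED = sorted(
    ((v, s, p + 1, p) for s, values in enumerate(SERIES) for p, v in enumerate(values)),
    key=lambda x: (x[0], x[1], x[3]),
)

for i in (1, 3, 5, 6, 7, 8, 12, 13, 15, 16, 18, 19, 20, 21, 24, 25, 26, 27):
    print(i, COMBINED[i])
```

The merge places (Peter, #0, 9, @8) at i=26 and (Peter, #2, 9, @8) at i=27.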

<Example 2>
The weighted multi-series multi-section search of Example 2 is described below. In this example, the series to be searched are SVLs. The position i specified in the data acquisition request is a position in the INV corresponding to the SVL (combined series) obtained by vertically combining a plurality of SVLs. It is also assumed that two data providing devices 10 exist in the layer one level below the data providing device 10 and hold the two pieces of virtual D5A data shown in FIG. 25; the NNC is omitted from FIG. 25. SVL#0 and SVL#1 shown in FIG. 25 are hereinafter referred to as "series #0" and "series #1", respectively.

Here, ACM[-1] = 0 is defined, and ACM[j] - ACM[j-1] is used as the weight of the value at position j of the SVL. For example, in FIG. 25 the weight of "Bob" at position "0" of SVL#0 is "2", and the weight of "Bob" at position "1" is "1". Consequently, the sum of the weights of positions 0 through j of the SVL is ACM[j].
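The weight definition can be illustrated with ACM#0 = [2, 3, 5, 6, 8], the cumulative weights quoted for series #0 in this example. The helper `weights_from_acm` is hypothetical. The patent's ACM[-1] = 0 is a convention, whereas in Python `acm[-1]` would mean the last element, so the sketch tracks the previous value explicitly.

```python
from itertools import accumulate


def weights_from_acm(acm: list[int]) -> list[int]:
    """Weight of SVL position j: ACM[j] - ACM[j-1], with ACM[-1] taken as 0."""
    weights = []
    prev = 0  # the patent's ACM[-1] = 0
    for a in acm:
        weights.append(a - prev)
        prev = a
    return weights


ACM0 = [2, 3, 5, 6, 8]
w = weights_from_acm(ACM0)
print(w)                    # [2, 1, 2, 1, 2]
print(list(accumulate(w)))  # [2, 3, 5, 6, 8], i.e. ACM#0 again
```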

It is assumed that the sizes of the series to be searched are known. In this example, each of series #0 and #1 has size "5", and the combined series obtained by vertically combining them has size "10". It is also assumed that it is known which data providing device 10 (or database server 20) holds the series with which series number. Because the series to be searched are SVLs, they are sorted in ascending order (duplicate values may exist). When the two pieces of virtual D5A data shown in FIG. 25 are vertically combined, the elements of the INV must be reinterpreted accordingly. For example, place the virtual D5A data containing SVL#0, ACM#0 and INV#0 in front and the virtual D5A data containing SVL#1, ACM#1 and INV#1 behind it. The elements of INV#1 are then read as "10", "15", "9", "13", "14", "8", "11" and "12" from the top. In the following, it is assumed that the virtual D5A data are combined in exactly this order: the data containing SVL#0, ACM#0 and INV#0 in front, and the data containing SVL#1, ACM#1 and INV#1 behind it.
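One way to read the INV#1 values above is sketched below. It rests on two assumptions that the text does not state. First, INV entries are taken to be record numbers local to each piece of virtual D5A data. Second, the front data is taken to hold 8 records (the last ACM#0 value), so each INV#1 entry is shifted by 8. The local INV#1 values used here are hypothetical: they are inferred by subtracting 8 from the readings quoted above. The helper name is also hypothetical.

```python
def shift_back_inv(inv_back: list[int], front_records: int) -> list[int]:
    """Read the INV of the rear virtual D5A data in the combined view:
    its record numbers are placed after those of the front data."""
    return [r + front_records for r in inv_back]


ACM0 = [2, 3, 5, 6, 8]
front_records = ACM0[len(ACM0) - 1]      # last ACM#0 value: 8 records in the front data
inv1_local = [2, 7, 1, 5, 6, 0, 3, 4]    # hypothetical, inferred from the readings in the text
print(shift_back_inv(inv1_local, front_records))
# [10, 15, 9, 13, 14, 8, 11, 12]
```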

The search unit 202 then performs a multi-section search on series #0 and #1. For each sampled element, it records the value, the series number of the series containing the value, the cumulative weight, and the position of the value within the series. For example, as shown in FIG. 26, each of series #0 and #1 is divided into three sections (sections 1 to 3), and the leading element of each section is sampled.

The following data are sampled by the weighted multi-series multi-section search shown in FIG. 26. Hereinafter, these data are referred to as "key information". In this example, key information is written in the format (value, series number, cumulative weight of the value within the series, weight of the value, position of the value within the series).

(Bob, #0, 2, 2, @0)
(Cathy, #0, 5, 2, @2)
(Dolly, #0, 8, 2, @4)
(Alice, #1, 2, 2, @0)
(Cathy, #1, 5, 2, @2)
(Elly, #1, 8, 1, @4)
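The sampling of section heads can be sketched for series #0, whose SVL values (Bob, Bob, Cathy, Dolly, Dolly) and ACM values (2, 3, 5, 6, 8) all appear in this example. The text does not say how the sections are sized. The ceiling split used here is an assumption, chosen because it yields the heads @0, @2 and @4 listed above. `section_heads` and `key_information` are hypothetical names.

```python
def section_heads(size: int, sections: int) -> list[int]:
    """Start positions of `sections` nearly equal sections of a series of
    length `size`, using a ceiling split (an assumed rule)."""
    return [-(-k * size // sections) for k in range(sections)]


def key_information(series_no: int, svl: list[str], acm: list[int], sections: int):
    """(value, series number, cumulative weight, weight, position) of each head."""
    keys = []
    for p in section_heads(len(svl), sections):
        prev = acm[p - 1] if p > 0 else 0  # ACM[-1] is defined as 0
        keys.append((svl[p], series_no, acm[p], acm[p] - prev, p))
    return keys


SVL0 = ["Bob", "Bob", "Cathy", "Dolly", "Dolly"]
ACM0 = [2, 3, 5, 6, 8]
print(key_information(0, SVL0, ACM0, 3))
# [('Bob', 0, 2, 2, 0), ('Cathy', 0, 5, 2, 2), ('Dolly', 0, 8, 2, 4)]
```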
Hereinafter, the triple (value, series number, cumulative weight of the value within the series) is referred to as a "comparison key".

Next, in each of series #0 and #1, the search unit 202 uses the comparison key to find the largest data smaller than each piece of key information. For each piece of key information, it then calculates the sum Σ of the cumulative weights of those data. In the series that contains the key information itself, the largest data smaller than the key information is simply the element one position below it, so that element is sampled. In the other series, the largest data smaller than the key information is sampled by a multi-section search keyed on (value, series number).

For example, for the key information (Bob, #0, 2, 2, @0), series #0 contains no data smaller than the key information, while in series #1 the largest data smaller than it is (Alice, #1, 2, 2, @0). The sum of the cumulative weights is therefore Σ = 0 + 2 = 2.

Similarly, for the key information (Cathy, #0, 5, 2, @2), the largest data smaller than the key information is (Bob, #0, 3, 1, @1) in series #0 and (Alice, #1, 2, 2, @0) in series #1. The sum of the cumulative weights is therefore Σ = 3 + 2 = 5.

In the same way, for every other piece of key information, the largest data smaller than the key information is obtained in each of series #0 and #1 (including the case where no such data exists), and the sum Σ of the cumulative weights is calculated.

Next, the search unit 202 sorts the records (key information, the largest data smaller than the key information in each series, and the sum Σ of the cumulative weights of those data) by Σ and holds the result in the storage unit 204. In this example, the key information table shown in FIG. 27 is thus held in the storage unit 204.

Here, Σ is the position, in the INV corresponding to the combined series, of the value included in the corresponding key information. Therefore, when some Σ equals i, the search unit 202 takes the key information corresponding to that Σ as the search result. In the example shown in FIG. 27, this applies to i = 0, 2, 5, 8, 11 and 15. Moreover, a value with weight w corresponds to w consecutive INV elements. The positions Σ through Σ+w-1 therefore all correspond to the same SVL/ACM element, and the INV element at each of those positions can be identified as well. For example, as described later for i=12 to 14, (Dolly, #0, 8, 2, @4) has weight "2" and corresponds to i=11 and 12. (Elly, #1, 7, 2, @3) also has weight "2" and corresponds to i=13 and 14.
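The rule that a value with weight w covers INV positions Σ through Σ + w - 1 can be sketched as a lookup over the key information table. The table below is reconstructed from the Σ values and key information given in the text. The assignment of Σ=0, 8 and 15 to the remaining three pieces of key information follows from their sort order. `lookup` is a hypothetical name. A return value of None means that no sampled key covers the position, so a narrower search is required.

```python
from bisect import bisect_right

# Key information table of Example 2, as implied by the text:
# (Σ, (value, series number, cumulative weight, weight, position)).
TABLE = [
    (0, ("Alice", 1, 2, 2, 0)),
    (2, ("Bob", 0, 2, 2, 0)),
    (5, ("Cathy", 0, 5, 2, 2)),
    (8, ("Cathy", 1, 5, 2, 2)),
    (11, ("Dolly", 0, 8, 2, 4)),
    (15, ("Elly", 1, 8, 1, 4)),
]
SIGMAS = [s for s, _ in TABLE]


def lookup(i: int):
    """Key information covering INV position i (Σ <= i < Σ + weight),
    or None if a narrower (recursive) search is needed."""
    k = bisect_right(SIGMAS, i) - 1
    if k < 0:
        return None
    sigma, key = TABLE[k]
    return key if i < sigma + key[3] else None


print(lookup(12))  # ('Dolly', 0, 8, 2, 4): weight 2 covers i = 11 and 12
print(lookup(13))  # None: not covered by any sampled key information
```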

If, on the other hand, no Σ equals i, the search unit 202 obtains the search result by recursively performing the weighted multi-series multi-section search on a narrower interval. This proceeds as in Example 1 except for the weights, so only the case of i=12 to 14 is described below as an example.

・When i=12 to 14: The record with Σ=11 at position "4" and the record with Σ=15 at position "5" of the key information table shown in FIG. 27 are used to determine the search range of the weighted multi-series multi-section search. Specifically, in each of series #0 and #1, the search range is a half-open interval (closed on the left, open on the right). It contains the data whose comparison key is greater than or equal to that of the key information in the record at position "4" and less than that of the key information in the record at position "5". As a result, for i=12 to 14, the range shown in FIG. 28 is the search range of the weighted multi-series multi-section search.

Assume that performing the weighted multi-series multi-section search on the search range shown in FIG. 28 samples (Dolly, #0, 6, 1, @3), (Dolly, #0, 8, 2, @4), (Cathy, #1, 5, 2, @2) and (Elly, #1, 7, 2, @3), as shown in the corresponding figure. Stably sorting these sampling results by value yields an interval that ends with the largest data smaller than the key information of the record with Σ=15. In this example every element in this interval has been sampled, so the interval is a confidence interval.

Because (Dolly, #0, 8, 2, @4) has weight "2", it corresponds to i=11 and 12. Because (Elly, #1, 7, 2, @3) also has weight "2", it corresponds to i=13 and 14.

Therefore, the search result for i=12 is (Dolly, #0, 8, 2, @4), and the search result for i=13 and 14 is (Elly, #1, 7, 2, @3). If the interval were not a confidence interval, the weighted multi-series multi-section search would be performed recursively on that interval.
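Resolving the weighted confidence interval for i=12 to 14 differs from the unweighted case only in that each element occupies `weight` consecutive INV positions. The following is a minimal sketch. It assumes the samples cover the whole interval and are listed series by series, so that Python's stable sort by value preserves series-number order. `resolve_weighted` is a hypothetical name.

```python
# Each sample is (value, series number, cumulative weight, weight, position).


def resolve_weighted(samples, lo_key, hi_key, sigma_lo):
    """Map INV positions to the samples whose comparison key lies in
    [lo_key, hi_key); each sample occupies `weight` consecutive positions
    starting from sigma_lo, the Σ of the lower-bound record."""
    ordered = sorted(samples, key=lambda x: x[0])  # stable: keeps series order
    result = {}
    pos = sigma_lo
    for x in ordered:
        if lo_key <= (x[0], x[1], x[2]) < hi_key:
            for _ in range(x[3]):  # one INV position per unit of weight
                result[pos] = x
                pos += 1
    return result


samples = [
    ("Dolly", 0, 6, 1, 3), ("Dolly", 0, 8, 2, 4),
    ("Cathy", 1, 5, 2, 2), ("Elly", 1, 7, 2, 3),
]
# Bounded by the records (Dolly, #0, 8) with Σ = 11 and (Elly, #1, 8) with Σ = 15.
res = resolve_weighted(samples, ("Dolly", 0, 8), ("Elly", 1, 8), 11)
for i in (12, 13, 14):
    print(i, res[i])
```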

The present invention is not limited to the specifically disclosed embodiments described above, and various modifications, alterations, combinations with known techniques, and the like are possible without departing from the scope of the claims.

1 data providing system
10 data providing device
20 database server
30 user terminal
101 input device
102 display device
103 external I/F
103a recording medium
104 communication I/F
105 processor
106 memory device
107 bus
201 request receiving unit
202 searching unit
203 result responding unit
204 storage unit

Claims

A data providing device arranged hierarchically, comprising:
a request receiving unit configured to receive, from the layer one level above, a data acquisition request specifying a position in a combined series, which is a series obtained by combining a plurality of series managed by other data providing devices or database servers existing in the layer one level below;
a search unit configured to search for at least the value corresponding to the position among the values included in the combined series, and the series number of the series containing the value corresponding to the position among the plurality of series from which the combined series is combined; and
a response unit configured to transmit, as a response to the data acquisition request, the search result retrieved by the search unit to the source of the data acquisition request.
The data providing device according to claim 1, wherein each value in the series is given a weight and a cumulative value obtained by accumulating the weights in ascending or descending order of the values, and
the search unit is configured to:
take the entirety of each of the plurality of series from which the combined series is combined as a search range, divide the search range into one or more sections, and sample the leading value of each section; and
search for at least the value corresponding to the position specified in the data acquisition request and the series number of the series containing that value, using the sampled values, the series numbers of the series containing them, and the cumulative values given to them.
The data providing device according to claim 2, wherein the search unit is configured to:
if a cumulative value matching the position specified in the data acquisition request exists among the cumulative values corresponding to the sampled values, take at least the corresponding sampled value and the series number of the series containing it as the search result; and
if no cumulative value matching the position specified in the data acquisition request exists among the cumulative values corresponding to the sampled values, divide a search range narrower than the current search range into one or more sections, sample the leading value of each section, and
search for at least the value corresponding to the position specified in the data acquisition request and the series number of the series containing that value, using the sampled values, the series numbers of the series containing them, and the cumulative values given to them.
The data providing device according to claim 3, wherein the search unit is configured to, if no cumulative value matching the position specified in the data acquisition request exists among the cumulative values corresponding to the sampled values, determine the narrower search range using the largest cumulative value less than the specified position and the smallest cumulative value exceeding the specified position, and then divide the narrower search range into one or more sections and sample the leading value of each section.
The data providing device according to any one of claims 2 to 4, wherein the series is an SVL, and the data acquisition request specifies a position in an SVL obtained by combining a plurality of SVLs.
The data providing device according to claim 5, wherein all of the weights are 1.
The data providing device according to any one of claims 2 to 4, wherein the series is an SVL, and the data acquisition request specifies a position in the INV corresponding to an SVL obtained by combining a plurality of SVLs.
The data providing device according to claim 7, wherein, with the value of the (-1)-th element of the ACM corresponding to the SVL obtained by combining the plurality of SVLs defined as 0, the difference between the value of the j-th element and the value of the (j-1)-th element of that ACM is given as the weight of the j-th value of the SVL.
The data providing device according to claim 6, wherein, when the position specified in the data acquisition request is a position in the INV corresponding to the SVL obtained by combining the plurality of SVLs, the j-th value of the SVL corresponds to the INV elements from the position represented by the value of the (j-1)-th element of the ACM up to the position immediately before the position represented by the value of the j-th element of the ACM.
A data providing method in which a hierarchically arranged data providing device executes:
a request receiving procedure of receiving, from the layer one level above, a data acquisition request specifying a position in a combined series, which is a series obtained by combining a plurality of series managed by other data providing devices or database servers existing in the layer one level below;
a search procedure of searching for at least the value corresponding to the position among the values included in the combined series, and the series number of the series containing the value corresponding to the position among the plurality of series from which the combined series is combined; and
a response procedure of transmitting, as a response to the data acquisition request, the search result retrieved in the search procedure to the source of the data acquisition request.
A program causing a hierarchically arranged data providing device to execute:
a request receiving procedure of receiving, from the layer one level above, a data acquisition request specifying a position in a combined series, which is a series obtained by combining a plurality of series managed by other data providing devices or database servers existing in the layer one level below;
a search procedure of searching for at least the value corresponding to the position among the values included in the combined series, and the series number of the series containing the value corresponding to the position among the plurality of series from which the combined series is combined; and
a response procedure of transmitting, as a response to the data acquisition request, the search result retrieved in the search procedure to the source of the data acquisition request.