WO2010134440A1 - Data combination system and data combination method - Google Patents
- Publication number: WO2010134440A1 (international application PCT/JP2010/057893)
- Authority: WO, WIPO (PCT)
- Prior art keywords: data, combination, value, destination data, record
Classifications
- G06F16/2455: Query execution. Hierarchy: G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F16/00 (Information retrieval; database structures therefor; file system structures therefor) > G06F16/20 (Information retrieval of structured data, e.g. relational data) > G06F16/24 (Querying) > G06F16/245 (Query processing) > G06F16/2455 (Query execution)
- G06F17/40: Data acquisition and logging. Hierarchy: G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F17/00 (Digital computing or data processing equipment or methods, specially adapted for specific functions) > G06F17/40 (Data acquisition and logging)
Definitions
- The present invention relates to a data combination system and a data combination method.
- Processes that combine data are performed in data processing.
- For example, a data processing apparatus is known that combines content data included in a first table with record data included in a second table when the attribute values and item values contained in the respective data match.
- In such processing, for each piece of data in a certain data group (hereinafter "combination source data"), data whose value for a predetermined item matches is extracted from another data group (hereinafter "combination destination data").
- In the data combination process that combines the combination source data with the extracted combination destination data, however, the combination destination data group may contain a plurality of combination destination data whose values for the predetermined item match, so the data to be combined cannot be determined from that item alone.
- The present invention has been made in view of this problem, and its object is to provide a data combination system and a data combination method that improve the success rate of data combination while also improving the accuracy with which data are combined in data combination processing.
- The data combination system of the present invention combines one piece of combination source data, which includes an identification item and a key item, with selected combination destination data, which is one piece of combination destination data selected from a combination destination data group containing a plurality of combination destination data, each of which also includes an identification item and a key item.
- The system comprises a combination source data storage means that stores the combination source data, a combination destination data storage means that stores the combination destination data group, a data determination means, and a data combining means.
- The data determination means selects a piece of combination destination data as the selected combination destination data when (1) the value of the identification item in the combination source data stored in the combination source data storage means matches the value of the identification item in that combination destination data stored in the combination destination data storage means, or falls within a predetermined identification range set based on that value, and (2) the value of the key item in the combination source data falls within a first predetermined range set based on the value of the key item in that combination destination data.
- The data combining means combines the selected combination destination data selected by the data determination means with the combination source data.
- The data combination method of the present invention combines one piece of combination source data, which includes an identification item and a key item, with selected combination destination data, which is one piece of combination destination data selected from a combination destination data group containing a plurality of combination destination data, each of which also includes an identification item and a key item.
- In this method, a piece of combination destination data is selected as the selected combination destination data and combined with the combination source data when the value of the identification item in the combination source data matches the value of the identification item in the combination destination data, or falls within a predetermined identification range set based on that value, and the value of the key item in the combination source data falls within a first predetermined range set based on the value of the key item in the combination destination data.
- The predetermined identification range is a finite range that includes the value of the identification item in the combination destination data.
- The first predetermined range is a finite range that includes the value of the key item in the combination destination data.
- According to the present invention, when combination destination data is combined with combination source data, the combination destination data is selected as the selected combination destination data and combined with the combination source data if the identification items match, or the identification item value of the combination source data falls within the predetermined identification range set based on the identification item value of the combination destination data, and the key item value of the combination source data falls within the first predetermined range set based on the key item value of the combination destination data. This improves the accuracy of data combination.
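The two-part selection condition above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: records are modeled as hypothetical `(identification value, key value)` tuples, `identification_matches` and `first_range` are hypothetical callables standing in for the identification test (exact match or identification range) and the first predetermined range, and when several destinations qualify the sketch simply returns the first one.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical record shape: (identification item value, key item value).
Record = Tuple[object, int]


def select_destination(
    source: Record,
    destinations: Iterable[Record],
    identification_matches: Callable[[object, object], bool],
    first_range: Callable[[int], Tuple[int, int]],
) -> Optional[Record]:
    """Return the first destination satisfying both conditions, or None."""
    src_id, src_key = source
    for dest in destinations:
        dst_id, dst_key = dest
        # Condition 1: the identification items match (or the source value lies
        # in the identification range); delegated to the supplied callable.
        if not identification_matches(src_id, dst_id):
            continue
        # Condition 2: the source key lies in the first predetermined range,
        # a finite inclusive range built from the destination's key value.
        low, high = first_range(dst_key)
        if low <= src_key <= high:
            return dest
    return None
```

With destinations `[("111", 10), ("222", 20), ("111", 30)]`, equality as the identification test, and a window of plus or minus 5 around each destination key, the source `("111", 28)` selects `("111", 30)`, while `("111", 20)` selects nothing.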
- The identification item may be an item that stores a value having no concept of a range; such an item stores, for example, a user identifier.
- Alternatively, the identification item may be an item that stores a value having the concept of a range.
- Such an item stores, for example, a network prefix derived from an IP address.
- The key item is an item that can take continuous or discrete values, so that the concept of a range can be associated with its value.
- The data determination means may extract, from the plurality of combination destination data stored in the combination destination data storage means, as extracted combination destination data, each piece of combination destination data whose identification item value matches the identification item value of the combination source data, or whose predetermined identification range, set based on its own identification item value, contains the identification item value of the combination source data.
- The data determination means then selects the extracted combination destination data as the selected combination destination data when the key item value of the combination source data falls within the first predetermined range set based on the key item value of the extracted combination destination data.
- With this configuration, only extracted combination destination data that also satisfies the key item condition is combined with the combination source data. This further improves the accuracy of data combination and, because the key item check is applied only to the extracted data, reduces the processing load of selecting the selected combination destination data based on key item values.
- The combination source data and the combination destination data may include a plurality of key items.
- In that case, the data determination means selects the extracted combination destination data as the selected combination destination data when each value of the plurality of key items in the combination source data falls within the corresponding first predetermined range set based on the values of the plurality of key items in the extracted combination destination data.
- That is, a first predetermined range is set for each key item of the combination destination data.
- Combination destination data is selected for combination with the combination source data only when every key item of the combination source data falls within its corresponding range, which improves the accuracy with which the selected combination destination data and the combination source data are combined.
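With several key items, the condition becomes a conjunction: every source key must fall in the first predetermined range built from the matching destination key. The Python sketch below assumes integer-valued keys and one hypothetical range-building callable per key item; it illustrates the rule and is not the patent's own code.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical: builds an inclusive (low, high) range from a destination key value.
RangeFn = Callable[[int], Tuple[int, int]]


def all_keys_in_first_ranges(
    source_keys: Sequence[int],
    dest_keys: Sequence[int],
    range_fns: Sequence[RangeFn],
) -> bool:
    """True only if every source key lies in the first predetermined range
    built from the corresponding destination key."""
    if not (len(source_keys) == len(dest_keys) == len(range_fns)):
        raise ValueError("source, destination and range lists must align")
    for src, dst, make_range in zip(source_keys, dest_keys, range_fns):
        low, high = make_range(dst)
        if not low <= src <= high:
            return False  # one key out of range is enough to reject
    return True
```

Rejecting on the first out-of-range key mirrors the "each value ... falls within each first predetermined range" wording: a single failing key item disqualifies the candidate.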
- The data determination means may also select the extracted combination destination data as the selected combination destination data when the key item value of the combination source data falls within the first predetermined range set based on the key item value of the extracted combination destination data, or within a second predetermined range set adjacent to the first predetermined range.
- With this configuration, it is more likely that some extracted combination destination data will be selected for combination with the combination source data, which improves the success rate of data combination. That is, the combination destination data is selected for combination when the key item value of the combination source data falls within the first predetermined range associated with the key item value of the combination destination data, or within the second predetermined range adjacent to it.
- This prevents the combination process for a piece of combination source data from failing merely because no combination destination data could be selected.
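The text does not define "adjacent" beyond the date examples given later. The Python sketch below therefore makes an explicit assumption: on an integer key, the second predetermined ranges are same-width windows immediately before and after the first predetermined range. A source key is accepted if it falls in any of the three windows.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive (low, high) on an integer key


def adjacent_ranges(first: Range) -> Tuple[Range, Range]:
    """Second predetermined ranges: same-width windows directly before and
    after the first predetermined range (an illustrative assumption)."""
    low, high = first
    width = high - low + 1
    return (low - width, low - 1), (high + 1, high + width)


def key_accepted(source_key: int, first: Range) -> bool:
    """Accept if the key lies in the first range or in an adjacent second range."""
    candidates = (first,) + adjacent_ranges(first)
    return any(low <= source_key <= high for low, high in candidates)
```

For a first range of `(10, 19)`, the adjacent ranges are `(0, 9)` and `(20, 29)`, so a key of 25 is accepted but a key of 30 is not.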
- Here too, the combination source data and the combination destination data may include a plurality of key items.
- In that case, the data determination means selects the extracted combination destination data as the selected combination destination data when each value of the plurality of key items in the combination source data falls within the corresponding first predetermined range, set based on the values of the plurality of key items in the extracted combination destination data, or within the corresponding second predetermined range set adjacent to that first predetermined range.
- Because the ranges are set for each key item of the combination destination data, and the combination destination data is selected when each key item of the combination source data falls within the first predetermined range or the adjacent second predetermined range, the success rate of data combination is improved while the accuracy of combining the selected combination destination data with the combination source data is maintained.
- Alternatively, the data determination means may first extract, as extracted combination destination data, the combination destination data stored in the combination destination data storage means for which the key item value of the combination source data falls within the first predetermined range.
- It then selects, as the selected combination destination data, extracted combination destination data whose identification item value matches the identification item value of the combination source data, or whose predetermined identification range, set based on its own identification item value, contains the identification item value of the combination source data.
- This configuration further improves the accuracy of data combination.
- If no extracted combination destination data has an identification item value that matches the identification item value of the combination source data, or a predetermined identification range that contains it, the data determination means re-extracts from the combination destination data storage means, as extracted combination destination data, other combination destination data for which the key item value of the combination source data falls within the second predetermined range adjacent to the first predetermined range. From the re-extracted data, it selects as the selected combination destination data the extracted combination destination data whose identification item value matches that of the combination source data, or whose predetermined identification range contains it.
- Thus, even when the identification item value of the combination source data neither matches nor falls within the predetermined identification range of any data extracted based on the first predetermined range of the key item, re-extracted data (extracted based on the adjacent second predetermined range) whose identification item matches, or whose identification range contains, the source value is selected and combined with the combination source data. This further improves the success rate of data combination.
- In other words, combination destination data whose identification item corresponds to that of the combination source data (it matches, or its predetermined identification range contains the source value) is selected for combination if it exists among the data extracted based on the first predetermined range or, failing that, among the data extracted based on the second predetermined range. This prevents the combination process for the combination source data from failing because no combination destination data can be selected.
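The key-first variant with re-extraction can be sketched in Python as below. This is an illustration under assumptions: destinations are hypothetical `(identification value, key value)` tuples with integer keys, the identification check is an exact match (an identification-range test could replace it), and `first_range` and `second_ranges` are hypothetical callables that build the first and adjacent second predetermined ranges from a destination's key value.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Dest = Tuple[str, int]   # hypothetical: (identification value, key value)
Range = Tuple[int, int]  # inclusive (low, high)


def _within(value: int, rng: Range) -> bool:
    return rng[0] <= value <= rng[1]


def select_key_first(
    src_id: str,
    src_key: int,
    group: Sequence[Dest],
    first_range: Callable[[int], Range],
    second_ranges: Callable[[int], List[Range]],
) -> Optional[Dest]:
    # Extract destinations whose first predetermined range contains the source key.
    extracted = [d for d in group if _within(src_key, first_range(d[1]))]
    for dest in extracted:
        if dest[0] == src_id:  # identification check (exact match in this sketch)
            return dest
    # No match: re-extract using the adjacent second predetermined ranges.
    re_extracted = [
        d for d in group
        if any(_within(src_key, r) for r in second_ranges(d[1]))
    ]
    for dest in re_extracted:
        if dest[0] == src_id:
            return dest
    return None  # combination fails for this source record
```

For example, with monthly keys, `group = [("111", 7), ("222", 8), ("333", 8)]`, a first range equal to the destination's own month and second ranges equal to the neighbouring months, the source `("111", 8)` finds nothing in month 8 and falls back to `("111", 7)`.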
- The key item may be an item related to date and time.
- In that case, the first predetermined range is a predetermined period that includes the date and time given by the key item value. With this configuration, combination destination data associated with a predetermined period can be combined with combination source data associated with a date and time.
- According to the present invention, when combination destination data is combined with combination source data, the combination destination data is combined with the combination source data as the selected combination destination data if the identification items match, or the identification item value of the combination source data falls within the predetermined identification range set based on the identification item value of the combination destination data, and the key item value of the combination source data falls within the first predetermined range. This improves the success rate of data combination.
- Moreover, because the first predetermined range is set based on the key item value of the combination destination data, combination accuracy can be improved. Therefore, in the data combination process, the success rate of data combination can be improved while the accuracy with which data are combined is also improved.
- FIG. 1 is a block diagram showing the functional configuration of a server 1 according to an embodiment of the data combination system.
- The server 1 is an apparatus that combines combination source data including a plurality of key items with selected combination destination data selected from a combination destination data group having a plurality of combination destination data that include a plurality of key items.
- The server 1 includes a user request acquisition unit 10, a table determination unit 11 (data determination means), a data reading unit 12, a data combining unit 13 (data combining means), a data writing unit 14 (data writing means), a user result output unit 15, and a table storage unit 16 (combination source data storage means, combination destination data storage means, and combined data storage means).
- FIG. 2 is a hardware configuration diagram of the server 1.
- Physically, the server 1 is configured as a computer system including a CPU 101; a RAM 102 and a ROM 103, which are main storage devices; an auxiliary storage device 105 such as a hard disk or flash memory; an input device 106 such as a keyboard or mouse; an output device 107 such as a display; and a communication module 104, which is a data transmission/reception device such as a network card.
- Each function shown in FIG. 1 is realized by loading predetermined computer software onto hardware such as the CPU 101 and the RAM 102 shown in FIG. 2, operating the communication module 104, the input device 106, and the output device 107 under the control of the CPU 101, and reading and writing data in the RAM 102 and the auxiliary storage device 105.
- In this embodiment, all of the functional units 10 to 16 are provided in the server 1.
- Alternatively, the functional units 10 to 16 may be distributed over a plurality of servers that can communicate with each other via a network.
- That is, the data combination system may be configured by a plurality of servers.
- The table storage unit 16 stores a table A, a table B1, a table B2, and a table C. Tables B1 and B2 together constitute a table B group.
- Table A stored in the table storage unit 16 constitutes the combination source data storage means of the present invention, and each record a included in table A constitutes combination source data.
- Tables B1 and B2 stored in the table storage unit 16 constitute the combination destination data storage means, and the plurality of records b stored in the table B group constitute the combination destination data group.
- Each record b included in table B1 or table B2 constitutes combination destination data.
- Table C stored in the table storage unit 16 constitutes the combined data storage means, which stores the combined data of the present invention as records.
- Table A stores location information of mobile terminals acquired by the base stations accommodating the mobile terminals and by the exchange, and has the items "user ID", "date and time", and "position".
- The item "user ID" is an identifier of the user of the mobile terminal.
- The item "date and time" is the date on which the record was acquired.
- The item "position" is information on the location of the mobile terminal.
- In the example shown in FIG., table A stores records a1, a2, and a3.
- Table B1 stores attribute information on the subscribers of mobile terminals as of the end of July, held by the provider of the mobile terminals' communication service.
- Table B1 has the items "user ID", "gender", and "birth date".
- The item "user ID" is an identifier of the user of the mobile terminal.
- The item "gender" is the gender of the user.
- The item "birth date" is the date of birth of the user.
- Table B1 also has an item "date and time" indicating when the attribute information was acquired; its value can be regarded as "July" for all records b11 and b12.
- Table B2 stores attribute information on the subscribers of mobile terminals as of the end of August, held by the provider of the mobile terminals' communication service.
- Table B2 has the same items as table B1. It also has an item "date and time" indicating when the attribute information was acquired, whose value can be regarded as "August" for all records b21 and b22.
- Table C stores records c, each generated by combining one of the records b stored in tables B1 and B2 with a record a of table A.
- Table C has the items "user ID", "date and time", "position", "gender", "birth date", and "joined table".
- The item "joined table" indicates the table from which the combined record b was taken, and has values such as "b7 (table B1)" and "b8 (table B2)".
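The three table layouts can be sketched as Python dataclasses, together with the row-building step performed by the data combining unit 13. The field types are assumptions (the text does not say how "position" is represented), and `combine_records` is a hypothetical helper that assumes exact user ID matching as the identification check.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordA:           # a row of table A (combination source data)
    user_id: str
    date_time: date
    position: str        # representation assumed; the text only says "location information"


@dataclass(frozen=True)
class RecordB:           # a row of table B1 or B2 (combination destination data)
    user_id: str
    gender: str
    birth_date: date


@dataclass(frozen=True)
class RecordC:           # a row of table C (combined data)
    user_id: str
    date_time: date
    position: str
    gender: str
    birth_date: date
    joined_table: str    # e.g. "b7 (table B1)"


def combine_records(a: RecordA, b: RecordB, joined_table: str) -> RecordC:
    """Build a table C row from a record a and the selected record b.
    This sketch assumes exact user ID matching as the identification check."""
    if a.user_id != b.user_id:
        raise ValueError("user IDs differ; records should not be combined")
    return RecordC(a.user_id, a.date_time, a.position,
                   b.gender, b.birth_date, joined_table)
```

Frozen dataclasses keep each combined row immutable once written, which matches table C being an output-only store in this embodiment.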
- The user request acquisition unit 10 acquires a data combination processing request from a user.
- The processing request includes various specifications related to data combination, for example: "combine table A with the table B group and output the result to table C"; "for each record of table A, select from the table B group, and combine, a record whose 'user ID' matches and that is extracted by a predetermined algorithm based on the item related to date and time"; "the type of predetermined algorithm"; and "output successfully combined records to table C".
- The table determination unit 11 selects, from the table B group, the record b to be combined with a record a stored in table A. The processing performed by the table determination unit 11 is described in detail later.
- The table determination unit 11 sends the record a acquired from table A and the record b selected from the table B group to the data combining unit 13.
- The data reading unit 12 acquires records by referring to tables A, B1, and B2 stored in the table storage unit 16, and sends the acquired records to the table determination unit 11.
- The data combining unit 13 acquires record a and record b from the table determination unit 11 and combines record b with record a to generate a record c.
- The data combining unit 13 sends record c to the data writing unit 14.
- The data writing unit 14 acquires record c from the data combining unit 13 and writes it to table C. When all records have been written, the data writing unit 14 notifies the user result output unit 15.
- The user result output unit 15 receives the notification from the data writing unit 14 that writing of the records has been completed, and displays it.
- FIG. 5 is a flowchart showing the data combination processing performed in the server 1.
- First, the user request acquisition unit 10 acquires a data combination processing request from a user (S1).
- The processing request includes, for example, "select from the table B group, and combine, a record b whose 'user ID' matches that of record a in table A and that is extracted by a predetermined algorithm based on the item related to 'date and time'".
- The item "user ID" constitutes the identification item of the present invention.
- The item "date and time" constitutes the key item of the present invention.
- The identification item may store either a value without the concept of a range or a value with the concept of a range.
- In the former case, the identification item stores, for example, a user ID, which is a user identifier.
- In the latter case, the identification item stores, for example, a network prefix derived from an IP address.
- The key item is an item that can take continuous or discrete values, so that the concept of a range can be associated with its value.
- The data reading unit 12 acquires table A from the table storage unit 16 (S2).
- The table determination unit 11 acquires a record a from the acquired table A (S3).
- Here, record a1 is acquired first.
- The table determination unit 11 selects one table B from the table B group by a predetermined algorithm, using the value of the item "date and time" of record a as the key (S4).
- The predetermined algorithms are described below.
- In the first algorithm, the attribute information at the end of month n (bn) (records b) is combined with the position information (records a) whose "date and time" falls between the 1st and the 31st of month n.
- That is, the range "1st to 31st of month n" is set based on the value (month n) of the item "date and time" in record b, and record a and record b are combined when the "date and time" value of record a falls within this range. In the second algorithm, the range "16th of month n to 15th of month (n+1)" is set based on the value (month n) of the item "date and time" in record b, and record a and record b are combined when the "date and time" value of record a falls within this range.
- The period "1st to 31st of month n" in the first algorithm and the period "16th of month n to 15th of month (n+1)" in the second algorithm each constitute the first predetermined range of the present invention.
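The two algorithms can be expressed as date arithmetic. The Python sketch below models months as hypothetical `(year, month)` pairs and reads "1st to 31st of month n" as the whole calendar month; that reading is an assumption for months shorter than 31 days. `month_for_date` returns the month n whose first predetermined range contains a given record a date, which is the table choice made in step S4.

```python
import calendar
from datetime import date
from typing import Tuple

YearMonth = Tuple[int, int]  # hypothetical (year, month) representation of "month n"


def shift_month(ym: YearMonth, delta: int) -> YearMonth:
    """Move a (year, month) pair by delta months across year boundaries."""
    index = ym[0] * 12 + (ym[1] - 1) + delta
    return index // 12, index % 12 + 1


def first_predetermined_range(algorithm: int, ym: YearMonth) -> Tuple[date, date]:
    year, month = ym
    if algorithm == 1:
        # "1st to 31st of month n", read here as the whole calendar month.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if algorithm == 2:
        # "16th of month n to 15th of month (n+1)".
        next_year, next_month = shift_month(ym, 1)
        return date(year, month, 16), date(next_year, next_month, 15)
    raise ValueError(f"unknown algorithm: {algorithm}")


def month_for_date(algorithm: int, d: date) -> YearMonth:
    """Month n whose first predetermined range contains d."""
    if algorithm == 2 and d.day <= 15:
        return shift_month((d.year, d.month), -1)
    return d.year, d.month
```

Under these assumptions, the date 2008/8/10 maps to July 2008 under the second algorithm (range 16 July to 15 August), while 2008/8/17 maps to August 2008 under either algorithm.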
- Suppose that record a1 is acquired in step S3 and that the value of its item "date and time" is "2008/8/17".
- In this case, the table B2 holding the end-of-August attribute information (b8) is selected (S4).
- The table determination unit 11 attempts to acquire, from table B2, a record b whose item "user ID" has the value "111" (S5), and then determines whether a record b was acquired (S6). If a record b was acquired, the procedure proceeds to step S10; otherwise, it proceeds to step S7.
- In this example, no record b with user ID "111" is acquired from table B2, so the procedure proceeds to step S7.
- The table determination unit 11 acquires a record b when the value of its item "user ID" matches the value of the item "user ID" of record a.
- When an item having the concept of a range, such as "IP address", is used as the identification item instead of "user ID", the table determination unit 11 can acquire record b when part of the "IP address" (for example, the upper part) of records a and b matches.
- In that case, the table determination unit 11 sets a range of IP address values from the group of IP addresses whose upper part matches the "IP address" of record b, and determines whether the IP address of record a falls within that range.
- In this way, a predetermined range can be set based on the value of the item in record b.
- The predetermined range set here constitutes the "predetermined identification range" of the present invention.
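The IP-address case can be sketched with Python's standard `ipaddress` module: the predetermined identification range is the network formed by the upper bits of record b's address, and record a matches if its address lies in that network. The prefix length of 24 bits is an assumption chosen for illustration; the text does not specify how many upper bits must match.

```python
import ipaddress


def identification_matches(source_ip: str, dest_ip: str, prefix_len: int = 24) -> bool:
    """True if the source IP lies in the predetermined identification range,
    modeled here as the network formed by the upper prefix_len bits of the
    destination's IP address (prefix_len is an assumed parameter)."""
    # strict=False lets a host address define its enclosing network.
    network = ipaddress.ip_network(f"{dest_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(source_ip) in network
```

With the default /24, `192.0.2.77` matches a record b holding `192.0.2.5`, but `192.0.3.1` does not; widening the prefix to /16 makes both match.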
- In this embodiment, the table determination unit 11 performs the determination for acquiring record b using "user ID", an item without the concept of a range; however, the determination may also be performed using an item with the concept of a range.
- In step S7, the table determination unit 11 selects, via the data reading unit 12, either the table B of the attribute information at the end of month (n-1) (b(n-1)) or the table B of the attribute information at the end of month (n+1) (b(n+1)).
- When the first algorithm is used, the period "1st to 31st of month (n-1)" set based on b(n-1), or the period "1st to 31st of month (n+1)" set based on b(n+1), is set as the second predetermined range of the present invention.
- The table determination unit 11 selects the attribute information at the end of month (n-1) (b(n-1)), for example, when the user with user ID "111" cancelled the mobile terminal contract at the end of month n, so that no record b with user ID "111" exists in the attribute information at the end of month n (bn).
- The period "1st to 31st of month (n-1)" associated with b(n-1) thus constitutes a second predetermined range set adjacent to the first predetermined range "1st to 31st of month n" of the present invention.
- When the second algorithm is used, the table determination unit 11 selects the attribute information at the end of month (n-1) (b(n-1)); however, when the value of the item "date and time" of record a falls between the 1st and the 15th of month (n+1), it selects the attribute information at the end of month (n+1) (b(n+1)).
- The period "16th of month (n-1) to 15th of month n" associated with b(n-1) and the period "16th of month (n+1) to 15th of month (n+2)" associated with b(n+1) each constitute a second predetermined range set adjacent to the first predetermined range "16th of month n to 15th of month (n+1)" of the present invention.
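The table chosen in step S7 can be sketched as a function of the algorithm and the record a date. This Python sketch models months as hypothetical `(year, month)` pairs. For the first algorithm it assumes the earlier neighbour (month n-1), following the record a1 example (end-of-August table, then end-of-July table). For the second algorithm it applies the rule stated above: use b(n+1) when the date falls on the 1st to 15th of month (n+1), otherwise b(n-1).

```python
from datetime import date
from typing import Tuple

YearMonth = Tuple[int, int]  # hypothetical (year, month) representation


def shift_month(ym: YearMonth, delta: int) -> YearMonth:
    """Move a (year, month) pair by delta months across year boundaries."""
    index = ym[0] * 12 + (ym[1] - 1) + delta
    return index // 12, index % 12 + 1


def second_range_month(algorithm: int, d: date) -> YearMonth:
    """Month whose month-end table is tried in step S7 for a record a dated d."""
    if algorithm == 1:
        # First range is the whole month of d; falling back to the previous
        # month follows the record a1 example (b8 -> b7) and is an assumption.
        return shift_month((d.year, d.month), -1)
    if algorithm == 2:
        if d.day <= 15:
            # d lies in "1st to 15th of month (n+1)" with n = previous month,
            # so the later neighbour b(n+1) is the month of d itself.
            return d.year, d.month
        # d lies in "16th to end of month n" with n = month of d: use b(n-1).
        return shift_month((d.year, d.month), -1)
    raise ValueError(f"unknown algorithm: {algorithm}")
```

Under these assumptions, a record a dated 2008/8/10 handled by the second algorithm falls back to August 2008, which agrees with the record a3 walkthrough in this section.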
- In the present example, the table determination unit 11 selects, via the data reading unit 12, the table B1 holding the end-of-July attribute information (b7) (S7).
- The table determination unit 11 attempts to acquire, from the table B selected in step S7, a record b whose item "user ID" has the value "111" (S8), and determines whether a record b was acquired (S9). If a record b was acquired, the procedure proceeds to step S10; otherwise, it proceeds to step S11.
- Table B1 contains record b11, whose "user ID" is "111" (see FIG. 3B), so the procedure proceeds to step S10.
- In step S10, the table determination unit 11 sends record a and the record b acquired in step S5 or step S8 to the data combining unit 13, which combines them to generate a record c.
- If the procedure proceeds to step S11, the data combining unit 13 receives only record a and does not perform the combining process (S11). In the present example, the procedure proceeds to step S10, and the data combining unit 13 acquires records a1 and b11 and combines them.
- The data combining unit 13 sends the generated record c to the data writing unit 14, which writes record c to table C (S12).
- Here, the data writing unit 14 writes record c1 to table C (see FIG. 4). Only records that have been combined may be written to table C, or records that could not be combined may also be written out as uncombined records.
- In step S13, it is determined whether all records a in table A have been acquired. If so, the procedure ends; if not, it returns to step S3.
- Next, the processing when record a2 is acquired in step S3 (see FIG. 3A) and the first algorithm is used in step S4 is briefly described. Since the value of the item "date and time" of record a2 is "2008/8/12", the table B2 holding the end-of-August attribute information (b8) is selected in step S4. The value of the item "user ID" of record a2 is "222", and table B2 contains record b21 whose "user ID" is "222", so records a2 and b21 are combined and record c2 is written to table C (see FIG. 4).
- Similarly, the processing when record a3 is acquired in step S3 (see FIG. 3A) and the second algorithm is used in step S4 is briefly described. Since the value of the item "date and time" of record a3 is "2008/8/10", which falls within the first predetermined range "16 July to 15 August" of the end-of-July attribute information (b7), table B1 is selected in step S4. The value of the item "user ID" of record a3 is "333", and table B1 contains no record b whose "user ID" is "333", so in step S7 the table B2 holding the end-of-August attribute information (b8), which corresponds to the second predetermined range, is selected. Since table B2 contains record b22 whose "user ID" is "333", records a3 and b22 are combined in step S10, and record c3 is written to table C (see FIG. 4).
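The FIG. 5 walkthrough for records a1, a2, and a3 can be reproduced with a small in-memory sketch. The Python code below is illustrative only. `TABLE_B` is a hypothetical stand-in for the table B group keyed by `(year, month)`, and it contains only the records the text identifies by user ID (b12 is omitted because its user ID is not given). The first-algorithm fallback to the previous month is an assumption taken from the record a1 example. Gender and birth date fields are left out.

```python
from datetime import date
from typing import Dict, Optional, Tuple

YearMonth = Tuple[int, int]


def shift_month(ym: YearMonth, delta: int) -> YearMonth:
    index = ym[0] * 12 + (ym[1] - 1) + delta
    return index // 12, index % 12 + 1


def first_month(alg: int, d: date) -> YearMonth:
    # S4: month n whose first predetermined range contains d.
    if alg == 2 and d.day <= 15:
        return shift_month((d.year, d.month), -1)
    return d.year, d.month


def second_month(alg: int, d: date) -> YearMonth:
    # S7: month whose range is adjacent (second predetermined range).
    if alg == 2 and d.day <= 15:
        return d.year, d.month
    return shift_month((d.year, d.month), -1)


# Hypothetical stand-in for the table B group: month -> {user ID: record id}.
TABLE_B: Dict[YearMonth, Dict[str, str]] = {
    (2008, 7): {"111": "b11"},                # table B1, end of July (b7)
    (2008, 8): {"222": "b21", "333": "b22"},  # table B2, end of August (b8)
}


def combine(user_id: str, d: date, alg: int) -> Optional[str]:
    """Return the id of the record b combined with record a, or None (S11)."""
    for month in (first_month(alg, d), second_month(alg, d)):  # S4/S5, then S7/S8
        record_b = TABLE_B.get(month, {}).get(user_id)
        if record_b is not None:
            return record_b  # S10: combine record a with this record b
    return None
```

Calling `combine("111", date(2008, 8, 17), 1)`, `combine("222", date(2008, 8, 12), 1)`, and `combine("333", date(2008, 8, 10), 2)` returns `"b11"`, `"b21"`, and `"b22"`, matching the combinations described for records c1, c2, and c3.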
- FIG. 6 is a flowchart showing another example of the data combination processing performed in the server 1.
- The processing in steps S21 to S23 is the same as that in steps S1 to S3 in FIG. 5.
- the table determination unit 11 records the records having the same value as the user ID value of the acquired record a from all the tables belonging to the table B group (here, the tables B1 and B2) via the data reading unit 12. Attempt to acquire b group (S24). For example, when the record a1 is acquired in step S23, the record b11 whose “user ID” value is “111” is acquired.
- the table determination unit 11 then determines whether a group of records b was acquired in step S24 (S25). If it was, the processing procedure proceeds to step S27; if no record b could be acquired, it proceeds to step S26, and the combining process is not performed for the acquired record a. For example, when record b11 is acquired in step S24, the processing procedure proceeds to step S27.
- the table determination unit 11 selects from the group of records b a record b that satisfies the first predetermined range, applying a predetermined algorithm with the “date and time” value of record a as the key (S27). For example, when record b11 is acquired in step S24 and the first algorithm is used, the “date and time” value “2008/8/17” of record a1 does not fall within the first predetermined range “July 1 to 31”, which is set based on the value “July” of the “date and time” item of record b11, so record b11 is not selected. In this embodiment as well, the item “date and time” constitutes the “key item” of the present invention, and the item “user ID” constitutes the “identification item” of the present invention.
- in step S28, the table determination unit 11 determines whether a record b was selected in step S27. If so, the processing procedure proceeds to step S31; otherwise, it proceeds to step S29. For example, since record b11 was not selected in step S27, the processing procedure proceeds to step S29.
- in step S29, the table determination unit 11 sets, as the range to be extracted, a second predetermined range adjacent to the first predetermined range set by the predetermined algorithm, choosing it according to the value of the “date and time” item of record a.
- when the first predetermined range is “month n, days 1 to 31”, the second predetermined range is “month (n+1), days 1 to 31”. This setting covers the case where the user with user ID “111” canceled the mobile terminal contract at the end of month n, so that a record b with user ID “111” exists in the end-of-month-n attribute information (bn) but not in the following month's attribute information.
- when the first predetermined range is “month n, day 16 to month (n+1), day 15” and the value of the “date and time” item of record a is in “month (n+1)”, the second predetermined range is set to “month (n+1), day 16 to month (n+2), day 15”. This setting covers the case where the user with user ID “111” had a mobile terminal contract at the end of month n and canceled it by the end of month (n+1).
- under the same first predetermined range, when the value of the “date and time” item of record a is in “month n, days 1 to 15”, the second predetermined range is set to “month (n-1), day 16 to month n, day 15”. This setting covers the case where the user with user ID “111” had no mobile terminal contract at the end of month (n-1) but had contracted by the end of month n.
- for example, for record b11 under the first algorithm, the table determination unit 11 sets the second predetermined range to “August 1 to 31” based on the first predetermined range “July 1 to 31”.
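- the table determination unit 11 then selects, from the group of records b acquired in step S24, a record b that satisfies the predetermined algorithm, that is, one whose second predetermined range contains the “date and time” value of record a (S30).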
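- for example, since the value “2008/8/17” of record a1 falls within the second predetermined range “August 1 to 31”, the table determination unit 11 selects record b11 as the record to be combined.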
- in step S31, the table determination unit 11 sends record a and the record b selected in step S27 or step S30 to the data combining unit 13, and the data combining unit 13 combines the record a and record b received from the table determination unit 11 to generate record c (S31). When the processing procedure proceeds to step S26 instead, the data combining unit 13 acquires only record a and does not perform the combining process (S26). For example, when record b11 is selected in step S30, the data combining unit 13 acquires records a1 and b11 and combines them.
- the data combining unit 13 sends the generated record c to the data writing unit 14, and the data writing unit 14 acquires the sent record c and writes the record c to the table C (S32).
- for example, the data writing unit 14 writes record c1 into table C (see FIG. 4).
- in step S33, the same processing as step S13 in FIG. is performed. The processing of steps S23 to S33 may be repeated record by record as shown in FIG. 6, or may be performed in parallel for all records a stored in table A. When the processing is performed in parallel, the individual processes can be distributed across a plurality of server apparatuses that communicate via a network.
- in the example of FIG. 6, record b is selected using the value of the item “date and time” as the key item; alternatively, a plurality of other items included in records a and b may be used as key items, and record b may be selected by performing the processing of steps S27 to S30 for each item.
- for example, when n items are used as key items, the processing of steps S27 to S30 is repeated n times, once for each item.
- if the values of the plurality of items included in record a fall within the respective first predetermined ranges set based on the values of the corresponding items of a record b in the group acquired in step S24, that record b is selected as the record to be combined with record a.
- even if a value of an item included in record a does not fall within the corresponding first predetermined range set based on the values of the items included in record b, record b is selected as the record to be combined with record a when that value falls within the second predetermined range adjacent to that first predetermined range.
- the item “date and time” and the plurality of items constitute a plurality of key items in the present invention.
- the operational effects of the data combining system and data combining method described above are as follows.
- when record b is combined with record a, in addition to checking that the user IDs match, the table determination unit 11 determines whether the value of the “date and time” item of record a falls within the first predetermined range set based on the value of the “date and time” item of record b; if it does, the data combining unit 13 combines record b with record a, so the data combining success rate can be improved.
- since the first predetermined range is set by the table determination unit 11 based on the value of the “date and time” item included in record b, the combining accuracy can be improved.
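- moreover, even when the value of the “date and time” item of record a does not fall within the first predetermined range, the table determination unit 11 selects record b as the record to be combined with record a if that value falls within the adjacent second predetermined range, so the success rate of data combination can be improved.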
- when each item of record a falls within the first or second predetermined range set for the corresponding item of record b, the table determination unit 11 selects record b as the data to be combined with record a. This makes it possible to improve the data combining success rate while also improving the accuracy of combining record a with record b.
- the present invention makes it possible to improve the success rate of data combination while improving the accuracy of combining data to be combined in the process of data combination.
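Steps S24 to S31 of the second embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: records are plain dictionaries with hypothetical keys `id`, `user_id` and `date`; the “date and time” of each record b is represented by a date inside its end-of-month snapshot month; and only the first algorithm is modeled (first range = month n of record b, second range = month n+1). All function names are hypothetical, and the thread pool merely stands in for the parallel, multi-server option mentioned for steps S23 to S33.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Dict, List, Optional

Record = Dict[str, object]


def month_index(d: date) -> int:
    """Running month count, so that adjacent months differ by exactly 1."""
    return d.year * 12 + (d.month - 1)


def in_first_range(a_date: date, b_date: date) -> bool:
    # First algorithm: first range is "month n, days 1 to 31", n = month of record b.
    return month_index(a_date) == month_index(b_date)


def in_second_range(a_date: date, b_date: date) -> bool:
    # First algorithm: the adjacent second range is the following month, n+1.
    return month_index(a_date) == month_index(b_date) + 1


def select_record_b(record_a: Record, table_b_group: List[List[Record]]) -> Optional[Record]:
    # S24: gather records b with the same user ID from every table in the group.
    candidates = [b for table in table_b_group for b in table
                  if b["user_id"] == record_a["user_id"]]
    # S27/S28: prefer a record b whose first range contains record a's date.
    for b in candidates:
        if in_first_range(record_a["date"], b["date"]):
            return b
    # S29/S30: otherwise fall back to the adjacent second range.
    for b in candidates:
        if in_second_range(record_a["date"], b["date"]):
            return b
    # S25/S26 (no candidates) or no range matched: record a is not combined.
    return None


def combine(record_a: Record, record_b: Record) -> Record:
    # S31: merge into record c; on key clashes the value from record a wins.
    return {**record_b, **record_a}


def combine_table_a(table_a: List[Record], table_b_group: List[List[Record]],
                    workers: int = 1) -> List[Record]:
    """Run S23-S33 for every record a, optionally in parallel threads."""
    def process(record_a: Record) -> Optional[Record]:
        record_b = select_record_b(record_a, table_b_group)
        return None if record_b is None else combine(record_a, record_b)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, table_a))
    # S32: keep only the generated records c (uncombined records a are dropped).
    return [c for c in results if c is not None]
```

With a table B1 holding b11 (user “111”, dated July 2008) and a record a1 dated 2008/8/17, `select_record_b` rejects b11 on the first range and accepts it on the second, mirroring the a1/b11 example above. Trying the first range across all candidates before any second range keeps exact-period matches preferred over adjacent-period ones.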
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
(First Embodiment)
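FIG. 1 is a block diagram showing the functional configuration of a server 1 according to an embodiment of the data combining system. Server 1 is an apparatus that combines combination source data including a plurality of key items with selected combination destination data chosen from a combination destination data group containing a plurality of pieces of combination destination data that each include a plurality of key items. Functionally, server 1 comprises a user request acquisition unit 10, a table determination unit 11 (data determination means), a data reading unit 12, a data combining unit 13 (data combining means), a data writing unit 14 (data writing means), a user result output unit 15, and a table storage unit 16 (combination source data storage means, combination destination data storage means, and composite data storage means).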
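The functional units of FIG. 1 can be pictured as one small pipeline. The Python skeleton below is a hypothetical sketch, not the patent's design: tables are in-memory lists of dictionaries, the record-selection rule of the table determination unit 11 is passed in as a function, and the user request acquisition unit 10 is not modeled. The class, method and field names are assumptions.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Selector = Callable[[Record, List[Record]], Optional[Record]]


class Server:
    """Hypothetical skeleton of server 1; method names mirror the units in FIG. 1."""

    def __init__(self, selector: Selector) -> None:
        # Table storage unit 16: source table A, destination group B, composite table C.
        self.tables: Dict[str, List[Record]] = {"A": [], "B": [], "C": []}
        # Stands in for the table determination unit 11.
        self.selector = selector

    def read(self, name: str) -> List[Record]:
        # Data reading unit 12: return a copy of the named table.
        return list(self.tables[name])

    def combine(self, a: Record, b: Record) -> Record:
        # Data combining unit 13: merge records a and b; a's values win on key clashes.
        return {**b, **a}

    def write(self, c: Record) -> None:
        # Data writing unit 14: store the composite record in table C.
        self.tables["C"].append(c)

    def run(self) -> List[Record]:
        group_b = self.read("B")
        for a in self.read("A"):
            b = self.selector(a, group_b)
            if b is not None:
                self.write(self.combine(a, b))
        # The contents of table C would be handed to the user result output unit 15.
        return self.read("C")
```

For instance, passing a selector such as `lambda a, group: next((b for b in group if b["user_id"] == a["user_id"]), None)` gives a plain equality join on the user ID; the embodiments refine exactly this selector with date ranges.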
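First algorithm: position information (record a) whose acquired “date and time” falls on days 1 to 31 of month n is combined with the end-of-month-n attribute information (bn) (record b).
Second algorithm: position information (record a) whose acquired “date and time” falls between day 16 of month n and day 15 of month (n+1) is combined with the end-of-month-n attribute information (bn) (record b).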
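The two algorithms map the “date and time” of a record a to the month n whose end-of-month attribute information (bn) it is joined with. The Python sketch below encodes that mapping plus a lookup with one fallback step. It is an illustration under assumptions: attribute tables are dictionaries keyed by (year, month) and then by user ID, and the fallback direction (previous month for the first algorithm; next month for days 1 to 15 and previous month otherwise under the second algorithm) is inferred from the adjacent ranges described for the second embodiment rather than stated for the first embodiment. All function names are hypothetical.

```python
from datetime import date
from typing import Dict, Optional, Tuple

Month = Tuple[int, int]                          # (year, month) of an end-of-month snapshot
AttributeTables = Dict[Month, Dict[str, dict]]   # month -> user ID -> record b


def shift(month: Month, k: int) -> Month:
    """Move a (year, month) pair by k months, carrying across year boundaries."""
    idx = month[0] * 12 + (month[1] - 1) + k
    return (idx // 12, idx % 12 + 1)


def attribute_month(d: date, algorithm: int) -> Month:
    """Month n whose end-of-month attribute information (bn) is joined to a record a dated d."""
    if algorithm == 1:
        # First algorithm: days 1-31 of month n map to bn.
        return (d.year, d.month)
    if algorithm == 2:
        # Second algorithm: day 16 of month n through day 15 of month n+1 map to bn.
        return (d.year, d.month) if d.day >= 16 else shift((d.year, d.month), -1)
    raise ValueError("algorithm must be 1 or 2")


def fallback_month(d: date, algorithm: int) -> Month:
    # Assumed adjacent table, inferred from the second embodiment's second ranges.
    first = attribute_month(d, algorithm)
    if algorithm == 2 and d.day <= 15:
        return shift(first, 1)
    return shift(first, -1)


def find_record_b(user_id: str, d: date, algorithm: int,
                  tables: AttributeTables) -> Optional[dict]:
    # Look in the table for the first predetermined range, then in the adjacent one.
    for month in (attribute_month(d, algorithm), fallback_month(d, algorithm)):
        record_b = tables.get(month, {}).get(user_id)
        if record_b is not None:
            return record_b
    return None
```

With July and August 2008 tables populated as in the worked examples above, record a2 (user “222”, 2008/8/12, first algorithm) resolves to b21 directly, and record a3 (user “333”, 2008/8/10, second algorithm) misses in the July table and resolves to b22 in the August table.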
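Next, the operation of server 1 in the second embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the processing of the data combining method performed by server 1.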
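In the example described with reference to FIG. 6, record b is selected using the value of the item “date and time” as the key item; however, record b may instead be selected by using a plurality of other items included in records a and b as key items and performing the processing of steps S27 to S30 for each item.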
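The multi-key variant (steps S27 to S30 performed once per key item) reduces to a per-item check: every key value of record a must fall within that item's first predetermined range or, failing that, its adjacent second range. In the Python sketch below, the second key item `score` and its ±5 and ±10 ranges are hypothetical; only “date and time” comes from the text, modeled here with the first algorithm's month ranges. Function and variable names are assumptions.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Optional, Tuple

# A range check receives (value from record a, value from record b).
RangeCheck = Callable[[Any, Any], bool]
Rules = Dict[str, Tuple[RangeCheck, RangeCheck]]  # item -> (first range, second range)


def months_apart(a: date, b: date) -> int:
    return (a.year * 12 + a.month) - (b.year * 12 + b.month)


RULES: Rules = {
    # "date and time": first range = month of b, second range = the following month.
    "date": (lambda a, b: months_apart(a, b) == 0,
             lambda a, b: months_apart(a, b) == 1),
    # Hypothetical numeric item: first range b±5, second range the adjacent bands out to b±10.
    "score": (lambda a, b: abs(a - b) <= 5,
              lambda a, b: 5 < abs(a - b) <= 10),
}


def matches_all_keys(record_a: Dict[str, Any], record_b: Dict[str, Any], rules: Rules) -> bool:
    for item, (first, second) in rules.items():
        a_val, b_val = record_a[item], record_b[item]
        # Steps S27-S30 for this item: first range, then the adjacent second range.
        if not (first(a_val, b_val) or second(a_val, b_val)):
            return False
    return True


def select_multi_key(record_a: Dict[str, Any], group_b: List[Dict[str, Any]],
                     rules: Rules = RULES) -> Optional[Dict[str, Any]]:
    # group_b is the set of records b already matched on the identification item (S24).
    return next((b for b in group_b if matches_all_keys(record_a, b, rules)), None)
```

Keeping the rules in a table keyed by item name means adding another key item only adds one entry, which matches the “repeat steps S27 to S30 for each item” description.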
Claims (9)
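- A data combining system that combines combination source data including an identification item and a key item with selected combination destination data, the selected combination destination data being one piece of combination destination data selected from a combination destination data group having a plurality of pieces of combination destination data each including an identification item and a key item, the data combining system comprising:
combination source data storage means for storing the combination source data;
combination destination data storage means for storing the combination destination data group;
data determination means for selecting combination destination data stored in the combination destination data storage means as the selected combination destination data when the value of the identification item included in one piece of combination source data stored in the combination source data storage means matches the value of the identification item included in that combination destination data, or falls within a predetermined identification range set based on the value of the identification item included in that combination destination data, and the value of the key item included in that combination source data falls within a first predetermined range set based on the value of the key item included in that combination destination data;
data combining means for combining the selected combination destination data selected by the data determination means with the combination source data to generate composite data;
composite data storage means for storing the composite data; and
data writing means for causing the composite data storage means to store the composite data generated by the data combining means,
wherein the predetermined identification range is a finite range including the value of the identification item included in the combination destination data, and
the first predetermined range is a finite range including the value of the key item included in the combination destination data.
- The data combining system according to claim 1, wherein the data determination means
extracts, from the plurality of pieces of combination destination data stored in the combination destination data storage means, as extracted combination destination data, combination destination data whose identification item value matches the identification item value included in the combination source data, or for which the identification item value included in the combination source data falls within a predetermined identification range set based on the identification item value included in that combination destination data, and
selects the extracted combination destination data as the selected combination destination data when the value of the key item included in the combination source data falls within the first predetermined range set based on the value of the key item included in the extracted combination destination data.
- The data combining system according to claim 2, wherein the combination source data and the combination destination data include a plurality of key items, and
the data determination means selects the extracted combination destination data as the selected combination destination data when each of the values of the plurality of key items included in the combination source data falls within the respective first predetermined range set based on the values of the plurality of key items included in the extracted combination destination data.
- The data combining system according to claim 2, wherein, when the value of the key item included in the combination source data does not fall within the first predetermined range set based on the value of the key item included in the extracted combination destination data, the data determination means selects the extracted combination destination data as the selected combination destination data if the value of the key item included in the combination source data falls within a second predetermined range set adjacent to the first predetermined range.
- The data combining system according to claim 4, wherein the combination source data and the combination destination data include a plurality of key items, and
the data determination means selects the extracted combination destination data as the selected combination destination data when each of the values of the plurality of key items included in the combination source data falls within the respective first predetermined range set based on the values of the plurality of key items included in the extracted combination destination data, or within the respective second predetermined range set adjacent to that first predetermined range.
- The data combining system according to claim 1, wherein the data determination means
extracts, from the plurality of pieces of combination destination data stored in the combination destination data storage means, as extracted combination destination data, combination destination data for which the value of the key item included in the combination source data falls within the first predetermined range, and
selects the extracted combination destination data as the selected combination destination data when the value of the identification item included in the extracted combination destination data matches the value of the identification item included in the combination source data, or the value of the identification item included in the combination source data falls within a predetermined identification range set based on the value of the identification item included in the extracted combination destination data.
- The data combining system according to claim 6, wherein, when the value of the identification item included in the extracted combination destination data does not match the value of the identification item included in the combination source data and the value of the identification item included in the combination source data does not fall within the predetermined identification range set based on the value of the identification item included in the extracted combination destination data, the data determination means re-extracts from the combination destination data storage means, as the extracted combination destination data, other combination destination data having as its key item a value that sets a second predetermined range adjacent to the first predetermined range, and selects, as the selected combination destination data, re-extracted combination destination data whose identification item value matches the identification item value included in the combination source data or falls within the predetermined identification range set based on the identification item value included in that extracted combination destination data.
- The data combining system according to any one of claims 1 to 7, wherein the key item is an item relating to date and time, and the first predetermined range is a predetermined period including the date and time given by the value of the key item.
- A data combining method for combining combination source data including an identification item and a key item with selected combination destination data, the selected combination destination data being one piece of combination destination data selected from a combination destination data group having a plurality of pieces of combination destination data each including an identification item and a key item, the method comprising:
a data determination step of selecting combination destination data as the selected combination destination data when the value of the identification item included in one piece of combination source data matches the value of the identification item included in that combination destination data, or falls within a predetermined identification range set based on the value of the identification item included in that combination destination data, and the value of the key item included in that combination source data falls within a first predetermined range set based on the value of the key item included in that combination destination data;
a data combining step of combining the selected combination destination data selected in the data determination step with the combination source data to generate composite data; and
a data writing step of storing the composite data generated by the data combining means in composite data storage means for storing the composite data,
wherein the predetermined identification range is a finite range including the value of the identification item included in the combination destination data, and
the first predetermined range is a finite range including the value of the key item of the combination destination data.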
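Read as an algorithm, claim 1 selects a combination destination record when two tests pass: the identification values match or fall within a finite identification range, and the key value falls within a finite first predetermined range containing the destination's key value. The Python sketch below is one hypothetical instance, not the claimed scope: numeric identification values with a ±`id_width` identification range, and a date key with a symmetric ±`days` window standing in for the “predetermined period” (the description's own month-based periods are not symmetric). The class, function and parameter names are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    ident: int     # identification item (numeric in this sketch)
    key: date      # key item relating to date and time
    payload: str   # remaining fields, collapsed to one string


def in_identification_range(a: Record, b: Record, id_width: int) -> bool:
    # Finite range containing b's identification value: [b.ident - id_width, b.ident + id_width].
    return abs(a.ident - b.ident) <= id_width


def in_first_range(a: Record, b: Record, days: int) -> bool:
    # Finite period containing b's key date: [b.key - days, b.key + days].
    return abs((a.key - b.key).days) <= days


def select(a: Record, group: List[Record], id_width: int = 0,
           days: int = 15) -> Optional[Record]:
    """Data determination: return the first destination record satisfying both tests."""
    for b in group:
        id_ok = a.ident == b.ident or in_identification_range(a, b, id_width)
        if id_ok and in_first_range(a, b, days):
            return b
    return None


def combine(a: Record, b: Record) -> Dict[str, object]:
    """Data combining: build the composite data to be written to the composite data store."""
    return {"ident": a.ident, "source": a.payload, "destination": b.payload}
```

With `id_width=0` the identification test degenerates to an exact match, which is the user-ID case used throughout the description; a positive width illustrates the claim's optional identification range.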
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/321,383 US8386452B2 (en) | 2009-05-19 | 2010-05-10 | Data combination system and data combination method |
CN201080020656XA CN102422285B (zh) | 2009-05-19 | 2010-05-10 | 数据结合系统和数据结合方法 |
EP10777674.2A EP2434414A4 (en) | 2009-05-19 | 2010-05-10 | DATA COMBINATION SYSTEM AND DATA COMBINATION PROCESS |
KR1020117024440A KR101374642B1 (ko) | 2009-05-19 | 2010-05-10 | 데이터 결합 시스템 및 데이터 결합 방법 |
JP2011514382A JP5204303B2 (ja) | 2009-05-19 | 2010-05-10 | データ結合システム及びデータ結合方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009121126 | 2009-05-19 | ||
JP2009-121126 | 2009-05-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010134440A1 (ja) | 2010-11-25 |
Family
ID=43126122
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/057893 WO2010134440A1 (ja) | 2009-05-19 | 2010-05-10 | データ結合システム及びデータ結合方法 |
Country Status (6)
Country | Link |
---|---|
US (1) | US8386452B2 (ja) |
EP (1) | EP2434414A4 (ja) |
JP (1) | JP5204303B2 (ja) |
KR (1) | KR101374642B1 (ja) |
CN (1) | CN102422285B (ja) |
WO (1) | WO2010134440A1 (ja) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5765244B2 (ja) * | 2012-01-11 | 2015-08-19 | 富士通株式会社 | テーブル処理装置、テーブル処理方法、及びプログラム |
KR101784265B1 (ko) * | 2016-06-09 | 2017-10-12 | 주식회사 그리즐리 | 빅데이터의 비식별화 처리 방법 |
US10855767B1 (en) * | 2018-03-05 | 2020-12-01 | Amazon Technologies, Inc. | Distribution of batch data to sharded readers |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11250079A (ja) * | 1998-02-27 | 1999-09-17 | Nippon Telegr & Teleph Corp <Ntt> | データベース結合方法及び装置及びデータベース結合プログラムを格納した記憶媒体 |
JP2002288012A (ja) * | 2001-03-23 | 2002-10-04 | Casio Comput Co Ltd | ファイル結合装置、及びプログラム |
JP2005049943A (ja) | 2003-07-29 | 2005-02-24 | Toshiba Corp | データ処理装置、データ処理方法およびプログラム |
JP2008197976A (ja) * | 2007-02-14 | 2008-08-28 | Fujitsu Ltd | 連結情報生成プログラム及び連結情報生成方法 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6901403B1 (en) * | 2000-03-02 | 2005-05-31 | Quovadx, Inc. | XML presentation of general-purpose data sources |
US6850947B1 (en) * | 2000-08-10 | 2005-02-01 | Informatica Corporation | Method and apparatus with data partitioning and parallel processing for transporting data for data warehousing applications |
US6931390B1 (en) * | 2001-02-27 | 2005-08-16 | Oracle International Corporation | Method and mechanism for database partitioning |
US6789096B2 (en) * | 2001-06-25 | 2004-09-07 | Informatica Corporation | Real time sessions in an analytic application |
US20040098405A1 (en) * | 2002-11-16 | 2004-05-20 | Michael Zrubek | System and Method for Automated Link Analysis |
JP4948276B2 (ja) * | 2007-06-15 | 2012-06-06 | 三菱電機株式会社 | データベース検索装置及びデータベース検索プログラム |
2010
- 2010-05-10 US US13/321,383 patent/US8386452B2/en not_active Expired - Fee Related
- 2010-05-10 WO PCT/JP2010/057893 patent/WO2010134440A1/ja active Application Filing
- 2010-05-10 JP JP2011514382A patent/JP5204303B2/ja not_active Expired - Fee Related
- 2010-05-10 CN CN201080020656XA patent/CN102422285B/zh not_active Expired - Fee Related
- 2010-05-10 KR KR1020117024440A patent/KR101374642B1/ko not_active IP Right Cessation
- 2010-05-10 EP EP10777674.2A patent/EP2434414A4/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See also references of EP2434414A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP2434414A4 (en) | 2015-09-16 |
EP2434414A1 (en) | 2012-03-28 |
JP5204303B2 (ja) | 2013-06-05 |
JPWO2010134440A1 (ja) | 2012-11-08 |
CN102422285A (zh) | 2012-04-18 |
US20120066207A1 (en) | 2012-03-15 |
KR101374642B1 (ko) | 2014-03-14 |
KR20120022778A (ko) | 2012-03-12 |
US8386452B2 (en) | 2013-02-26 |
CN102422285B (zh) | 2013-09-18 |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 201080020656.X; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10777674; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2011514382; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 20117024440; Country of ref document: KR; Kind code of ref document: A |
REEP | Request for entry into the European phase | Ref document number: 2010777674; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 2010777674; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 13321383; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |