WO2021246958A1 - System and method for combining related data from separate databases using identifier field pairs - Google Patents

System and method for combining related data from separate databases using identifier field pairs Download PDF

Info

Publication number
WO2021246958A1
WO2021246958A1 PCT/SG2021/050298 SG2021050298W WO2021246958A1 WO 2021246958 A1 WO2021246958 A1 WO 2021246958A1 SG 2021050298 W SG2021050298 W SG 2021050298W WO 2021246958 A1 WO2021246958 A1 WO 2021246958A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
database
data field
module
customer
Prior art date
Application number
PCT/SG2021/050298
Other languages
French (fr)
Inventor
Hanting Justin LEE
Original Assignee
Ecommerce Enablers Pte. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ecommerce Enablers Pte. Ltd. filed Critical Ecommerce Enablers Pte. Ltd.
Publication of WO2021246958A1 publication Critical patent/WO2021246958A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases

Definitions

  • the present disclosure relates to combining related data from separate databases. More particularly, the present disclosure relates to a system or method of locating identifier field pairings between separate databases to enable a query to link related data from the separate databases.
  • FIG. 1 is a block diagram representing a customer 15 in an omnichannel market 1-00 in the background art.
  • the customer 15 is illustrated in FIG. 1 at the center of the omnichannel market 1-00 efforts made by a vendor selling a branded product.
  • the omnichannel market 1-00 includes a brick and mortar channel 10 including a pair of brick and mortar stores 10A, 10B, an online platform channel 11 accessible via a mobile app 11A and a laptop 11B, a direct contact channel 12, and a social media channel 13.
  • the direct contact channel 12 includes means for contacting the customer 15 by telephone 12A (e.g., including a cell phone), text messaging 12B, and email 12C.
  • Payment methods 14 available to the customer 15 include credit card 14A, bank transfer 14B, a payment service account 14C (e.g., the Paypal brand payment service), and cash 14D (e.g., such as at the brick and mortar stores 10A, 10B).
  • the vendor may also engage distributors and consulting firms to provide directed sales and marketing efforts on behalf of the vendor. Even though the customer 15 accesses the market through, as in FIG.
  • Smaller vendors may not even directly manage the brick and mortar stores 10A, 10B. E.g., smaller vendors will often sell their products in third party department stores. Smaller vendors may also only host a virtual storefront on an e-commerce platform, such as those run by Shopify, Amazon, Ebay, Facebook, or Lazada. Inability to integrate customer data collected through these market channels, where the data is controlled by third parties, can place a smaller vendor a competitive disadvantage.
  • the acquired companies each have their own selection of marketing solutions and marketing partners.
  • API application programming interface
  • the API is a computing interface which defines interactions between multiple software intermediaries. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, and the conventions to follow. Some APIs are documented, others are designed so that they can be “interrogated” to determine supported functionality. Since other systems rely only on the API, the system that provides the API can change its internal details “behind” that API without affecting its users.
  • Each API and each database uses its own field names, field definitions, table architecture, and linking structure. Key indexes, for instance, which are used to link separate tables in a relational database, are not consistent between separate databases. Identifiers for a customer 15 also will differ, as each database has access to different customer data. While some databases may have a governmental identification of a customer 15, other databases may only have an email address, a cell number, or customer name.
  • a customer 15 is not concerned with these technical difficulties.
  • a customer 15 may start a transaction on an online platform but then expect to be able to seamlessly finalize the purchase at a brick and mortar store 10A, 10B.
  • the customer 15 may obtain a voucher from a social media platform 13 and wish to redeem the voucher on their smart phone app. If the customer 15 is inconvenienced or made to wait, the customer 15 may become frustrated and sale may be lost to another vendor.
  • a real-time capability to cross-reference and combine data from a plurality of databases would be a means of providing a better customer experience and building customer loyalty, leading to increased repeat customer sales and higher gross margins.
  • a first embodiment of the invention is a server for combining data from a plurality of separate databases including at least one processor in data communication with a network and a user interface, the server comprising: (a) an access module, wherein the access module is configured for communication with the plurality of separate databases through the network; (b) a linking module configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) a query module for combining related data between the separate databases to create a unified report, wherein the related data from the separate databases is linked via the identifier field pairs.
  • the access module is configured to retrieve a logical design for each separate database. Each logical design comprises a plurality of data fields for storing data records. Each data field pairing includes a first data field of a first database and a second data field of a second database. Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.
  • a second embodiment of the invention is a computer-implemented method for combining data from a plurality of separate databases using at least one processor in data communication with a network and a user interface, the method comprising the steps of: (a) maintaining an access module, wherein the access module is configured for communication with the plurality of separate databases through the network; (b) maintaining a linking module configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) maintaining a query module for combining related data between the separate databases to create a unified report, wherein the related data from the separate databases is linked via the identifier field pairs.
  • the access module is configured to retrieve a logical design for each separate database. Each logical design comprises a plurality of data fields for storing data records.
  • Each data field pairing includes a first data field of a first database and a second data field of a second database.
  • Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.
  • FIG. 1 is a block diagram representing a customer in an omnichannel market in the background art.
  • FIG. 2 is block representation of the integration of a server into a system for combining data from separate databases in an embodiment of the invention.
  • FIG. 3 is an illustration of links between data field pairings and their associated congruence scores in an embodiment of the invention.
  • FIG. 4 is flowchart of a method for combining data from separate databases in an embodiment of the invention.
  • a “computer”, a “server”, and a “system” may be implemented within: a single stand-alone computer, a stand-alone server, multiple dedicated servers, and/or a virtual server running on a larger network of servers and/or a cloud-based service.
  • a database may store data to and access data from a single stand-alone computer, a data server, multiple dedicated data servers, a cloud-based service, and/or a virtual server running on a network of servers.
  • FIG. 2 is block representation of the integration of a server 20 into a system 2-00 for combining data from separate databases 26 in an embodiment of the invention.
  • the separate databases 26 are identified in FIG. 1 as DBA, DBB, DBC, and DBD, though in practice there may be more or less of these separate databases 26.
  • Each of the separate databases 26 include a logical design dictating the format for the storing of data.
  • the separate databases 26 are in data communication with a server 20 through a network 27.
  • the server 20 includes an access module 21 for accessing data in the separate databases 26, a linking module 22, a query module 23, a push module 24, and a location module 25.
  • a user interface 28 for the server 20 is represented in FIG. 2 as a laptop computer connected to the server 20 through the network 27. Though not represented in FIG. 2, such networked access to the server 20 can be available to multiple users through multiple user interfaces 28 such as desktop computers, other servers, laptop computers, and smart phones.
  • FIG. 3 is an illustration of links 3-00 between data field pairings and their associated congruence scores in an embodiment of the invention.
  • separate databases 26 are identified in FIG. 3 as DBA, DBB, DBC, and DBD.
  • Each of the separate databases 26 include a logical design dictating the format for the storing of data.
  • DBA data fields are presented by the series of cto, ai, ct2, as, ci4 . . . to a n .
  • DBB data fields are presented by the series of bo, bi, b2, bj. A* . . . to b n .
  • Congruence scoring between data fields of separate databases 26 in FIG. 3 is depicted by the equations such as P (bo, co), representing a congruence score of similarity between the usage of data in the bo data field of DBB with the co data field of DBc.
  • P Bo, co
  • This type of one-to-one correspondence between two data fields and could be indicative of these data fields being a robust personal identifier field pair storing data such as a governmental identification number, an email address, or a phone number.
  • This type of robust personal identifier can more easily act as a key field used to link data fields in different data tables.
  • the invention can look beyond a one-to-one correspondence of two fields and look for correspondence of two or more data fields of a first database with two or more data fields of a second database to create an identifier pairing between two databases.
  • Some customer information for instance, taken alone may be insufficient to demonstrate congruence between the data fields of two databases.
  • customer information could include a building address, a date of birth, a first name, a last name, or a gender. Taken in isolation, such information could result in erroneous data field linkage of data records. E.g., two customers 15 could have the same name or the same date of birth.
  • FIG. 4 is flowchart 4-00 of a method for combining data from separate databases 26 in an embodiment of the invention, including the steps 4-01 to 4-07 (see below).
  • each separate database 26 including a logical design and a data records
  • 4-02 calculate congruence scores for a first data field of a first database and a second data field of a second database
  • a first embodiment of the invention is a server 20 for combining data from a plurality of separate databases 26 including at least one processor in data communication with a network 27 and a user interface 28, the server 20 comprising: (a) an access module 21, wherein the access module 21 is configured for communication with the plurality of separate databases 26 through the network 27; (b) a linking module 22 configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) a query module 23 for combining related data between the separate databases 26 to create a unified report 23A, wherein the related data from the separate databases 26 is linked via the identifier field pairs.
  • the access module 21 is configured to retrieve a logical design for each separate database 26.
  • Each logical design comprises a plurality of data fields for storing data records.
  • Each data field pairing includes a first data field of a first database and a second data field of a second database.
  • Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.
  • a second embodiment of the invention is a computer-implemented method for combining data from a plurality of separate databases 26 using at least one processor in data communication with a network 27 and a user interface 28, the method comprising the steps of: (a) maintaining an access module 21, wherein the access module 21 is configured for communication with the plurality of separate databases 26 through the network 27; (b) maintaining a linking module 22 configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) maintaining a query module 23 for combining related data between the separate databases 26 to create a unified report 23A, wherein the related data from the separate databases 26 is linked via the identifier field pairs.
  • the access module 21 is configured to retrieve a logical design for each separate database 26.
  • Each logical design comprises a plurality of data fields for storing data records.
  • Each data field pairing includes a first data field of a first database and a second data field of a second database.
  • Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.
  • a first technical benefit of the invention is that the logical designs and the data of the separate databases 26 need not be altered.
  • This aspect of the invention is critical because many API or other database access protocols are read-only, meaning that the party linking to the database is given limited permission that does not include altering the logical design or data of the linked database.
  • the invention enables the vendor to execute a technical solution within an IT environment of databases that the vendor does not fully control. With the use of multiple identifier field pairs, the invention is capable of working with the data as it is, even if that data is sub-optimal or if some of the data cannot be used due to regulatory restrictions.
  • a second technical benefit of the invention is that links between separate databases 26 and data fields within the separate databases 26 can be performed without a full migration of the data.
  • the links are determined by the linking module 22 and then a query is performed.
  • the unified report 23A is not static, meaning that as the data is updated within each of the separate databases 26, such updated data can be available in real-time in the unified report 23A.
  • This design allows the marketing and sales teams for a vendor more freedom to add, replace, or remove marketing channels.
  • new databases can be added to the unified reporting capability through the invention’s steps of accessing the new database via the access module 21 and establishing new links of identifier field pairs using the linking module 22.
  • the linking module 22 can also be further configured to: (i) quality test the identifier field pairs on a first predetermined schedule; and (ii) recalculate the plurality of congruence scores on a second predetermined schedule.
  • each data field pairing further includes at least one of: (a) a third data field of the first database; (b) a fourth data field of the second database; and (c) a fifth data field of a third separate database 26.
  • the calculation of the plurality of congruence scores evaluates at least one of: (a) a field definition for each data field; and (b) an associative table for each separate database 26.
  • the query module 23 includes a query history 23B; (b) the linking module 22 has access to the query history 23B; and (c) the linking module 22 includes AI / ML models based at least in part upon the query history 23B.
  • the query history 23B is particularly useful for AI / ML models as the logic design of separate databases 26 often have many similarities, whether those databases are proprietary or open source.
  • the invention may be used as a stand-alone system by individual vendors or as part of a SaaS model where a single company provides use of the invention to multiple vendors.
  • the single company will likely represent multiple vendors linking to the same databases, thus lessons learned and documented in query histories of one vendor may be useful for other vendors serviced by the same single company.
  • the AI / ML models can be continuously refined with real life examples.
  • the separate databases 26 include at least one of: (a) a brick & mortar store database; (b) an online platform database; (c) a social media database; (d) a direct contact customer support database; and (e) a payment method database.
  • the identifier field pairs include at least one of: (a) a a robust personal identifier including at least one of a government identification number, an email address, and a phone number; (b) a biographical identifier including at least one of a date of birth, a first name, a last name, and a gender; (c) a building address; (d) an electronic device identifier including at least one of a MAC address, a SIM card number, an IMEI number, and an internet address; (e) a financial account identifier including at least one of a credit cart number, a bank account number, a reward card number, and a payment service account identifier; (f) a social media profile name; and (g) a biometric identifier including at least one of a signature, a voiceprint, and a face portrait.
  • the congruence score of each data field pairing takes into account at least one of: (i) a similarity of data stored in at least one record of each data field pairing; and (ii) an identical data field type for each of the data fields in the data field pairing.
  • the linking module 22 is further configured to: (i) quality test the identifier field pairs on a first predetermined schedule; and (ii) recalculate the plurality of congruence scores on a second predetermined schedule.
  • the server 20 further including a location module 25.
  • the location module 25 determines a residence of each customer 15.
  • the location module 25 includes a library of consumer privacy protection rules for a plurality of jurisdictions.
  • the unified report 23A calculates and displays, for a plurality of identified parties, a plurality of consolidated data representing at least one of: (a) customer loyalty score; (b) a customer preference profile; (c) a customer purchase history; and (d) a customer wish list.
  • the server 20 can further comprise and the method can maintain a push module 24 for pushing the consolidated data to at least one of the separate databases 26.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A server 20 or method for combining data from a plurality of separate databases 26 including at least one processor in data communication with a network 27 and a user interface 28, the server 20 comprising: (a) an access module 21, wherein the access module 21 is configured for communication with the plurality of separate databases 26 through the network 27; (b) a linking module 22 configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) a query module 23 for combining related data between the separate databases 26 to create a unified report 23A, wherein the related data from the separate databases 26 is linked via the identifier field pairs. The access module 21 is configured to retrieve a logical design for each separate database 26. Each logical design comprises a plurality of data fields for storing data records. Each data field pairing includes a first data field of a first database and a second data field of a second database. Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.

Description

SYSTEM AND METHOD FOR COMBINING RELATED DATA FROM SEPARATE DATABASES USING IDENTIFIER FIELD PAIRS
TECHNICAL CONTRIBUTION
The present disclosure relates to combining related data from separate databases. More particularly, the present disclosure relates to a system or method of locating identifier field pairings between separate databases to enable a query to link related data from the separate databases.
BACKGROUND
FIG. 1 is a block diagram representing a customer 15 in an omnichannel market 1-00 in the background art. The customer 15 is illustrated in FIG. 1 at the center of the omnichannel market 1-00 efforts made by a vendor selling a branded product. The omnichannel market 1-00 includes a brick and mortar channel 10 including a pair of brick and mortar stores 10A, 10B, an online platform channel 11 accessible via a mobile app 11A and a laptop 11B, a direct contact channel 12, and a social media channel 13. The direct contact channel 12 includes means for contacting the customer 15 by telephone 12A (e.g., including a cell phone), text messaging 12B, and email 12C. The social media channel 13 depicted in FIG. 1 includes as examples the Facebook brand social media platform 13A, the Twitter brand social media platform 13B, and the Instagram brand social media platform 13C. The customer 15 is contactable on the social media platform 13 through targeted personalized advertisements. Payment methods 14 available to the customer 15 include credit card 14A, bank transfer 14B, a payment service account 14C (e.g., the Paypal brand payment service), and cash 14D (e.g., such as at the brick and mortar stores 10A, 10B). Though not shown in FIG. 1, the vendor may also engage distributors and consulting firms to provide directed sales and marketing efforts on behalf of the vendor. Even though the customer 15 accesses the market through, as in FIG. 1, such a multitude of means, both physical (e.g., brick and mortar stores 10A, 10B) and virtual (e.g., mobile apps 11A), the expectation of a customer 15 is that all their purchasing experiences for a single brand will be integrated. For instance if the vendor selling the brand has a customer loyalty program providing discounts and/or easy checkout, the customer 15 expects to enjoy these benefits and conveniences in each interaction within the omnichannel market 1-00.
Larger vendors may have the resources to directly manage and integrate or link the databases supporting their brick and mortar stores 10A, 10B with the databases supporting their online platform 11. But even larger vendors will not have control over all aspects of the omnichannel market 1-00. For instance, direct contact channel 12 interaction is often outsourced to large professional customer service centers, many of which are located outside the vendor’s country. And even the largest vendors do not control the social media channel channels 13 or the payment methods 14.
Smaller vendors may not even directly manage the brick and mortar stores 10A, 10B. E.g., smaller vendors will often sell their products in third party department stores. Smaller vendors may also only host a virtual storefront on an e-commerce platform, such as those run by Shopify, Amazon, Ebay, Alibaba, or Lazada. Inability to integrate customer data collected through these market channels, where the data is controlled by third parties, can place a smaller vendor a competitive disadvantage.
Vendors, small and large, face additional database integration challenges when making acquisitions of other companies. The acquired companies each have their own selection of marketing solutions and marketing partners.
One available solution for interaction with databases outside one’s control is the application programming interface (“API”). The API is a computing interface which defines interactions between multiple software intermediaries. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, and the conventions to follow. Some APIs are documented, others are designed so that they can be “interrogated” to determine supported functionality. Since other systems rely only on the API, the system that provides the API can change its internal details “behind” that API without affecting its users.
However, even with access to the data hosted within an omnichannel market 1-00, there is still a matter of integrating the data. Each API and each database uses its own field names, field definitions, table architecture, and linking structure. Key indexes, for instance, which are used to link separate tables in a relational database, are not consistent between separate databases. Identifiers for a customer 15 also will differ, as each database has access to different customer data. While some databases may have a governmental identification of a customer 15, other databases may only have an email address, a cell number, or customer name.
A customer 15 is not concerned with these technical difficulties. A customer 15, for instance, may start a transaction on an online platform but then expect to be able to seamlessly finalize the purchase at a brick and mortar store 10A, 10B. Or the customer 15 may obtain a voucher from a social media platform 13 and wish to redeem the voucher on their smart phone app. If the customer 15 is inconvenienced or made to wait, the customer 15 may become frustrated and sale may be lost to another vendor.
To maintain a consistent customer 15 experience across an omnichannel market 1-00, a real-time capability to cross-reference and combine data from a plurality of databases would be a means of providing a better customer experience and building customer loyalty, leading to increased repeat customer sales and higher gross margins.
SUMMARY
A first embodiment of the invention is a server for combining data from a plurality of separate databases including at least one processor in data communication with a network and a user interface, the server comprising: (a) an access module, wherein the access module is configured for communication with the plurality of separate databases through the network; (b) a linking module configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) a query module for combining related data between the separate databases to create a unified report, wherein the related data from the separate databases is linked via the identifier field pairs. The access module is configured to retrieve a logical design for each separate database. Each logical design comprises a plurality of data fields for storing data records. Each data field pairing includes a first data field of a first database and a second data field of a second database. Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.
A second embodiment of the invention is a computer-implemented method for combining data from a plurality of separate databases using at least one processor in data communication with a network and a user interface, the method comprising the steps of: (a) maintaining an access module, wherein the access module is configured for communication with the plurality of separate databases through the network; (b) maintaining a linking module configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) maintaining a query module for combining related data between the separate databases to create a unified report, wherein the related data from the separate databases is linked via the identifier field pairs. The access module is configured to retrieve a logical design for each separate database. Each logical design comprises a plurality of data fields for storing data records. Each data field pairing includes a first data field of a first database and a second data field of a second database. Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs. BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present disclosure are described herein with reference to the drawings in which:
FIG. 1 is a block diagram representing a customer in an omnichannel market in the background art.
FIG. 2 is block representation of the integration of a server into a system for combining data from separate databases in an embodiment of the invention.
FIG. 3 is an illustration of links between data field pairings and their associated congruence scores in an embodiment of the invention.
FIG. 4 is flowchart of a method for combining data from separate databases in an embodiment of the invention.
DETAILED DESCRIPTION
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. The illustrative embodiments described in the detailed description, drawings and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein. Unless specified otherwise, the terms “comprising,” “comprise,” “including” and “include” used herein, and grammatical variants thereof, are intended to represent “open” or “inclusive” language such that they include recited elements but also permit inclusion of additional, un -recited elements.
As used herein, the software and hardware of a “computer”, a “server”, and a “system” may be implemented within: a single stand-alone computer, a stand-alone server, multiple dedicated servers, and/or a virtual server running on a larger network of servers and/or a cloud-based service. As used herein, a database may store data to and access data from a single stand-alone computer, a data server, multiple dedicated data servers, a cloud-based service, and/or a virtual server running on a network of servers.
FIG. 2 is block representation of the integration of a server 20 into a system 2-00 for combining data from separate databases 26 in an embodiment of the invention. The separate databases 26 are identified in FIG. 1 as DBA, DBB, DBC, and DBD, though in practice there may be more or less of these separate databases 26. Each of the separate databases 26 include a logical design dictating the format for the storing of data. The separate databases 26 are in data communication with a server 20 through a network 27. The server 20 includes an access module 21 for accessing data in the separate databases 26, a linking module 22, a query module 23, a push module 24, and a location module 25. A user interface 28 for the server 20 is represented in FIG. 2 as a laptop computer connected to the server 20 through the network 27. Though not represented in FIG. 2, such networked access to the server 20 can be available to multiple users through multiple user interfaces 28 such as desktop computers, other servers, laptop computers, and smart phones.
FIG. 3 is an illustration of links 3-00 between data field pairings and their associated congruence scores in an embodiment of the invention. As in FIG. 2, separate databases 26 are identified in FIG. 3 as DBA, DBB, DBC, and DBD. Each of the separate databases 26 include a logical design dictating the format for the storing of data. In DBA, for instance, data fields are presented by the series of cto, ai, ct2, as, ci4 . . . to an. For DBB, data fields are presented by the series of bo, bi, b2, bj. A* . . . to bn. For DBc, data fields are presented by the series of co, ci, C2, a, c.* . . . to c„. For DBD, data fields are presented by the series of do, di,
Figure imgf000008_0001
. . . to d„.
Congruence scoring between data fields of separate databases 26 in FIG. 3 is depicted by the equations such as P (bo, co), representing a congruence score of similarity between the usage of data in the bo data field of DBB with the co data field of DBc. This type of one-to-one correspondence between two data fields and could be indicative of these data fields being a robust personal identifier field pair storing data such as a governmental identification number, an email address, or a phone number. This type of robust personal identifier can more easily act as a key field used to link data fields in different data tables.
Where robust personal identifiers are not available, the invention can look beyond a one-to-one correspondence of two fields and look for correspondence of two or more data fields of a first database with two or more data fields of a second database to create an identifier pairing between two databases. Some customer information, for instance, taken alone may be insufficient to demonstrate congruence between the data fields of two databases. Such customer information could include a building address, a date of birth, a first name, a last name, or a gender. Taken in isolation, such information could result in erroneous data field linkage of data records. E.g., two customers 15 could have the same name or the same date of birth. However, the congruence of two or more of these fields, such as both the name and date of birth, can significantly increase a congruence score. Such a correspondence is firstly depicted in FIG. 2 by the by the two links between DBA and DBB with the equations: (i) P (ao, bi), representing a congruence score of similarity between the usage of data in the ao data field of DBA with the bi data field of DBB; in conjunction with (ii) P (04, be), representing a congruence score of similarity between the usage of data in the a 4 data field of DBA with the bs data field of DBB. Such a correspondence is secondly depicted in FIG. 2 by the two links between DBc and DBD with the equations: (i) P (03, do), representing a congruence score of similarity between the usage of data in the C3 data field of DBc with the do data field of DBD; in conjunction with (ii) P (c4, di), representing a congruence score of similarity between the usage of data in the C4 data field of DBc with the d2 data field of DBD.
Data field pairing with a congruence score exceeding a set threshold can be used to establish identifier field pairs used to link the separate databases 26. Once the separate databases 26 are linked using the data field pairing having high threshold scores, data that is unique to each database can be collected using a query module 23. Such unique data is depicted as surrounded by stars in FIG. 3 (e.g., see data fields <¾ and a). This data, while unique perhaps to a specific marketing channel database, can be shared with other data channel databases to provide a unified customer experience across the omnichannel market 1-00 for each customer 15. Note that this linking of data between the separate databases 26 does not require any changes to the logical design or changes to the data of any of the databases. FIG. 4 is flowchart 4-00 of a method for combining data from separate databases 26 in an embodiment of the invention, including the steps 4-01 to 4-07 (see below).
4-01 access a plurality of separate databases 26, each separate database 26 including a logical design and a data records
4-02 calculate congruence scores for a first data field of a first database and a second data field of a second database
4-03 based on congruence scores, designate identifier link pairs 4-04 link the separate databases 26 via the identifier field pairs 4-05 combine related data between the separate databases 26 to create a unified report 23A for omnichannel market 1-00
4-06 determine customer location and restrict access to private or confidential information of customer 15 according to protection rules for jurisdiction 4-07 push consolidated data to the separate databases 26
More generally, a first embodiment of the invention is a server 20 for combining data from a plurality of separate databases 26 including at least one processor in data communication with a network 27 and a user interface 28, the server 20 comprising: (a) an access module 21, wherein the access module 21 is configured for communication with the plurality of separate databases 26 through the network 27; (b) a linking module 22 configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) a query module 23 for combining related data between the separate databases 26 to create a unified report 23A, wherein the related data from the separate databases 26 is linked via the identifier field pairs. The access module 21 is configured to retrieve a logical design for each separate database 26. Each logical design comprises a plurality of data fields for storing data records. Each data field pairing includes a first data field of a first database and a second data field of a second database. Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs. A second embodiment of the invention is a computer-implemented method for combining data from a plurality of separate databases 26 using at least one processor in data communication with a network 27 and a user interface 28, the method comprising the steps of: (a) maintaining an access module 21, wherein the access module 21 is configured for communication with the plurality of separate databases 26 through the network 27; (b) maintaining a linking module 22 configured to calculate a plurality of congruence scores for a plurality of data field pairings; and (c) maintaining a query module 23 for combining related data between the separate databases 26 to create a unified report 23A, wherein the related data from the separate databases 26 is linked via the identifier field pairs. The access module 21 is configured to retrieve a logical design for each separate database 26. Each logical design comprises a plurality of data fields for storing data records. Each data field pairing includes a first data field of a first database and a second data field of a second database. Each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs.
A first technical benefit of the invention is that the logical designs and the data of the separate databases 26 need not be altered. This aspect of the invention is critical because many API or other database access protocols are read-only, meaning that the party linking to the database is given limited permission that does not include altering the logical design or data of the linked database. Hence the invention enables the vendor to execute a technical solution within an IT environment of databases that the vendor does not fully control. With the use of multiple identifier field pairs, the invention is capable of working with the data as it is, even if that data is sub-optimal or if some of the data cannot be used due to regulatory restrictions.
A second technical benefit of the invention is that links between separate databases 26 and data fields within the separate databases 26 can be performed without a full migration of the data. The links are determined by the linking module 22 and then a query is performed. The unified report 23A is not static, meaning that as the data is updated within each of the separate databases 26, such updated data can be available in real-time in the unified report 23A. This design allows the marketing and sales teams for a vendor more freedom to add, replace, or remove marketing channels. E.g., new databases can be added to the unified reporting capability through the invention’s steps of accessing the new database via the access module 21 and establishing new links of identifier field pairs using the linking module 22. Hence a 360 degree selection of omnichannel market 1-00 shopping options can be continuously updated for the customer 15 as new shopping options arise and old shopping options fall into disfavor or disappear. Time consuming data migration and information technology architectural updates by the vendor can be avoided. Customer complaints or suggestions can be acted on quickly.
It should be noted that while data is updated frequently in a database, as receiving new data in an organized fashion is one of the primary functions of a database, changes to the logical design of a database is far less frequent. While data may be updated multiple times an hour or day, the identifier field pairs discovered by the linking module 22 may be static for days, weeks, or months. Also, not all data fields in a database are likely to be updated at once. The invention is able to make more than one data field linkage between databases, as illustrated in FIG. 2 between databases DBA and DBB with the equations P (ao, bi) and P (ci4, b ). Hence the invention’s architecture is able to identify and remedy a sudden incongruence between previously determined identifier field pairs. The linking module 22 can also be further configured to: (i) quality test the identifier field pairs on a first predetermined schedule; and (ii) recalculate the plurality of congruence scores on a second predetermined schedule.
In an alternative embodiment of the server 20 and the method, each data field pairing further includes at least one of: (a) a third data field of the first database; (b) a fourth data field of the second database; and (c) a fifth data field of a third separate database 26.
In an alternative embodiment of the server 20 and the method, the calculation of the plurality of congruence scores evaluates at least one of: (a) a field definition for each data field; and (b) an associative table for each separate database 26. In an alternative embodiment of the server 20 and the method: (a) the query module 23 includes a query history 23B; (b) the linking module 22 has access to the query history 23B; and (c) the linking module 22 includes AI / ML models based at least in part upon the query history 23B. The query history 23B is particularly useful for AI / ML models as the logic design of separate databases 26 often have many similarities, whether those databases are proprietary or open source. Note that the invention may be used as a stand-alone system by individual vendors or as part of a SaaS model where a single company provides use of the invention to multiple vendors. In the SaaS model, the single company will likely represent multiple vendors linking to the same databases, thus lessons learned and documented in query histories of one vendor may be useful for other vendors serviced by the same single company. Hence the AI / ML models can be continuously refined with real life examples.
In an alternative embodiment of the server 20 and the method, the separate databases 26 include at least one of: (a) a brick & mortar store database; (b) an online platform database; (c) a social media database; (d) a direct contact customer support database; and (e) a payment method database.
In an alternative embodiment of the server 20 and the method, the identifier field pairs include at least one of: (a) a a robust personal identifier including at least one of a government identification number, an email address, and a phone number; (b) a biographical identifier including at least one of a date of birth, a first name, a last name, and a gender; (c) a building address; (d) an electronic device identifier including at least one of a MAC address, a SIM card number, an IMEI number, and an internet address; (e) a financial account identifier including at least one of a credit cart number, a bank account number, a reward card number, and a payment service account identifier; (f) a social media profile name; and (g) a biometric identifier including at least one of a signature, a voiceprint, and a face portrait.
In an alternative embodiment of the server 20 and the method, the congruence score of each data field pairing takes into account at least one of: (i) a similarity of data stored in at least one record of each data field pairing; and (ii) an identical data field type for each of the data fields in the data field pairing. In addition, the linking module 22 is further configured to: (i) quality test the identifier field pairs on a first predetermined schedule; and (ii) recalculate the plurality of congruence scores on a second predetermined schedule.
In an alternative embodiment of the server 20 and the method, the server 20 further including a location module 25. The location module 25 determines a residence of each customer 15. The location module 25 includes a library of consumer privacy protection rules for a plurality of jurisdictions. The query module 23, following at least one consumer privacy protection rule for at least one jurisdiction, restricts access to at least one data field based upon the residence of at least one customer 15. With the use of multiple identifier field pairs, the invention is capable of working with the data as it is, even if that data is sub-optimal or if some of the data cannot be used due to regulatory restrictions.
In an alternative embodiment of the server 20 and the method, the unified report 23A calculates and displays, for a plurality of identified parties, a plurality of consolidated data representing at least one of: (a) customer loyalty score; (b) a customer preference profile; (c) a customer purchase history; and (d) a customer wish list. The server 20 can further comprise and the method can maintain a push module 24 for pushing the consolidated data to at least one of the separate databases 26.
While various aspects and embodiments have been disclosed herein, it will be apparent that various other modifications and adaptations of the invention will be apparent to the person skilled in the art after reading the foregoing disclosure without departing from the spirit and scope of the invention and it is intended that all such modifications and adaptations come within the scope of the appended claims. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit of the invention being indicated by the appended claims.

Claims

1. A server for combining data from a plurality of separate databases including at least one processor in data communication with a network and a user interface, the server comprising:
(a) an access module, wherein the access module is configured for communication with the plurality of separate databases through the network;
(i) wherein the access module is configured to retrieve a logical design for each separate database; and
(ii) wherein each logical design comprises a plurality of data fields for storing data records;
(b) a linking module configured to calculate a plurality of congruence scores for a plurality of data field pairings;
(i) wherein each data field pairing includes a first data field of a first database and a second data field of a second database; and
(ii) wherein each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs; and
(c) a query module for combining related data between the separate databases to create a unified report, wherein the related data from the separate databases is linked via the identifier field pairs.
2. The server of claim 1, wherein each data field pairing further includes at least one of:
(a) a third data field of the first database;
(b) a fourth data field of the second database; and
(c) a fifth data field of a third separate database.
3. The server of claim 1, wherein the calculation of the plurality of congruence scores evaluates at least one of:
(a) a field definition for each data field; and
(b) an associative table for each separate database.
4. The server of claim 1,
(a) wherein the query module includes a query history;
(b) wherein the linking module has access to the query history; and
(c) wherein the linking module includes AI / ML models based at least in part upon the query history.
5. The server of claim 1, wherein the separate databases include at least one of:
(a) a brick & mortar store database;
(b) an online platform database; (c) a social media database;
(d) a direct contact customer support database; and
(e) a payment method database.
6. The server of claim 1, wherein the identifier field pairs include at least one of: (a) a robust personal identifier including at least one of a government identification number, an email address, and a phone number, an email address, and a phone number;
(b) a biographical identifier including at least one of a date of birth, a first name, a last name, and a gender;
(c) a building address; (d) an electronic device identifier including at least one of a MAC address, a SIM card number, an IMEI number, and an internet address;
(e) a financial account identifier including at least one of a credit cart number, a bank account number, a reward card number, and a payment service account identifier;
(f) a social media profile name; and (g) a biometric identifier including at least one of a signature, a voiceprint, and a face portrait.
7. The server of claim 1,
(a) wherein the congruence score of each data field pairing takes into account at least one of:
(i) a similarity of data stored in at least one record of each data field pairing; and
(ii) an identical data field type for each of the data fields in the data field pairing; and
(b) wherein the linking module is further configured to:
(i) quality test the identifier field pairs on a first predetermined schedule; and
(ii) recalculate the plurality of congruence scores on a second predetermined schedule.
8. The server of claim 1, the server further including a location module,
(a) wherein the location module determines a residence of each customer;
(b) wherein the location module includes a library of consumer privacy protection rules for a plurality of jurisdictions; and
(c) wherein the query module, following at least one consumer privacy protection rule for at least one jurisdiction, restricts access to at least one data field based upon the residence of at least one customer.
9. The server of claim 1, wherein the unified report calculates and displays, for a plurality of identified parties, a plurality of consolidated data representing at least one of:
(a) a customer loyalty score;
(b) a customer preference profile;
(c) a customer purchase history; and
(d) a customer wish list.
10. The server of claim 9, the server further comprising a push module for pushing the consolidated data to at least one of the separate databases.
11. A computer-method for combining data from a plurality of separate databases using at least one processor in data communication with a network and a user interface, the method comprising the steps of:
(a) maintaining an access module, wherein the access module is configured for communication with the plurality of separate databases through the network;
(i) wherein the access module is configured to retrieve a logical design for each separate database; and
(ii) wherein each logical design comprises a plurality of data fields for storing data records;
(b) maintaining a linking module configured to calculate a plurality of congruence scores for a plurality of data field pairings;
(i) wherein each data field pairing includes a first data field of a first database and a second data field of a second database; and
(ii) wherein each data field pairing meeting or exceeding a congruence score threshold is designated as one of a plurality of identifier field pairs; and
(c) maintaining a query module for combining related data between the separate databases to create a unified report, wherein the related data from the separate databases is linked via the identifier field pairs.
12. The method of claim 11, wherein each data field pairing further includes at least one of:
(a) a third data field of the first database;
(b) a fourth data field of the second database; and
(c) a fifth data field of a third separate database.
13. The method of claim 11, wherein the calculation of the plurality of congruence scores evaluates at least one of:
(a) a field definition for each data field; and
(b) an associative table for each separate database.
14. The method of claim 11,
(a) wherein the query module includes a query history;
(b) wherein the linking module has access to the query history; and
(c) wherein the linking module includes AI / ML models based at least in part upon the query history.
15. The method of claim 11, wherein the separate databases include at least one of:
(a) a brick & mortar store database;
(b) an online platform database; (c) a social media database;
(d) a direct contact customer support database; and
(e) a payment method database.
16. The method of claim 11, wherein the identifier field pairs include at least one of:
(a) a a robust personal identifier including at least one of a government identification number, an email address, and a phone number;
(b) a biographical identifier including at least one of a date of birth, a first name, a last name, and a gender; (c) a building address;
(d) an electronic device identifier including at least one of a MAC address, a SIM card number, an IMEI number, and an internet address;
(e) a financial account identifier including at least one of a credit cart number, a bank account number, a reward card number, and a payment service account identifier; (f) a social media profile name; and
(g) a biometric identifier including at least one of a signature, a voiceprint, and a face portrait.
17. The method of claim 11,
(a) wherein the congruence score of each data field pairing takes into account at least one of:
(i) a similarity of data stored in at least one record of each data field pairing; and
(ii) an identical data field type for each of the data fields in the data field pairing; and
(b) wherein the linking module is further configured to:
(i) quality test the identifier field pairs on a first predetermined schedule; and
(ii) recalculate the plurality of congruence scores on a second predetermined schedule.
18. The method of claim 11, the method further including a location module,
(a) wherein the location module determines a residence of each customer;
(b) wherein the location module includes a library of consumer privacy protection rules for a plurality of jurisdictions; and
(c) wherein the query module, following at least one consumer privacy protection rule for at least one jurisdiction, restricts access to at least one data field based upon the residence of at least one customer.
19. The method of claim 11, wherein the unified report calculates and displays, for a plurality of identified parties, a plurality of consolidated data representing at least one of:
(a) a customer loyalty score;
(b) a customer preference profile;
(c) a customer purchase history; and
(d) a customer wish list.
20. The method of claim 19, the method further comprising maintaining a push module for pushing the consolidated data to at least one of the separate databases.
PCT/SG2021/050298 2020-06-02 2021-05-28 System and method for combining related data from separate databases using identifier field pairs WO2021246958A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202005190U 2020-06-02
SG10202005190U 2020-06-02

Publications (1)

Publication Number Publication Date
WO2021246958A1 true WO2021246958A1 (en) 2021-12-09

Family

ID=78831690

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2021/050298 WO2021246958A1 (en) 2020-06-02 2021-05-28 System and method for combining related data from separate databases using identifier field pairs

Country Status (1)

Country Link
WO (1) WO2021246958A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170053357A1 (en) * 2015-08-18 2017-02-23 Fiserv, Inc. Generating integrated data records by correlating source data records from disparate data sources
US20180181644A1 (en) * 2016-12-22 2018-06-28 Aon Global Operations Ltd (Singapore Branch) Methods and systems for linking data records from disparate databases
CN110569289A (en) * 2019-09-11 2019-12-13 星环信息科技(上海)有限公司 Column data processing method, equipment and medium based on big data
US20200042626A1 (en) * 2018-07-31 2020-02-06 Splunk Inc. Identifying similar field sets using related source types
US10599395B1 (en) * 2017-10-11 2020-03-24 Amperity, Inc. Dynamically merging database tables

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170053357A1 (en) * 2015-08-18 2017-02-23 Fiserv, Inc. Generating integrated data records by correlating source data records from disparate data sources
US20180181644A1 (en) * 2016-12-22 2018-06-28 Aon Global Operations Ltd (Singapore Branch) Methods and systems for linking data records from disparate databases
US10599395B1 (en) * 2017-10-11 2020-03-24 Amperity, Inc. Dynamically merging database tables
US20200042626A1 (en) * 2018-07-31 2020-02-06 Splunk Inc. Identifying similar field sets using related source types
CN110569289A (en) * 2019-09-11 2019-12-13 星环信息科技(上海)有限公司 Column data processing method, equipment and medium based on big data

Similar Documents

Publication Publication Date Title
US10983960B2 (en) Multi-source, multi-dimensional, cross-entity, multimedia centralized personal information database platform apparatuses, methods and systems
US11397831B2 (en) Method and system for double anonymization of data
US9760735B2 (en) Anonymous information exchange
US20140372338A1 (en) Systems and methods for recommending merchants to a consumer
US20190164176A1 (en) Systems and methods for processing transaction data
US20110040631A1 (en) Personalized commerce system
JP2021534468A (en) Automated Corporate Transaction Data Aggregation and Accounting
US11886503B2 (en) Method and apparatus for implementing a search index generator
US20190020557A1 (en) Methods and systems for analyzing entity performance
US11100531B2 (en) Method and apparatus for clustering platform sessions and user accounts associated with the platform sessions
US9275125B1 (en) System for organizing data from a plurality of users to create individual user profiles
US11734755B2 (en) Dynamically determining real-time offers
US10956916B1 (en) Self learning machine learning pipeline for enabling identity verification
JP6250557B2 (en) Providing device, program, and providing method
WO2021217497A1 (en) Statistics-aware sub-graph query engine
US20180075468A1 (en) Systems and methods for merchant business intelligence tools
JP2017054448A (en) Housing loan preliminary review system, method and program
US11748386B2 (en) Method, system, and computer program product for managing source identifiers of clustered records
WO2021246958A1 (en) System and method for combining related data from separate databases using identifier field pairs
US20160148220A1 (en) Method and system for impact modeling of brand repulsion
US20150348209A1 (en) Method and system for linking forensic data with purchase behavior
US20230205741A1 (en) Enterprise data management platform
US20230205742A1 (en) Data quality control in an enterprise data management platform
US20160275539A1 (en) Systems and Methods for Managing Reward Payments for Search Results by Search Engines
WO2022046773A1 (en) Computer program to curate, enable self-analysis, control, and protect individual user data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21817959

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21817959

Country of ref document: EP

Kind code of ref document: A1