US11556515B2 - Artificially-intelligent, continuously-updating, centralized-database-identifier repository system - Google Patents


Info

Publication number
US11556515B2
Authority
US
United States
Prior art keywords
data
databases
database
data elements
repository
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US17/404,047
Other versions
US20210374117A1 (en)
Inventor
Matthew E. Carroll
Manu Kurian
Aaron E. Russell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of America Corp
Original Assignee
Bank of America Corp
Application filed by Bank of America Corp
Priority to US17/404,047
Assigned to BANK OF AMERICA CORPORATION: assignment of assignors interest (see document for details). Assignors: RUSSELL, AARON E., CARROLL, MATTHEW E., KURIAN, MANU
Publication of US20210374117A1
Application granted
Publication of US11556515B2
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking

Definitions

  • FIG. 1 shows a centralized database identifier repository.
  • The centralized database identifier repository may include a plurality of data elements, such as data element A, shown at 102, data element B, shown at 104, data element C, shown at 106, and data element D, shown at 108.
  • Each data element may be found in one or more databases of a system.
  • The repository may store the data element and each database location in which the data element is found.
  • Such a repository may provide a centralized source to locate each data element.
  • Data element A, shown at 102, may be located in databases AG, GH and SD, identified by database identifiers DB AG, DB GH and DB SD.
  • Data element B, shown at 104, may be located in databases GH, AH and SW, identified by database identifiers DB GH, DB AH and DB SW.
  • Data element C, shown at 106, may be located in databases AH and GH, identified by database identifiers DB AH and DB GH.
  • Data element D, shown at 108, may be located in databases AG, GH, AH and SW, identified by database identifiers DB AG, DB GH, DB AH and DB SW.
  • FIG. 2 shows a detailed view of data element A, shown in FIG. 1.
  • Data element A, shown at 202, may be located in databases AG, GH and SD.
  • Database AG, shown at 204, includes data element A along with additional data about data element A.
  • Database GH, shown at 206, includes data element A along with additional data about data element A.
  • Database SD, shown at 208, includes data element A along with additional data about data element A. It should be appreciated that the data about data element A included in databases AG, GH and SD may be the same data, similar data or different data.
  • FIG. 3 shows a detailed view of an exemplary data element.
  • Data element 302 may identify a fictional person named John Doe.
  • Data element 302 may be included in a centralized database identifier repository.
  • Data element 302 may be associated with databases AG, GH and SD.
  • Databases AG, GH and SD may each include data relating to data element 302.
  • Database AG may include a record relating to Johnny Doe, as shown at 304.
  • Record 304 may include data relating to John Doe.
  • Record 304 may include a name, street address, phone number and last-updated timestamp. It should be appreciated that the name associated with record 304 (Johnny Doe) may be similar, but not identical, to data element 302 (John Doe). Even though the names are not the same, the system may have determined that record 304 and data element 302 identify the same person.
  • Database GH may include a record relating to John Doe, as shown at 306.
  • Record 306 may include data relating to John Doe.
  • Record 306 may include a name, street address, phone number and last-updated timestamp.
  • Database SD may include a record relating to John Doe, as shown at 308.
  • Record 308 may include data relating to John Doe.
  • Record 308 may include a name, street address, cell phone number, home phone number and last-updated timestamp.
  • The streets on records 304 and 306 match, while the street on record 308 differs from records 304 and 306. It should also be appreciated that records 304 and 306 include one phone number and record 308 includes two phone numbers. Even though records 304, 306 and 308 are not identical, the system may determine that the records identify the same person. The determination may be made because the records share more than a threshold percentage of identical data.
  • An artificial intelligence bot may create, update and/or maintain the centralized database identifier repository.
  • The artificial intelligence bot may crawl through multiple databases of a system in order to identify identical and similar records.
  • The artificial intelligence bot may also identify data records that are out of date, as well as records that can be archived, deactivated and/or improved by consolidation.
  • FIG. 4 shows an illustrative flow chart.
  • Step 402 shows identifying duplicate records within one or more databases.
  • Step 404 shows identifying records that are similar within one or more databases.
  • Step 406 shows identifying the utilization of each table within one or more databases.
  • Steps 408 and 410 may be based on steps 402, 404 and 406.
  • Step 408 shows recommending database synchronization. The recommending may be based on identifying duplicate records, identifying similar records and identifying the utilization of each table within the databases.
  • Step 412 may be based on step 406.
  • Step 412 may include ranking tables based on utilization. Tables that are utilized more times per time period may be ranked higher than tables that are utilized fewer times per time period.
  • Step 414 may include determining and assigning memory locations for tables based on usage frequency. Tables and/or databases that are ranked higher may be assigned memory locations with a shorter response time than tables and/or databases that are ranked lower.
  • Step 416 shows displaying recommendations to an operator.
  • The recommendations may be displayed to an operator.
  • The recommendations may include displaying two similar records to an operator so that the operator can identify which record is more accurate.
  • Step 418 shows executing recommendations in response to operator confirmation.
  • The system may execute the recommendations upon operator confirmation.
  • The system may also execute, independent of operator confirmation, recommendations that have been determined to be accurate with greater than a predetermined confidence threshold.
  • FIG. 5 shows an exemplary artificial intelligence (“AI”) bot 502 placing database tables in various memory locations.
  • Table A included in database AG, shown at 504, may be accessed at a rate of 10 times per minute. The access may be based on, or provided in response to, various requests, including structured query language (“SQL”) queries.
  • Table CZ included in database GH, shown at 506, may be accessed at a rate of 50 times per minute.
  • Table QW, shown at 508, may be accessed at a rate of 5 times per minute.
  • AI bot 502 may have determined the access rate for each of tables 504, 506 and 508.
  • AI bot 502 may determine that table CZ is accessed at the highest rate. Therefore, table CZ may be placed into the memory location with the shortest response time, as shown at 512.
  • AI bot 502 may determine that table A is accessed at the second-highest rate. Therefore, table A may be placed into the memory location with the second-shortest response time, as shown at 514.
  • Memory location 516 may be vacant because it may have been reserved, or because AI bot 502 may be waiting to place an appropriate table within memory location 516.
  • AI bot 502 may determine that table QW is accessed at the lowest rate. Therefore, table QW may be placed into the memory location with the longest response time, as shown at 518.
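The FIG. 5 placement can be illustrated with a short Python sketch that ranks tables by observed access rate and fills memory slots in order of increasing response time. It assumes the slot labels are simply the figure's reference numerals, that slot 516 is reserved, and that the access rates are the 50, 10 and 5 per minute given above; `Table` and `place_tables` are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Table:
    name: str
    database: Optional[str]  # FIG. 5 does not say which database holds table QW
    accesses_per_minute: float


def place_tables(tables: List[Table], slots: List[str], reserved: Set[str]) -> Dict[str, str]:
    """Assign tables to memory slots, busiest table first.

    `slots` must be listed from shortest to longest response time. Reserved
    slots are skipped and stay vacant, like memory location 516.
    """
    ranked = sorted(tables, key=lambda t: t.accesses_per_minute, reverse=True)
    free = [slot for slot in slots if slot not in reserved]
    if len(ranked) > len(free):
        raise ValueError("not enough free memory slots for all tables")
    # Pair the i-th busiest table with the i-th fastest free slot.
    return {table.name: slot for table, slot in zip(ranked, free)}


tables = [
    Table("A", "AG", 10),   # shown at 504
    Table("CZ", "GH", 50),  # shown at 506
    Table("QW", None, 5),   # shown at 508
]
placement = place_tables(tables, ["512", "514", "516", "518"], reserved={"516"})
print(placement)  # {'CZ': '512', 'A': '514', 'QW': '518'}
```

Pairing ranks with slots in order reproduces the figure when the number of tables equals the number of free slots. With more free slots than tables, the least-used table would take the next free slot rather than the slowest one; the description does not say which behavior is intended.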

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A centralized database identifier repository may identify databases using a unique identifier, or key tag, for each database. Each identified database may include data relating to one or more specific data elements. The repository may include a variety of data elements. Each data element may be associated with one or more database keys. The repository may be a repository of reference pointers. The repository may facilitate data viewing and data retrieval. A requestor may search for a data element using the centralized repository. The repository may retrieve data relating to a specific data element from all databases, identified by unique identifiers, that include data relating to the data element. The databases' unique identifiers may be encrypted tokens.

Description

CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of U.S. patent application Ser. No. 16/406,188 filed on May 8, 2019, now U.S. Pat. No. 11,113,258, and entitled “ARTIFICIALLY-INTELLIGENT, CONTINUOUSLY-UPDATING, CENTRALIZED-DATABASE-IDENTIFIER REPOSITORY SYSTEM” which is hereby incorporated by reference herein in its entirety.
FIELD OF TECHNOLOGY
This disclosure relates to systems that utilize database identifiers.
BACKGROUND
Large entities may include multiple lines of business (“LOB”). Each LOB may host one or more databases. For a variety of reasons, communication between disparate LOB databases may be limited. Because little communication exists between the LOB databases, it is not uncommon for duplicative data to be found throughout multiple databases associated with a single entity.
Duplicative data, or multiple instances of the same data element, may be damaging to an entity for numerous reasons. Firstly, duplicative data wastes the time, effort, energy and resources used to maintain the data in more than one location. Secondly, duplicative data may cause data inconsistencies, because the data may be updated in one location and not updated in another location. Thirdly, duplicative data slows the processing of a system because the system has to review and, sometimes, traverse multiple copies of the same data element.
Therefore, it would be desirable to create a system that identifies duplicative data among a plurality of databases. It would be further desirable for the system to consolidate multiple instances of the same data.
SUMMARY OF THE DISCLOSURE
An artificially-intelligent, continuously-updating, centralized database identifier repository system is provided. The system may include an artificial intelligence module.
The artificial intelligence module may be operable to review a plurality of databases. Each database included in the plurality of databases may include one or more tables. The review by the artificial intelligence module may be used to seek out and determine the existence of multiple particulars, redundancies and/or fact patterns within the plurality of databases.
One particular may include determining duplicate records within the plurality of databases. Another particular may include determining comparable records within the plurality of databases. It should be appreciated that comparable records may be similar to one another. An example of similar records may be two records that include approximately 75% of the same data and 25% different data. Another example of similar records may be two records that include approximately 90% of the same data and 10% different data.
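One way to make the 75% and 90% examples concrete is a field-level comparison. The sketch below is a minimal illustration, assuming records are flat dictionaries and that "same data" means exactly equal field values; `shared_fraction`, `classify` and the 0.75 default are hypothetical choices, not the module's actual matching logic.

```python
from typing import Dict

Record = Dict[str, str]


def shared_fraction(a: Record, b: Record) -> float:
    """Fraction of fields, over the union of both records' fields, holding equal values."""
    fields = set(a) | set(b)
    if not fields:
        return 1.0
    same = sum(1 for f in fields if f in a and f in b and a[f] == b[f])
    return same / len(fields)


def classify(a: Record, b: Record, similar_threshold: float = 0.75) -> str:
    """Label a record pair; the 0.75 default mirrors the 75% example above."""
    share = shared_fraction(a, b)
    if share == 1.0:
        return "duplicate"
    if share >= similar_threshold:
        return "similar"
    return "distinct"


r1 = {"name": "John Doe", "street": "1 Main St", "phone": "555-0100", "city": "Springfield"}
r2 = dict(r1, street="9 Oak Ave")  # one of four fields differs
print(shared_fraction(r1, r2), classify(r1, r2))  # 0.75 similar
print(classify(r1, dict(r1)))                     # duplicate
```

Exact equality would treat variants such as "Johnny Doe" and "John Doe" as different values, so a production matcher would likely substitute a fuzzy string comparison at that step.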
Another particular may include determining a utilization metric for a table included within the plurality of databases. The utilization metric may identify how often the table is used.
Based on the review, the artificial intelligence module may identify one or more recommendations for database synchronization and/or database usage optimization. The recommendations may include removing duplicate entries. The recommendations may include consolidating tables. The recommendations may include consolidating databases.
The recommendations may include maintaining data references within certain tables to other tables. For example, a first data element from a first table in a first database may be duplicative of a second data element from a second table in a second database. The recommendation may include, upon receipt of confirmation of redundancy between the first data element and the second data element, deleting the second data element from the second table in the second database and replacing the second data element from the second table in the second database with a reference to the first data element in the first table in the first database. Maintaining the data reference may ensure that the second database retains access to the data. Maintaining the data reference, and only maintaining the data element in one database, may also ensure that the data remains consistent across multiple databases.
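The reference-replacement recommendation can be sketched with nested dictionaries standing in for databases and tables. This is an illustration under stated assumptions: the `Reference` type, `read` and `replace_with_reference` are hypothetical, and a real system would store a foreign key or link rather than a Python object.

```python
from dataclasses import dataclass
from typing import Any, Dict

# database -> table -> key -> stored value (or a Reference to another location)
Store = Dict[str, Dict[str, Dict[str, Any]]]


@dataclass(frozen=True)
class Reference:
    database: str
    table: str
    key: str


def read(store: Store, loc: Reference) -> Any:
    """Read a value, following references to the single stored copy."""
    value = store[loc.database][loc.table][loc.key]
    while isinstance(value, Reference):
        value = store[value.database][value.table][value.key]
    return value


def replace_with_reference(store: Store, first: Reference, second: Reference, confirmed: bool) -> bool:
    """Once redundancy is confirmed, delete the second element and store a pointer to the first."""
    if not confirmed or read(store, first) != read(store, second):
        return False
    store[second.database][second.table][second.key] = first
    return True


store = {
    "db1": {"customers": {"c1": "John Doe"}},
    "db2": {"clients": {"k7": "John Doe"}},
}
first, second = Reference("db1", "customers", "c1"), Reference("db2", "clients", "k7")
replace_with_reference(store, first, second, confirmed=True)
store["db1"]["customers"]["c1"] = "John Q. Doe"  # update the single remaining copy
print(read(store, second))  # John Q. Doe
```

Because only one copy remains, an update made in the first database is what the second database sees, which is the consistency benefit described above.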
Recommendations for database usage optimization according to the embodiments may include archiving unused tables or databases, placing tables used at a high frequency in priority memory locations, deactivating legacy databases and any other database usage optimization. Tables and/or databases that are determined to be utilized less than a second threshold frequency may be deactivated, archived or deleted.
Based on the review, the artificial intelligence module may rank the tables included in the plurality of databases based on the utilization metric. More frequently used tables may receive a higher ranking and less frequently used tables may receive a lower ranking.
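The utilization metric and ranking might be computed from an access log by counting accesses per table within a time window. This sketch assumes a log of `(timestamp, table)` pairs and a half-open window `[start, end)`; `utilization` and `rank_tables` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def utilization(access_log: Iterable[Tuple[float, str]], start: float, end: float) -> Counter:
    """Utilization metric: accesses per table with a timestamp in [start, end)."""
    return Counter(table for ts, table in access_log if start <= ts < end)


def rank_tables(counts: Counter, all_tables: Iterable[str]) -> List[Tuple[str, int]]:
    """Most-used tables first; ties broken by name; never-accessed tables rank last with 0."""
    return sorted(((t, counts[t]) for t in all_tables), key=lambda p: (-p[1], p[0]))


# (seconds since window start, table) pairs, e.g. parsed from query logs
log = [(0, "A"), (5, "CZ"), (12, "CZ"), (30, "CZ"), (45, "A"), (50, "QW"), (75, "CZ")]
counts = utilization(log, 0, 60)
print(rank_tables(counts, ["A", "CZ", "QW", "LEGACY"]))
# [('CZ', 3), ('A', 2), ('QW', 1), ('LEGACY', 0)]
```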
Based on the review, the artificial intelligence module may determine and assign memory locations for each table and/or database. The memory locations may include a plurality of memory locations with shorter than a threshold response time. The memory locations may also include a plurality of memory locations with greater than a threshold response time. Tables and/or databases with usage that is greater than a threshold frequency may be assigned to memory locations with shorter than the threshold response time. Tables and/or databases with usage that is lower than a threshold frequency may be assigned to memory locations with greater than a threshold response time.
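The threshold-based assignment, combined with the second, lower threshold from the earlier paragraph on archiving, can be sketched as a three-way classification. The tier labels, the threshold values and the choice to place usage exactly at a threshold into the slower tier are assumptions; the description does not specify boundary behavior.

```python
from typing import Dict

FAST, SLOW, ARCHIVE = "fast", "slow", "archive-candidate"


def assign_tiers(usage: Dict[str, float], hot_threshold: float, cold_threshold: float) -> Dict[str, str]:
    """Map each table to a memory tier from its usage frequency.

    Above hot_threshold: memory with shorter-than-threshold response time.
    Below cold_threshold (the "second threshold"): flagged for archiving,
    deactivation or deletion. Everything else: memory with longer response time.
    """
    if cold_threshold > hot_threshold:
        raise ValueError("cold_threshold must not exceed hot_threshold")
    tiers: Dict[str, str] = {}
    for table, freq in usage.items():
        if freq > hot_threshold:
            tiers[table] = FAST
        elif freq < cold_threshold:
            tiers[table] = ARCHIVE
        else:
            tiers[table] = SLOW
    return tiers


usage = {"CZ": 50, "A": 10, "QW": 5, "LEGACY": 0}  # accesses per minute
tiers = assign_tiers(usage, hot_threshold=20, cold_threshold=1)
print(tiers)  # {'CZ': 'fast', 'A': 'slow', 'QW': 'slow', 'LEGACY': 'archive-candidate'}
```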
The system may include a display module. The display module may be configured to display the recommendations to an operator. The display module may be configured to instruct the system to execute the recommendations upon receipt of operator confirmation.
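The display-and-confirm flow, together with the confidence-threshold shortcut noted in the FIG. 4 description, could look like the following. The `Recommendation` type, the 0 to 1 confidence scale, the 0.99 default threshold and the `confirm` callback standing in for the display module are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recommendation:
    description: str
    confidence: float           # assumed scale: 0.0 to 1.0
    action: Callable[[], None]  # executes the recommendation


def process(recs: List[Recommendation],
            confirm: Callable[[Recommendation], bool],
            auto_threshold: float = 0.99) -> List[str]:
    """Run recommendations above the confidence threshold directly; ask the operator about the rest."""
    executed: List[str] = []
    for rec in recs:
        # `or` short-circuits, so high-confidence items never reach the operator prompt.
        if rec.confidence > auto_threshold or confirm(rec):
            rec.action()
            executed.append(rec.description)
    return executed


done: List[str] = []
recs = [
    Recommendation("Remove duplicate entry k7", 0.995, lambda: done.append("dedupe")),
    Recommendation("Archive table QW", 0.80, lambda: done.append("archive")),
    Recommendation("Consolidate databases AG and GH", 0.60, lambda: done.append("merge")),
]
# Stand-in for the display module: this "operator" approves only archiving.
executed = process(recs, confirm=lambda r: r.description.startswith("Archive"))
print(executed)  # ['Remove duplicate entry k7', 'Archive table QW']
```

In an interactive tool, `confirm` might print the recommendation and read the operator's answer from a console or UI.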
The system may continuously re-review the plurality of databases in order to identify one or more recommendations after a predetermined time period. The system may continually, or substantially continuously, re-rank the tables included in the plurality of databases in order to redetermine and, as needed, reassign memory locations based on the re-ranking.
BRIEF DESCRIPTION OF THE DRAWINGS
The objects and advantages of the invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
FIG. 1 shows an illustrative diagram in accordance with principles of the invention;
FIG. 2 shows another illustrative diagram in accordance with principles of the invention;
FIG. 3 shows yet another illustrative diagram in accordance with principles of the invention;
FIG. 4 shows an illustrative flow chart in accordance with principles of the invention; and
FIG. 5 shows an illustrative diagram in accordance with principles of the invention.
DETAILED DESCRIPTION
A centralized database identifier repository is provided. The repository may include a plurality of database identifiers. Each database identifier identifies a database included within a plurality of databases. The repository may also include a plurality of data elements. Each data element may be associated with data included in one or more of the plurality of databases.
The repository may also include a linkage between each data element and one or more database identifiers. Each of the one or more database identifiers may identify a linking database. The linking database may include data associated with the data element. The linking database may be included in the plurality of databases. For example, a first data element may identify a person named James Smith. Data pertaining to James Smith may be found in databases A, G and H. The repository may include the data element James Smith linked to database identifiers that identify databases A, G and H.
The repository may be operable to receive a request from a user. The request may include one or more data elements. The repository may be operable to respond to the user. The response may include the database identifiers associated with the received one or more data elements.
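The linkage between data elements and database identifiers behaves like an inverted index. Below is a minimal in-memory sketch, assuming plain strings for both data elements and database identifiers; `IdentifierRepository`, `link` and `lookup` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set


class IdentifierRepository:
    """In-memory sketch: data element -> identifiers of databases holding related data."""

    def __init__(self) -> None:
        self._links: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, data_element: str, database_ids: Iterable[str]) -> None:
        """Record that each listed database holds data about the element."""
        self._links[data_element].update(database_ids)

    def lookup(self, data_elements: Iterable[str]) -> Dict[str, Set[str]]:
        """Answer a request with the identifiers linked to each requested element."""
        # .get() avoids creating empty entries for unknown elements.
        return {e: set(self._links.get(e, ())) for e in data_elements}


repo = IdentifierRepository()
repo.link("James Smith", ["A", "G", "H"])
answer = repo.lookup(["James Smith", "Jane Roe"])
print({e: sorted(ids) for e, ids in answer.items()})
# {'James Smith': ['A', 'G', 'H'], 'Jane Roe': []}
```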
The repository may be operable to receive a request from a user. The request may include one or more data elements. The repository may be operable to determine one or more database identifiers associated with the request. The repository may be operable to transmit a second, or subsequent, request to each of the databases identified by the one or more database identifiers. The second, or subsequent, request may include the one or more data elements. The repository may receive the data, associated with the one or more data elements, from each of the databases. The repository may transmit the received data associated with the one or more data elements to the user.
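The second-request pattern amounts to resolving identifiers and then fanning the original data elements out to each identified database. In this sketch the databases are stubbed as Python callables; `fetch`, `make_client` and the row fields are hypothetical, and a real deployment would issue network or SQL requests instead.

```python
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical client for one database: takes the data elements, returns matching rows.
DatabaseClient = Callable[[Sequence[str]], List[dict]]


def fetch(data_elements: Sequence[str],
          links: Mapping[str, Sequence[str]],
          clients: Mapping[str, DatabaseClient]) -> Dict[str, List[dict]]:
    """Resolve linked database identifiers, then forward the same data elements
    to each identified database in a second request and collect the replies."""
    db_ids = sorted({db for e in data_elements for db in links.get(e, ())})
    return {db: clients[db](data_elements) for db in db_ids}


def make_client(rows: List[dict]) -> DatabaseClient:
    return lambda elements: [row for row in rows if row["name"] in elements]


clients = {
    "A": make_client([{"name": "James Smith", "product": "checking"}]),
    "G": make_client([{"name": "James Smith", "product": "mortgage"}]),
    "H": make_client([{"name": "James Smith", "product": "card"}]),
}
links = {"James Smith": ["A", "G", "H"]}
result = fetch(["James Smith"], links, clients)
print({db: [r["product"] for r in rows] for db, rows in result.items()})
# {'A': ['checking'], 'G': ['mortgage'], 'H': ['card']}
```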
In some embodiments, the request may include user entitlement data. The repository may determine whether a user identified by the user entitlement data is permitted to access the data from each of the databases prior to transmitting the received data to the user.
In some embodiments, based on a user's entitlement, there may be different levels of access to the data included in the databases. In one example, a user may be entitled to know whether data is included in the database; however, the user may not be entitled to view the data. In another example, a user may be entitled to read access to the databases; however, the user may not be able to retrieve the data. In another example, a user may be able to read and retrieve the data.
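The three access levels above can be modeled as an ordered enumeration that trims the repository's reply. The level names, and the reading of "retrieve" as permission to export data rather than merely view it, are assumptions made for this sketch.

```python
from enum import IntEnum
from typing import Dict, List


class Entitlement(IntEnum):
    NONE = 0
    EXISTENCE = 1  # may learn whether a database holds the data
    READ = 2       # may view the data but not retrieve (export) it
    RETRIEVE = 3   # may view and retrieve the data


def shape_response(results: Dict[str, List[dict]], level: Entitlement) -> Dict[str, object]:
    """Trim the repository's reply to what the user's entitlement permits."""
    if level >= Entitlement.READ:
        # READ and RETRIEVE both see the rows; only RETRIEVE may export them.
        return {"data": results, "exportable": level >= Entitlement.RETRIEVE}
    if level == Entitlement.EXISTENCE:
        return {"present_in": sorted(db for db, rows in results.items() if rows)}
    return {}


results = {"A": [{"product": "checking"}], "H": []}
print(shape_response(results, Entitlement.EXISTENCE))  # {'present_in': ['A']}
```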
In some embodiments, the request may include a reason for the request. The repository may transmit the received data upon receipt of an acceptable reason for the request. An acceptable reason for a request may be a reason selected from a predefined list of acceptable reasons. An example of an acceptable reason may be performing a transaction associated with a person identified by a data element.
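Checking the reason against a predefined list is a simple membership test. The reason strings, the normalization step and the `ReasonRejected` exception are hypothetical; only the transaction example comes from the description.

```python
from typing import Dict, FrozenSet, List

# Hypothetical predefined list; the description names only the transaction example.
ACCEPTABLE_REASONS: FrozenSet[str] = frozenset({
    "perform transaction for identified person",
})


class ReasonRejected(Exception):
    """Raised when the stated reason is not on the predefined list."""


def release(data: Dict[str, List[dict]], reason: str) -> Dict[str, List[dict]]:
    """Transmit retrieved data only for a reason selected from the predefined list."""
    if reason.strip().lower() not in ACCEPTABLE_REASONS:
        raise ReasonRejected(f"not an acceptable reason: {reason!r}")
    return data


data = {"A": [{"product": "checking"}]}
print(release(data, "Perform transaction for identified person") is data)  # True
```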
In some embodiments, a database identifier may include a communication link with the associated database. The database identifier may communicate between the centralized repository and the identified database in order to retrieve the data from the identified database.
In some embodiments, each database identifier is an encrypted token. Each token may be processed by a validation layer prior to communicating with an associated or underlying database. The validation may be based on the requestor's entitlements and/or the requestor's purpose for the data retrieval. Upon validation, the database identifier token may communicate with the underlying database to retrieve the requested data.
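A validation layer of this kind might check the token's integrity, then the requestor's entitlement and purpose, and only then release the database identifier for the retrieval call. In this sketch, HMAC signing is swapped in as a simple stand-in for the encrypted token described above. It makes the token tamper-evident but does not hide its contents. The key, the `db_id|signature` token layout, and the function names are all hypothetical.

```python
import hashlib
import hmac

# Assumption: a key held only by the validation layer. HMAC signing stands in
# for the encrypted token described in the text (it authenticates, not encrypts).
SECRET_KEY = b"hypothetical-validation-layer-key"


def issue_token(db_id: str) -> str:
    """Wrap a database identifier in a tamper-evident token."""
    sig = hmac.new(SECRET_KEY, db_id.encode(), hashlib.sha256).hexdigest()
    return f"{db_id}|{sig}"


def validate(token: str, entitled: bool, purpose: str, allowed: set) -> str:
    """Check the token, the requestor's entitlement and the stated purpose
    before the token may reach the underlying database."""
    db_id, _, sig = token.rpartition("|")
    expected = hmac.new(SECRET_KEY, db_id.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("token failed integrity check")
    if not entitled:
        raise PermissionError("requestor not entitled")
    if purpose not in allowed:
        raise PermissionError(f"purpose not permitted: {purpose!r}")
    return db_id  # only now is the identifier used to retrieve data


token = issue_token("DB SD")
print(validate(token, True, "perform_transaction", {"perform_transaction"}))  # DB SD
```

A production design would more likely use authenticated encryption so that the database location is not readable from the token itself.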
In some embodiments, the system may combine two or more tables or databases upon determining that the contents of the two or more tables or databases contain more than a predetermined amount of overlapping data.
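The text leaves the overlap measure and the "predetermined amount" open. The sketch below assumes Jaccard overlap over rows (shared rows divided by all distinct rows) and a hypothetical threshold of 0.8. Tables are modeled as sets of row tuples.

```python
from typing import Optional, Set, Tuple

Row = Tuple[object, ...]


def overlap(a: Set[Row], b: Set[Row]) -> float:
    """Jaccard overlap: shared rows divided by all distinct rows."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def maybe_combine(a: Set[Row], b: Set[Row], threshold: float = 0.8) -> Optional[Set[Row]]:
    """Return the combined table when overlap exceeds the threshold, else None."""
    if overlap(a, b) > threshold:
        return a | b
    return None


t1 = {(1, "x"), (2, "y"), (3, "z"), (4, "w"), (5, "v")}
t2 = t1 | {(6, "u")}              # 5 of 6 distinct rows shared, overlap 0.833...
print(len(maybe_combine(t1, t2)))  # 6
```

Other measures, such as the overlap coefficient or column-level matching, would fit the same structure.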
Apparatus and methods described herein are illustrative. Apparatus and methods in accordance with this disclosure will now be described in connection with the figures, which form a part hereof. The figures show illustrative features of apparatus and method steps in accordance with the principles of this disclosure. It is to be understood that other embodiments may be utilized and that structural, functional and procedural modifications may be made without departing from the scope and spirit of the present disclosure.
The steps of methods may be performed in an order other than the order shown or described herein. Embodiments may omit steps shown or described in connection with illustrative methods. Embodiments may include steps that are neither shown nor described in connection with illustrative methods.
Illustrative method steps may be combined. For example, an illustrative method may include steps shown in connection with another illustrative method.
Apparatus may omit features shown or described in connection with illustrative apparatus. Embodiments may include features that are neither shown nor described in connection with the illustrative apparatus. Features of illustrative apparatus may be combined. For example, an illustrative embodiment may include features shown in connection with another illustrative embodiment.
FIG. 1 shows a centralized database identifier repository. The centralized database identifier repository may include a plurality of data elements, such as data element A, shown at 102, data element B, shown at 104, data element C, shown at 106 and data element D, shown at 108.
Each data element may be found in one or more databases of a system. In order to couple together the duplicate data elements found in each database, the repository may store the data element and each database location in which the data element is found. Such a repository may provide a centralized source to locate each data element.
Data element A, shown at 102, may be located in databases AG, GH and SD, identified by database identifiers DB AG, DB GH and DB SD. Data element B, shown at 104, may be located in databases GH, AH and SW, identified by database identifiers DB GH, DB AH and DB SW. Data element C, shown at 106, may be located in databases AH and GH, identified by database identifiers DB AH and DB GH. Data element D, shown at 108, may be located in databases AG, GH, AH and SW, identified by database identifiers DB AG, DB GH, DB AH and DB SW.
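The FIG. 1 repository is essentially an index from each data element to the databases that hold it. The mapping below is taken directly from the description. The helper functions, which invert the index and find common locations, are hypothetical illustrations of queries such an index supports.

```python
# FIG. 1 mapping: data element -> database identifiers holding it.
REPOSITORY = {
    "A": ["DB AG", "DB GH", "DB SD"],
    "B": ["DB GH", "DB AH", "DB SW"],
    "C": ["DB AH", "DB GH"],
    "D": ["DB AG", "DB GH", "DB AH", "DB SW"],
}


def elements_in(db_id: str) -> list:
    """Invert the mapping: which data elements does a given database hold?"""
    return sorted(e for e, dbs in REPOSITORY.items() if db_id in dbs)


def shared_databases(e1: str, e2: str) -> set:
    """Databases that hold both data elements."""
    return set(REPOSITORY[e1]) & set(REPOSITORY[e2])


print(elements_in("DB GH"))                # ['A', 'B', 'C', 'D']
print(elements_in("DB SD"))                # ['A']
print(sorted(shared_databases("A", "D")))  # ['DB AG', 'DB GH']
```

Queries like these show, for example, that database GH holds every element in FIG. 1, which makes it a natural candidate when evaluating consolidation.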
FIG. 2 shows a detailed view of data element A, shown in FIG. 1. Data element A, shown at 202, may be located in databases AG, GH and SD. Database AG, shown at 204, shows data element A with additional data about data element A. Database GH, shown at 206, shows data element A with additional data about data element A. Database SD, shown at 208, shows data element A with additional data about data element A. It should be appreciated that the data about data element A included in databases AG, GH and SD may be the same data, similar data or different data.
FIG. 3 shows a detailed view of an exemplary data element. Data element 302 may identify a fictional person named John Doe. Data element 302 may be included in a centralized database identifier repository. Data element 302 may be associated with databases AG, GH and SD. Databases AG, GH and SD may each include data relating to data element 302. Database AG may include a record relating to Johnny Doe, as shown at 304. Record 304 may include data relating to John Doe. Record 304 may include a name, street address, phone number and last updated timestamp. It should be appreciated that the name associated with record 304 (Johnny Doe) may be similar to data element 302 (John Doe). Even though the names may not be the same, the system may have identified record 304 and data element 302 as identifying the same person.
Database GH may include a record relating to John Doe, as shown at 306. Record 306 may include data relating to John Doe. Record 306 may include a name, street address, phone number and last updated timestamp.
Database SD may include a record relating to John Doe, as shown at 308. Record 308 may include data relating to John Doe. Record 308 may include a name, street address, cell phone number, home phone number and last updated timestamp.
It should be appreciated that the street addresses on records 304 and 306 match, while the street address on record 308 differs from records 304 and 306. It should also be appreciated that records 304 and 306 include one phone number and record 308 includes two phone numbers. Even though records 304, 306 and 308 are not identical, the system may determine that the records identify the same person. The determination may be made because the records include greater than a threshold percentage of identical data.
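The text does not specify how the "threshold percentage of identical data" is computed. One possible reading is sketched here. It compares the sets of field values, ignoring timestamps, so that a phone number stored under different field names still counts as a match. It divides the shared values by the size of the smaller record (an overlap coefficient). The threshold of 0.6 and all record values are hypothetical, because the description names the fields of records 304, 306 and 308 but not their contents.

```python
from typing import Dict

IGNORED_FIELDS = {"last_updated"}  # timestamps naturally differ between copies


def similarity(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Overlap coefficient of the two records' field values, ignoring timestamps."""
    v1 = {v for k, v in r1.items() if k not in IGNORED_FIELDS}
    v2 = {v for k, v in r2.items() if k not in IGNORED_FIELDS}
    if not v1 or not v2:
        return 0.0
    return len(v1 & v2) / min(len(v1), len(v2))


def same_person(r1: Dict[str, str], r2: Dict[str, str], threshold: float = 0.6) -> bool:
    return similarity(r1, r2) > threshold


# Hypothetical values for records 304, 306 and 308.
r304 = {"name": "Johnny Doe", "street": "1 Main St", "phone": "555-0100", "last_updated": "2019-01-02"}
r306 = {"name": "John Doe", "street": "1 Main St", "phone": "555-0100", "last_updated": "2019-03-04"}
r308 = {"name": "John Doe", "street": "9 Oak Ave", "cell_phone": "555-0199",
        "home_phone": "555-0100", "last_updated": "2019-04-05"}

print(same_person(r304, r306), same_person(r306, r308), same_person(r304, r308))  # True True False
```

Under this measure, records 304 and 308 do not match directly, but both match record 306. Linking matches pairwise would therefore still group all three records as the same person.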
It should also be appreciated that an artificial intelligence bot, as shown in FIG. 5, may create, update and/or maintain the centralized database identifier repository. The artificial intelligence bot may crawl through multiple databases of a system in order to identify identical and similar records. The artificial intelligence bot may also identify data records that are out of date, and records that can be archived, deactivated and/or improved by consolidation.
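One simple signal the bot could use to flag out-of-date or archivable records is the last updated timestamp that each record in FIG. 3 carries. The sketch below assumes age cutoffs of one year (out of date) and three years (archive candidate). Both cutoffs and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def classify_by_age(last_updated: Dict[str, datetime], now: datetime,
                    stale_after: timedelta = timedelta(days=365),
                    archive_after: timedelta = timedelta(days=3 * 365)) -> Dict[str, str]:
    """Label each record by the age of its last update (hypothetical cutoffs)."""
    result = {}
    for record_id, ts in last_updated.items():
        age = now - ts
        if age > archive_after:
            result[record_id] = "archive_candidate"
        elif age > stale_after:
            result[record_id] = "out_of_date"
        else:
            result[record_id] = "current"
    return result


stamps = {"304": datetime(2019, 4, 1), "306": datetime(2017, 1, 1), "308": datetime(2015, 1, 1)}
print(classify_by_age(stamps, now=datetime(2019, 5, 8)))
# {'304': 'current', '306': 'out_of_date', '308': 'archive_candidate'}
```

In practice, the bot would likely combine age with the utilization metrics described for FIG. 4 rather than rely on timestamps alone.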
FIG. 4 shows an illustrative flow chart. Step 402 shows identifying duplicate records within one or more databases. Step 404 shows identifying records that are similar within one or more databases. Step 406 shows identifying the utilization of each table within one or more databases.
Steps 408 and 410 may be based on steps 402, 404 and 406. Step 408 shows recommending database synchronization. The recommending may be based on the identifying duplicate records, identifying similar records and identifying the utilization of each table within the database.
Step 412 may be based on step 406. Step 412 may include ranking tables based on utilization. Tables that are utilized more times per time period may be ranked higher than tables that are utilized fewer times per time period.
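Steps 406 and 412 can be illustrated by counting table accesses over a time window, converting the counts to a per-minute rate, and sorting in descending order. The access-log format and the window length are hypothetical. The example log is contrived so that the rates reproduce the FIG. 5 figures.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_tables(access_log: Iterable[str], window_minutes: float) -> List[Tuple[str, float]]:
    """Rank tables by accesses per minute, highest utilization first."""
    counts = Counter(access_log)  # table name -> number of accesses in the window
    rates = {table: n / window_minutes for table, n in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-minute log: 20 hits on A, 100 on CZ, 10 on QW.
log = ["A"] * 20 + ["CZ"] * 100 + ["QW"] * 10
print(rank_tables(log, window_minutes=2))  # [('CZ', 50.0), ('A', 10.0), ('QW', 5.0)]
```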
Step 414 may include determining and assigning memory locations for tables based on usage frequency. Tables and/or databases that are ranked higher may be assigned memory locations with shorter response times than tables and/or databases that are ranked lower.
Step 416 shows displaying the recommendations to an operator. At times, the recommendations may include two similar records, displayed so that the operator can identify which record is more accurate.
Step 418 shows executing recommendations in response to operator confirmation. In these embodiments, the system may execute the recommendations upon operator confirmation. In other embodiments, the system may execute recommendations that have been determined to be accurate at greater than a predetermined confidence threshold, independent of operator confirmation.
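The two embodiments of step 418 can be expressed as one routing rule. A recommendation is executed when its confidence exceeds the threshold, or when the operator confirms it. Otherwise, it stays pending. The `Recommendation` type, the 0.95 threshold, and the operator callback are hypothetical. Setting the threshold above 1.0 recovers the confirmation-only embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Recommendation:
    description: str
    confidence: float  # 0.0 to 1.0, as estimated by the bot


def route(recs: List[Recommendation], confidence_threshold: float = 0.95,
          operator_confirms: Callable[[Recommendation], bool] = lambda r: False
          ) -> Tuple[List[str], List[str]]:
    """Split recommendations into executed and pending lists."""
    executed, pending = [], []
    for rec in recs:
        # High-confidence items skip the operator; others need explicit confirmation.
        if rec.confidence > confidence_threshold or operator_confirms(rec):
            executed.append(rec.description)
        else:
            pending.append(rec.description)
    return executed, pending


recs = [Recommendation("synchronize DB GH and DB AH", 0.99),
        Recommendation("archive table QW", 0.70)]
print(route(recs))  # (['synchronize DB GH and DB AH'], ['archive table QW'])
```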
FIG. 5 shows an exemplary artificial intelligence (“AI”) bot 502 placing database tables in various memory locations. Table A, included in database AG, shown at 504, may be accessed at a rate of 10× per minute. The access may be based on, or provided in response to, various requests, including structured query language (“SQL”) queries.
Table CZ, included in database GH, shown at 506, may be accessed at a rate of 50× per minute. Table QW, shown at 508, may be accessed at a rate of 5× per minute. AI bot 502 may have determined the access rate for each of tables 504, 506 and 508.
AI bot 502 may determine that table CZ is accessed at the highest rate. Therefore, table CZ may be placed into the memory location with the shortest response time, as shown at 512. AI bot 502 may determine that table A is accessed at the second-highest rate. Therefore, table A may be placed into the memory location with the second-shortest response time, as shown at 514. It should be appreciated that memory location 516 may be vacant, either because it has been reserved or because AI bot 502 may be waiting to place an appropriate table within memory location 516. AI bot 502 may determine that table QW is accessed at the lowest rate. Therefore, table QW may be placed into the memory location with the longest response time, as shown at 518.
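The FIG. 5 placement can be sketched as pairing tables, sorted by access rate, with free memory locations, sorted by response time, while skipping reserved locations such as 516. The access rates come from the description. The response-time values (1.0 to 4.0) are hypothetical, because the figure only orders the locations.

```python
from typing import Dict, List, Set, Tuple


def place_tables(rates: Dict[str, float], locations: List[Tuple[int, float]],
                 reserved: Set[int]) -> Dict[str, int]:
    """Assign the most-accessed table to the fastest free location, and so on."""
    free = sorted((loc for loc in locations if loc[0] not in reserved), key=lambda loc: loc[1])
    tables = sorted(rates, key=rates.get, reverse=True)
    if len(tables) > len(free):
        raise ValueError("not enough free memory locations")
    return {table: loc_id for table, (loc_id, _) in zip(tables, free)}


rates = {"A": 10, "CZ": 50, "QW": 5}                            # accesses per minute (FIG. 5)
locations = [(512, 1.0), (514, 2.0), (516, 3.0), (518, 4.0)]  # (location, response time)
print(place_tables(rates, locations, reserved={516}))  # {'CZ': 512, 'A': 514, 'QW': 518}
```

Re-running the placement after re-ranking corresponds to the periodic reassignment in claim 3.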
Thus, an artificially-intelligent, continuously-updating centralized database identifier repository system is provided. Persons skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which are presented for purposes of illustration rather than of limitation. The present invention is limited only by the claims that follow.

Claims (7)

The invention claimed is:
1. A method for data consolidation of an artificially-intelligent centralized key data repository, the method comprising:
reviewing a plurality of databases, each database included in the plurality of databases comprising one or more data elements;
based on the reviewing, determining:
duplicate data elements within the plurality of databases;
comparable data elements within the plurality of databases;
a utilization metric for each data element included within the plurality of databases;
ranking the data elements included in the plurality of databases based on the utilization metric, wherein more frequently used data elements receive higher ranking and less frequently used data elements receive lower ranking;
determining and assigning memory locations for each data element, the memory locations including a plurality of memory locations with shorter than a threshold response time and a plurality of locations with greater than the threshold response time, wherein data elements with greater than a threshold frequency are assigned to memory locations with shorter than the threshold response time and data elements with lower than the threshold frequency are assigned to memory locations with greater than the threshold response time;
identifying one or more recommendations for database synchronization and/or database usage optimization;
displaying the recommendations to operator; and
executing the recommendations upon receipt of operator confirmation.
2. The method of claim 1, further comprising:
re-reviewing the plurality of databases; and
identifying one or more recommendations after a predetermined time period.
3. The method of claim 1, further comprising:
re-ranking the data elements included in the plurality of databases; and
redetermining and reassigning memory locations based on the re-ranking.
4. The method of claim 1, further comprising:
deactivating data elements that are determined to be utilized less than a second threshold frequency.
5. The method of claim 1, further comprising archiving data elements that are determined to be utilized less than a second threshold frequency.
6. The method of claim 1, further comprising combining two or more data elements upon determining that the contents of the two or more data elements contains more than a predetermined amount of overlapping data.
7. The method of claim 1, further comprising combining two or more databases upon determining that the contents of the two or more databases contains more than a predetermined amount of overlapping data.
US17/404,047 2019-05-08 2021-08-17 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system Active US11556515B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/404,047 US11556515B2 (en) 2019-05-08 2021-08-17 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/406,188 US11113258B2 (en) 2019-05-08 2019-05-08 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system
US17/404,047 US11556515B2 (en) 2019-05-08 2021-08-17 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/406,188 Continuation US11113258B2 (en) 2019-05-08 2019-05-08 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system

Publications (2)

Publication Number Publication Date
US20210374117A1 US20210374117A1 (en) 2021-12-02
US11556515B2 true US11556515B2 (en) 2023-01-17

Family

ID=73047179

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/406,188 Active 2040-04-23 US11113258B2 (en) 2019-05-08 2019-05-08 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system
US17/404,047 Active US11556515B2 (en) 2019-05-08 2021-08-17 Artificially-intelligent, continuously-updating, centralized-database-identifier repository system

Country Status (1)

Country Link
US (2) US11113258B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11726952B2 (en) * 2019-09-13 2023-08-15 Oracle International Corporation Optimization of resources providing public cloud services based on adjustable inactivity monitor and instance archiver

Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6381601B1 (en) * 1998-12-22 2002-04-30 Hitachi, Ltd. Grouping and duplicate removal method in a database
US6523041B1 (en) 1997-07-29 2003-02-18 Acxiom Corporation Data linking system and method using tokens
US6839843B1 (en) 1998-12-23 2005-01-04 International Business Machines Corporation System for electronic repository of data enforcing access control on data retrieval
US7096354B2 (en) 2000-08-04 2006-08-22 First Data Corporation Central key authority database in an ABDS system
US7676489B2 (en) 2005-12-06 2010-03-09 Sap Ag Providing natural-language interface to repository
US7707549B2 (en) 2006-03-15 2010-04-27 Microsoft Corporation Synchronicity in software development
US7730065B2 (en) 2007-04-27 2010-06-01 Microsoft Corporation File formats for external specification of object-relational mapping
US20110066601A1 (en) * 2009-09-11 2011-03-17 Lothar Rieger Information lifecycle cross-system reconciliation
US8135995B2 (en) 2007-10-19 2012-03-13 Oracle International Corporation Diagnostic data repository
US8185534B1 (en) 2009-02-05 2012-05-22 Google Inc. Consolidated record generation with stable identifiers for data integration systems
US8352746B2 (en) 2002-12-31 2013-01-08 International Business Machines Corporation Authorized anonymous authentication
US8392439B2 (en) 2003-11-05 2013-03-05 Hewlett-Packard Development Company, L.P. Single repository manifestation of a multi-repository system
US20140114818A1 (en) 2012-06-18 2014-04-24 ServiceSource International, Inc. Provenance tracking and quality analysis for revenue asset management data
US9129000B2 (en) 2010-04-30 2015-09-08 International Business Machines Corporation Method and system for centralized control of database applications
US9135263B2 (en) 2013-01-18 2015-09-15 Sonatype, Inc. Method and system that routes requests for electronic files
US9177175B2 (en) 2000-02-18 2015-11-03 Permabit Technology Corporation Data repository and method for promoting network storage of data
US9202086B1 (en) 2012-03-30 2015-12-01 Protegrity Corporation Tokenization in a centralized tokenization environment
US20170083415A1 (en) 2015-09-21 2017-03-23 TigerIT Americas, LLC Fault-tolerant methods, systems and architectures for data storage, retrieval and distribution
US20170116298A1 (en) * 2012-09-28 2017-04-27 Oracle International Corporation Techniques for keeping a copy of a pluggable database up to date with its source pluggable database in read-write mode
US9722777B2 (en) 2013-08-01 2017-08-01 Visa International Service Association Homomorphic database operations apparatuses, methods and systems
US20170286518A1 (en) 2010-12-23 2017-10-05 Eliot Horowitz Systems and methods for managing distributed database deployments
US20200065297A1 (en) 2018-08-24 2020-02-27 Oracle International Corporation Providing consistent database recovery after database failure for distributed databases with non-durable storage leveraging background synchronization point
US10678436B1 (en) 2018-05-29 2020-06-09 Pure Storage, Inc. Using a PID controller to opportunistically compress more data during garbage collection

Also Published As

Publication number Publication date
US20210374117A1 (en) 2021-12-02
US20200356542A1 (en) 2020-11-12
US11113258B2 (en) 2021-09-07

Legal Events

Date Code Title Description
AS Assignment

Owner name: BANK OF AMERICA CORPORATION, NORTH CAROLINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CARROLL, MATTHEW E.;KURIAN, MANU;RUSSELL, AARON E.;SIGNING DATES FROM 20190429 TO 20190507;REEL/FRAME:057199/0208

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE