AU2021106743A4 - Systems and methods for determining congruency in data sets - Google Patents

Info

Publication number
AU2021106743A4
Authority
AU
Australia
Prior art keywords
data
data set
database
hash
congruency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
AU2021106743A
Inventor
Chitharanjan Billa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

There is provided a system for determining congruency between two data sets. The system includes a first database for storing a first data set of a first format, the first data set having a plurality of first data cells. The system further includes a second database for storing a second data set of a second format, the second data set having a plurality of second data cells. The second data set is based on the first data set in that the second data set is a migrated version of the first data set such that the first database has been migrated to the second database. The system further includes a hash value module for: receiving the first data set and the second data set, creating a plurality of first hash values corresponding to each of the plurality of first data cells, and creating a plurality of second hash values corresponding to each of the plurality of second data cells. The system also includes a comparison module for comparing the first hash values and the corresponding second hash values and determining a congruency result based on the comparison in the form of an indication of data congruency between the first data set and the second data set.

Description

[Figure 1 (sheet 1/6): system 100. Labels: 110, 120; 130 "Algorithms to Extract Tables and Calculate BLAKE3 Hashes"; BLAKE3 Hash Table-1 to BLAKE3 Hash Table-n (114, 116, 124, 126); 140 "Algorithms to Compare Hashes from Source to Target"; Result.]
AUSTRALIA
Patents Act 1990
Complete Patent Specification
Title: Systems and methods for determining congruency in data sets
Applicant: Chitharanjan Billa
Inventor: Chitharanjan Billa
Agent: COTTERS Patent & Trade Mark Attorneys
The following is a full description of the invention which sets forth the best method known to the applicant of performing it.
Systems and methods for determining congruency in data sets
Technical Field
[0001] The present disclosure relates to systems and methods for determining congruency in data sets and, more specifically, to systems, methods, and apparatuses, including computer program products, for determining congruency in one or more identical or disparate data sets within an enterprise or across enterprises. The present disclosure has applications to accuracy and quality checking and verification of data migrations.
[0002] While some embodiments will be described herein with particular reference to that application, it will be appreciated that the invention is not limited to such a field of use, and is applicable in broader contexts.
Background
[0003] Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.
[0004] Information technology (IT) systems are commonplace across almost all industries as they provide a convenient way of, amongst other things, keeping records in an ordered and efficient manner. Systems and databases associated with those systems store data in a specific format, with certain systems and associated databases utilising their own unique format and data types.
[0005] In many cases, the reasoning for a proprietor of certain systems and databases to utilise their own unique format and data types is to discourage a customer from switching away from using their systems, due to the difficulty in migrating data from a source database of an incumbent system to a target database of a new system. This difficulty arises from the data format and types of one database not being compatible with the data format and types of another database. As such, the process of migration between two completely foreign databases can be very labour intensive and time consuming. There are two major ways that data migrations are actioned:
1) A manual process of migrating the data that involves reviewing the source database and manually entering that data into the target database.
2) An automated process in the form of an algorithm that will migrate the data from the source database to the target database.
[0006] Process 1) is time-consuming and prone to human error. Process 2) may be relatively quick, but it is prone to errors that must be checked manually, and that manual check is itself prone to human error. Therefore, ensuring the accuracy of a data migration always involves a time-consuming element, and there is still a chance that human error will introduce errors during the migration process.
[0007] Other common challenges in data migration include:
• Date fields could be translated into different formats, resulting in incorrect values being entered into a new database. Such incorrect values would have serious implications in, for example, medical records, resulting in non-compliance with regulatory standards.
• Depending on the source database engine, NULL values could be transformed into either white space or default values (in some cases set to 0). As a NULL value is not equal to zero, there would be a data mismatch between the source and target systems.
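The two pitfalls listed above can be illustrated with a short Python sketch. It is not taken from the specification: the `canonical` helper and the `NULL_TOKEN` sentinel are hypothetical names, and the approach (rendering every cell as text in one agreed form before any comparison) is only one possible way of keeping NULL, zero, white space and dates distinct.

```python
from datetime import date, datetime

# Hypothetical sentinel: chosen so NULL can never collide with "" or "0".
NULL_TOKEN = "\x00NULL"

def canonical(value):
    """Render a cell value as text so that NULL, 0 and '' stay distinct."""
    if value is None:
        return NULL_TOKEN
    if isinstance(value, (date, datetime)):
        return value.isoformat()  # one unambiguous date format for both sides
    return str(value)

# The same date string read with two different field formats gives two dates.
src = datetime.strptime("03/04/2021", "%d/%m/%Y").date()  # 3 April 2021
tgt = datetime.strptime("03/04/2021", "%m/%d/%Y").date()  # 4 March 2021

print(canonical(src), canonical(tgt))
print(canonical(None) == canonical(0))
```

Under these assumptions, a NULL that was silently migrated to 0 or to an empty string yields a different canonical text, and so would later yield a different hash value.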
[0008] As noted above, in most data migration projects the source and target database systems could be completely disparate (for example, the source could be an Oracle® database and the target could be a MySQL® database with different data types), and it is exceedingly difficult to determine congruency in the data sets. Further, in known data congruency processes, data quality is mostly determined by matching row counts and table columns, but seldom by comparing the data values. It is also difficult to compare, for example, binary large object (BLOB) values or character large object (CLOB) values between source and target systems.
[0009] As such, there are a number of deficiencies in known data migration processes that result in the target database not accurately reflecting the source database, and there is no accurate way, other than manual review, to verify the congruency of the databases where those databases are disparate. Further, when data sets are hundreds of terabytes or petabytes in size, it is a daunting and risky task for database engineers to certify that the migrated data set is identical to the source.
Summary
[0010] It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.
[0011] In accordance with a first aspect of the present invention there is provided a method for determining congruency between two data sets, the method including the steps of:
receiving, from a first database, a first data set of a first format, the first data set having a plurality of first data cells;
creating a plurality of first hash values corresponding to each of the plurality of first data cells;
receiving, from a second database, a second data set of a second format, the second data set having a plurality of second data cells, the second data set being based on the first data set;
creating a plurality of second hash values corresponding to each of the plurality of second data cells;
comparing the first hash values and the second hash values; and
determining a congruency result based on the comparison.
[0012] In an embodiment, the second database is migrated from the first database such that the second data set is a migrated version of the first data set.
[0013] In an embodiment, the first data set is extracted to create at least one first table containing the plurality of first data cells, and the second data set is extracted to create at least one second table containing the plurality of second data cells.
[0014] In an embodiment, the first and second data sets include one or more of the following data types: key; string; number; date.
[0015] In accordance with a second aspect of the present invention there is provided a system for determining congruency between two data sets, the system including:
a first database for storing a first data set of a first format, the first data set having a plurality of first data cells;
a second database for storing a second data set of a second format, the second data set having a plurality of second data cells, the second data set being based on the first data set;
a hash value module for: receiving the first data set and second data set; creating a plurality of first hash values corresponding to each of the plurality of first data cells; and creating a plurality of second hash values corresponding to each of the plurality of second data cells; and
a comparison module for comparing the first hash values and the second hash values and determining a congruency result based on the comparison.
[0016] In an embodiment, the second database is migrated from the first database such that the second data set is a migrated version of the first data set.
[0017] In an embodiment, the first data set is extracted to create at least one first table containing the plurality of first data cells, and the second data set is extracted to create at least one second table containing the plurality of second data cells.
[0018] In an embodiment, the first and second data sets include one or more of the following data types: key; string; number; date.
[0019] Other aspects of the present disclosure are also provided.
[0020] Reference throughout this specification to "one embodiment", "some embodiments" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment", "in some embodiments" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may be in some appropriate cases. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
[0021] As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
[0022] In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
Brief Description of the Drawings
[0023] One or more embodiments of the present disclosure will now be described by way of specific example(s) with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a system for determining congruency between two data sets according to an embodiment of the invention;
Figure 2A is a block diagram of a database of the system of Figure 1, showing data extracted from the database and a hash registry based on that extracted data;
Figure 2B is a block diagram of the databases of the system of Figure 1, showing the associated hash values (in register form) along with a hash value summary table;
Figure 3A is an embodiment of the hash value summary table of Figure 2B including one set of hash values;
Figure 3B is an embodiment of the hash value summary table of Figure 2B including two sets of hash values and a comparison status of those two sets of hash values; and
Figure 4 is a block diagram of a computing system with which various embodiments of the present disclosure can be implemented/configurable to perform various features of the present disclosure.
Detailed Description
[0024] Where applicable, steps or features in the accompanying drawings that have the same reference numerals are to be considered to have the same function(s) or operation(s), unless the contrary intention is expressed or implied.
[0025] Referring to Figures 1, 2A and 2B, there is illustrated a system 100 for determining congruency between two data sets. System 100 includes a first database 110 in the form of a source database management system (DBMS) for storing a first data set 112 of a first format, the first data set having a plurality of first data cells. System 100 further includes a second database 120 in the form of a target database management system (DBMS) for storing a second data set 122 of a second format, the second data set having a plurality of second data cells. Second data set 122 is based on first data set 112 in that first database 110 has been migrated to second database 120 such that second data set 122 is a migrated version of first data set 112.
[0026] System 100 further includes a hash value module 130 for: receiving first data set 112 and second data set 122, creating a plurality of first hash values 132 corresponding to each of the plurality of first data cells, and creating a plurality of second hash values 134 corresponding to each of the plurality of second data cells. System 100 also includes a comparison module 140 for comparing first hash values 132 and the corresponding second hash values 134 and determining a congruency result based on the comparison for providing an indication of data congruency between first data set 112 and second data set 122.
[0027] First data set 112 and second data set 122 contain data of the following data types:
• Key;
• String;
• Number; and
• Date.
[0028] Further, system 100 can be used with other data types such as BLOBs and CLOBs, amongst others.
[0029] It will be appreciated that, in different embodiments, first data set 112 and second data set 122 contain data of: one of the above types only; two of the above types; or three of the above types. In other words, system 100 is able to check data of any and all of the types above, which essentially encompasses all types of database, as they will use one or more of the above basic data types.
[0030] System 100 initiates a process to determine data congruency after first database 110 has been migrated to second database 120. As shown in Figures 1, 2A and 2B, first data set 112 is extracted from first database 110 by way of a data extraction algorithm to create at least one, and in this embodiment a plurality of, first tables 114 containing the plurality of first data cells. Similarly, second data set 122 is extracted from second database 120 by way of the data extraction algorithm to create at least one, and in this embodiment a plurality of, second tables 124 containing the plurality of second data cells. The method of extraction will depend on the actual cell value type (for example, number or string) in the database. Each of first tables 114 and second tables 124 is extracted to a flat file (for example, a comma separated file). In embodiments where first database 110 is of a different type to second database 120, the extraction algorithm used may also differ.
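By way of illustration only, extraction of a table to a comma separated flat file might be sketched as follows in Python. The specification does not disclose the extraction algorithm; here an in-memory SQLite database stands in for first database 110, and the function name `extract_table`, the `patients` table and its columns are all hypothetical.

```python
import csv
import io
import sqlite3

def extract_table(conn, table, key_column):
    """Dump one table to CSV text, ordered by key so row order is stable."""
    # Table and column names cannot be bound as parameters, so they are quoted.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)                                  # data rows
    return out.getvalue()

# Hypothetical stand-in for a source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, dob TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(2, "Bo", "1990-01-31"), (1, "Ann", None)],
)
flat = extract_table(conn, "patients", "id")
print(flat)
```

Ordering by a key keeps the row order identical on both sides, which matters if the flat file is later hashed as a whole. Note that Python's `csv` writer renders NULL as an empty field, so a practical extractor would need an explicit NULL marker to keep NULL and empty strings apart.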
[0031] Once first tables 114 and second tables 124 are created, hash value module 130 applies a hash function to the first tables 114 and second tables 124. The application of the hash function to the first tables 114 in turn produces a first hash value table 116 and the application of the hash function to the second tables 124 in turn produces a second hash value table 126. In present embodiments, the BLAKE3 hash function is utilised given its compatibility and speed, as it has been shown to be five times faster than BLAKE2, fifteen times faster than SHA3-256 and almost eight times faster than MD5. However, it will be appreciated that other hash functions, such as BLAKE2, may be used to produce the hash values.
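A minimal Python sketch of the hashing step is given below. It uses `hashlib.blake2b` from the Python standard library as a stand-in, since BLAKE3 is not in the standard library and the specification allows BLAKE2 as an alternative. The function names `cell_hash` and `table_hash`, and the choice to prefix each cell with its byte length, are assumptions made for illustration rather than details from the specification.

```python
import hashlib

def cell_hash(text):
    """Hash a single cell value that has already been rendered as text."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()

def table_hash(rows):
    """Hash a whole table given as an iterable of rows of text cells."""
    h = hashlib.blake2b(digest_size=32)
    for row in rows:
        h.update(len(row).to_bytes(8, "big"))        # number of cells in the row
        for cell in row:
            data = cell.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))   # length prefix per cell
            h.update(data)
    return h.hexdigest()

same = table_hash([["1", "Ann"]]) == table_hash([["1", "Ann"]])
shifted = table_hash([["ab", "c"]]) == table_hash([["a", "bc"]])
print(same, shifted)
```

The length prefixes mean that moving a character from one cell to its neighbour changes the hash, which plain concatenation of cell text would not detect.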
[0032] Comparison module 140 then applies a comparison algorithm to first hash value table 116 and second hash value table 126. Referring to Figures 2B, 3A and 3B in particular, the process is such that the hash values of first hash value table 116 (denoted in Figure 2B as data entity references E1, E2, E3, ... En within "Hash Register from Source DB") are calculated prior to migration of first database 110 to second database 120, with the hash values of first hash value table 116 stored in SOURCE_HASH column 118 in a hash summary table 142 either within first database 110 or at a known predefined location outside of first database 110. As seen in Figures 3A and 3B, hash summary table 142 also includes an ENAME column 144 that includes the data entity references of Figure 2B. Following the migration, the hash values of second hash value table 126 (denoted in Figure 2B as data entity references E1, E2, E3, ... En within "Hash Register from Target DB") are calculated and hash summary table 142 is updated with the values of second hash value table 126 in TARGET_HASH column 128. Finally, using a "for" loop, each hash value of SOURCE_HASH column 118 is matched with a corresponding value of TARGET_HASH column 128 and: if they match, a STATUS column 146 is updated with an "OK" status indicating data congruency of that piece of migrated data; if they do not match, STATUS column 146 is updated with a "FAIL" status indicating data discrepancies in that piece of migrated data.
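The comparison loop described above might be sketched as follows in Python. This is an illustration under assumptions, not the disclosed implementation: the function name `build_summary` is hypothetical, the hash registers are modelled as dictionaries keyed by entity reference, the short hash strings are placeholders, and the dictionary keys mirror the columns of hash summary table 142 (underscore spelling assumed).

```python
def build_summary(source_hashes, target_hashes):
    """Build rows of a hash summary table and mark each entity OK or FAIL."""
    summary = []
    for ename, source_hash in source_hashes.items():
        target_hash = target_hashes.get(ename)  # None if the entity is missing
        status = "OK" if source_hash == target_hash else "FAIL"
        summary.append({
            "ENAME": ename,
            "SOURCE_HASH": source_hash,
            "TARGET_HASH": target_hash,
            "STATUS": status,
        })
    return summary

# Placeholder hash strings, not real digests.
source = {"E1": "a1f3", "E2": "77c0", "E3": "09be"}
target = {"E1": "a1f3", "E2": "ffff"}  # E2 differs, E3 was never migrated

for row in build_summary(source, target):
    print(row["ENAME"], row["STATUS"])
```

In this sketch an entity that is absent from the target register is treated as a "FAIL", which is a design choice of the example rather than a rule stated in the specification.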
[0033] The comparison produces the congruency result which will take one or more of the following forms:
• Hash summary table 142, including STATUS column 146, is outputted showing each entry as either "OK" or "FAIL" status indicating the migration status of each entity in hash summary table 142;
• An overall indication of data congruency in a percentage form, that is, the compared datasets are certified as congruent (100% matched) or deemed not congruent (less than 100% matched); and
• A return of any non-congruent data or data anomalies of each of the datasets, with reference to first database 110, thereby providing a degree of isomorphism in first data set 112 and second data set 122.
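The second and third forms of the congruency result can be derived from the per-entity statuses, as the following Python sketch suggests. The function name `congruency_result` and the shape of the returned dictionary are hypothetical; only the rule that 100% matched means congruent comes from the description above.

```python
def congruency_result(statuses):
    """Summarise per-entity statuses ("OK" or "FAIL") into an overall result."""
    if not statuses:
        raise ValueError("no entities were compared")
    failed = [ename for ename, status in statuses.items() if status != "OK"]
    matched = len(statuses) - len(failed)
    return {
        "percent_matched": 100.0 * matched / len(statuses),
        "congruent": not failed,      # congruent only if 100% matched
        "failed_entities": failed,    # entities to investigate in the source
    }

result = congruency_result({"E1": "OK", "E2": "FAIL", "E3": "OK", "E4": "OK"})
print(result)
```

Here three of four entities match, so the data sets are reported as 75% matched and not congruent, with E2 returned for investigation.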
[0034] Figure 4 provides a block diagram of a computer processing system 400 configurable to perform various functions described herein, for example the functions of system 100. System 400 is a general purpose computer processing system. It will be appreciated that Figure 4 does not illustrate all functional or physical components of a computer processing system. For example, no power supply or power supply interface has been depicted. However, system 400 will either carry a power supply or be configured for connection to a power supply (or both). It will also be appreciated that the particular type of computer processing system will determine the appropriate hardware and architecture, and in some embodiments alternative computer processing systems suitable for implementing features of the present disclosure will have additional, alternative, or fewer components than those depicted.
[0035] Computer processing system 400 includes at least one processing unit 402. In some embodiments, processing unit 402 is a single computer processing device (for example, a central processing unit, graphics processing unit, or other computational device). In other embodiments, processing unit 402 includes a plurality of computer processing devices. In some embodiments, where system 400 is described as performing an operation or function, all processing required to perform that operation or function will be performed by processing unit 402. In other embodiments, processing required to perform that operation or function is also performed by remote processing devices accessible to and useable by (either in a shared or dedicated manner) system 400.
[0036] Through a communications bus 404, processing unit 402 is in data communication with one or more machine readable storage (memory) devices which store instructions and/or data for controlling operation of system 400. In various embodiments, system 400 includes one or more of: a system memory 406 (for example, resident set-size memory), volatile memory 408 (for example, random access memory), and non-volatile or non-transitory memory 410 (for example, one or more hard disk or solid-state drives). Such memory devices may also be referred to as computer readable storage media.
[0037] System 400 also includes one or more interfaces, indicated generally by reference 412, via which system 400 interfaces with various devices and/or networks. Generally speaking, in various embodiments, other devices are integral with system 400, or are separate. Where a device is separate from system 400, connection between the device and system 400, in various embodiments, is via wired or wireless hardware and communication protocols, and is a direct or an indirect (for example, networked) connection.
[0038] Wired connection with other devices/networks is facilitated by any appropriate standard or proprietary hardware and connectivity protocols. For example, in various embodiments, system 400 is configured for wired connection with other devices/communications networks by one or more of: USB; FireWire; Ethernet; HDMI; and other wired connection interfaces.
[0039] Wireless connection with other devices/networks is similarly facilitated by any appropriate standard or proprietary hardware and communications protocols. For example, in various embodiments, system 400 is configured for wireless connection with other devices/communications networks using one or more of: infrared; Bluetooth; Wi-Fi; near field communications (NFC); Global System for Mobile Communications (GSM); Enhanced Data GSM Environment (EDGE); long term evolution (LTE); and other wireless connection protocols.
[0040] Generally speaking, and depending on the particular system in question, devices to which system 400 connects (whether by wired or wireless means) include one or more input devices to allow data to be input into/received by system 400 for processing by processing unit 402, and one or more output devices to allow data to be output by system 400. A number of example devices are described below. However, it will be appreciated that, in various embodiments, not all computer processing systems will include all mentioned devices, and that additional and alternative devices to those mentioned are used.
[0041] Referring to reference 414, in one embodiment, system 400 includes or connects to one or more input devices by which information/data is input into (received by) system 400. Such input devices include keyboards, mice, trackpads, microphones, accelerometers, proximity sensors, GPS devices and the like. System 400, in various embodiments, further includes or connects to one or more output devices controlled by system 400 to output information. Such output devices include devices such as cathode ray tube (CRT) displays, liquid-crystal displays (LCDs), light-emitting diode (LED) displays, plasma displays, touch screen displays, speakers, vibration modules, LEDs/other lights, amongst others. In embodiments, system 400 includes or connects to devices which are able to act as both input and output devices, for example memory devices (hard drives, solid state drives, disk drives, compact flash cards, SD cards and the like) which system 400 can read data from and/or write data to, and touch screen displays which can both display (output) data and receive touch signals (input).
[0042] System 400 also includes one or more communications interfaces 416 for communication with a network. Via the communications interface(s) 416, system 400 can communicate data to and receive data from networked devices, which in some embodiments are themselves other computer processing systems.
[0043] System 400 stores or has access to computer applications (also referred to as software, applications or programs), such as applications that provide the functionality of hash value module 130 and comparison module 140. These are also described as computer readable instructions and data which, when executed by the processing unit 402, configure system 400 to receive, process, and output data. Instructions and data are able to be stored on a non-transient machine readable medium accessible to system 400. For example, in an embodiment, instructions and data are stored on non-transient memory 410. Instructions and data are able to be transmitted to/received by system 400 via a data signal in a transmission channel enabled (for example) by a wired or wireless network connection over an interface such as 412.
[0044] Applications accessible to system 400 typically include an operating system application such as Windows, macOS, iOS, Android, Unix, Linux, or other operating system.
[0045] In embodiments, applications that provide the functionality of hash value module 130 and comparison module 140 are dedicated applications that communicate with a server using an application programming interface (API). Alternatively, in other embodiments, the applications are a web browser (such as Chrome, Safari, Internet Explorer, Firefox, or an alternative web browser) which communicates with a web server using http/https protocols.
[0046] Furthermore, while a single computer processing system 400 has been depicted, in other embodiments, system 400 consists of multiple subsystems (for example, one or more web servers and/or one or more application servers).
Advantages of Detailed Embodiments
[0047] It will be appreciated that the embodiments of system 100 described herein are advantageous over known systems as they provide more efficient data congruency checking. More specifically, system 100 achieves the following advantages:
• Speeds up the process of determining data congruency.
• Is compatible with any type of database, such that the congruency of databases of different types can be determined.
• Finds the degree of isomorphism in the data sets, helping to determine the quality of data migrations.
• Determining congruency of data sets after migration provides compliance with regulatory standards such as the Health Insurance Portability and Accountability Act (HIPAA) and/or Sarbanes-Oxley (SOX) standards. Non-compliance could result in millions of dollars of fines and penalties for the organizations maintaining/migrating the data sets.
• Unlike some known databases and hash functions (such as Oracle and the associated MD5 hash function), the resultant hash in the present system is not dependent on the length of the concatenated string.
• Produces consistent hash values for the tables, irrespective of database type.
[0048] As such, system 100 provides an efficient and accurate mechanism for checking and confirming congruency between data sets, particularly data that has been migrated from a source database to a target database. Thus, integrity of data is greatly improved through the use of the present systems and methods.
Conclusions and Interpretation
[0049] Throughout this specification, where used, the terms "element" and "component" are intended to mean either a single unitary component or a collection of components that combine to perform a specific function or purpose.
[0050] It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, Figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
[0051] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[0052] Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. "Coupled" may mean that two or more elements are either in direct physical, electrical or optical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
[0053] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing," "computing," "calculating," "determining", "analysing" or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.
[0054] In a similar manner, the term "processor" may refer to any device or portion of a device that processes electronic data, for example, from registers and/or memory to transform that electronic data into other electronic data that, for example, may be stored in registers and/or memory. A "computer" or a "computing machine" or a "computing platform" may include one or more processors.
[0055] Some methodologies or portions of methodologies described herein are, in one embodiment, performable by one or more processors that accept computer-readable (also called machine-readable) code containing a set of instructions that when executed by one or more of the processors carry out at least one of the methods described herein. A memory subsystem of a processing system includes a computer-readable carrier medium that carries computer-readable code (for example, software) including a set of instructions to cause performing, when executed by one or more processors, one or more of the methods described herein. Note that when the method includes several elements, for example, several steps, no ordering of such elements is implied, unless specifically stated. The software may reside in the storage medium, or may also reside, completely or at least partially, within the RAM and/or within the processor during execution thereof by the computer system. Thus, the memory and the processor also constitute a computer-readable carrier medium carrying computer-readable code.
[0056] Furthermore, a computer-readable carrier medium may form, or be included in a computer program product.
[0057] In alternative embodiments, unless otherwise specified, the one or more processors operate as a standalone device or may be connected, for example networked, to other processor(s). In a networked deployment, the one or more processors may operate in the capacity of a server or a user machine in a server-user network environment, or as a peer machine in a peer-to-peer or distributed network environment. The one or more processors may form a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
[0058] Note that while only a single processor and a single memory that carries the computer-readable code may be shown herein, those in the art will understand that many of the components described above are included, but not explicitly shown or described in order not to obscure the inventive aspect. For example, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, unless otherwise specified.
[0059] Thus, one embodiment of each of the methods described herein is in the form of a computer-readable carrier medium carrying a set of instructions, for example, a computer program that is for execution on one or more processors, for example, one or more processors that are part of a web server arrangement. Thus, as will be appreciated by those skilled in the art, embodiments of the present invention may be embodied as a method, an apparatus such as a special purpose apparatus, an apparatus such as a data processing system, or a computer-readable carrier medium, for example, a computer program product. The computer-readable carrier medium carries computer-readable code including a set of instructions that when executed on one or more processors cause the processor or processors to implement a method. Accordingly, aspects of the present invention may take the form of a method, an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a carrier medium (for example, a computer program product on a computer-readable storage medium) carrying computer-readable program code embodied in the medium.
[0060] The software may further be transmitted or received over a network via a network interface device. While the carrier medium may be shown in an embodiment to be a single medium, the term "carrier medium" should be taken to include a single medium or multiple media (for example, a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "carrier medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by one or more of the processors and that cause the one or more processors to perform any one or more of the methodologies of the present invention. A carrier medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical disks, magnetic disks, and magneto-optical disks. Volatile media includes dynamic memory, such as main memory. Transmission media includes coaxial cables, copper wire and fibre optics, including the wires that comprise a bus subsystem. Transmission media may also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications. For example, the term "carrier medium" shall accordingly be taken to include, but not be limited to, solid-state memories; a computer product embodied in optical and magnetic media; a medium bearing a propagated signal detectable by at least one processor of one or more processors and representing a set of instructions that, when executed, implement a method; and a transmission medium in a network bearing a propagated signal detectable by at least one processor of the one or more processors and representing the set of instructions.
[0061] It will be understood that the steps of methods discussed are performed in one embodiment by an appropriate processor (or processors) of a processing (i.e., computer) system executing instructions (computer-readable code) stored in storage.
Industrial Applicability
[0062] The arrangements described are applicable to many industries that utilise commercial IT systems, and particularly to the updating of IT systems that require migration of data from an existing type of database to a new type of database that may not be compatible with the existing type of database. Therefore, the invention is clearly industrially applicable.

Claims (1)

    The claims defining the invention are as follows:
    1. A method for determining congruency between two data sets, the method including the steps of: receiving, from a first database, a first data set of a first format, the first data set having a plurality of first data cells; creating a plurality of first hash values corresponding to each of the plurality of first data cells; receiving, from a second database, a second data set of a second format, the second data set having a plurality of second data cells, the second data set being based on the first data set; creating a plurality of second hash values corresponding to each of the plurality of second data cells; comparing the first hash values and the second hash values; and determining a congruency result based on the comparison.
    2. The method according to claim 1 wherein the second database is migrated from the first database such that the second data set is a migrated version of the first data set.
    3. The method according to claim 1 or claim 2 wherein the first data set is extracted to create at least one first table containing the plurality of first data cells, and the second data set is extracted to create at least one second table containing the plurality of second data cells.
    4. The method according to any one of the preceding claims wherein the first and second data sets include one or more of the following data types: key; string; number; date.
    5. A system for determining congruency between a first data set and a second data set: a first database for storing the first data set of a first format, the first data set having a plurality of first data cells; a second database for storing the second data set of a second format, the second data set having a plurality of second data cells, the second data set being based on the first data set; a hash value module for: receiving the first data set and second data set; creating a plurality of first hash values corresponding to each of the plurality of first data cells; and creating a plurality of second hash values corresponding to each of the plurality of second data cells; and a comparison module for comparing the first hash values and the second hash values and determining a congruency result based on the comparison.
    6. The system according to claim 5 wherein the second database is migrated from the first database such that the second data set is a migrated version of the first data set.
    7. The system according to claim 5 or claim 6 wherein the first data set is extracted to create at least one first table containing the plurality of first data cells, and the second data set is extracted to create at least one second table containing the plurality of second data cells.
    8. The system according to any one of the preceding claims 5 to 7 wherein the first and second data sets include one or more of the following data types: key; string; number; date.
    Chitharanjan Billa By Patent Attorneys for the Applicant
    ©COTTERS Patent & Trade Mark Attorneys
AU2021106743A 2021-08-24 2021-08-24 Systems and methods for determining congruency in data sets Active AU2021106743A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021106743A AU2021106743A4 (en) 2021-08-24 2021-08-24 Systems and methods for determining congruency in data sets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021106743A AU2021106743A4 (en) 2021-08-24 2021-08-24 Systems and methods for determining congruency in data sets

Publications (1)

Publication Number Publication Date
AU2021106743A4 (en) 2021-11-11

Family

ID=78488584

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021106743A Active AU2021106743A4 (en) 2021-08-24 2021-08-24 Systems and methods for determining congruency in data sets

Country Status (1)

Country Link
AU (1) AU2021106743A4 (en)

Similar Documents

Publication Publication Date Title
KR102289995B1 (en) Data storage, data check, and data linkage method and apparatus
US10956403B2 (en) Verifying data consistency
US10176224B2 (en) Query plan optimization for large payload columns
US9652368B2 (en) Using linked data to determine package quality
CN111078672B (en) Data comparison method and device for database
US20180113886A1 (en) Method for Computing Distinct Values in Analytical Databases
US10757186B2 (en) Uploading user and system data from a source location to a destination location
US10331670B2 (en) Value range synopsis in column-organized analytical databases
CN111124872A (en) Branch detection method and device based on difference code analysis and storage medium
US11709763B2 (en) Systems and method for testing computing environments
AU2021106743A4 (en) Systems and methods for determining congruency in data sets
US20180069774A1 (en) Monitoring and reporting transmission and completeness of data upload from a source location to a destination location
CN110866031B (en) Database access path optimization method and device, computing equipment and medium
CN112579591B (en) Data verification method, device, electronic equipment and computer readable storage medium
CN111177119A (en) Database-based full-data comparison method, device, equipment and storage medium
JPWO2020065778A1 (en) Information processing equipment, control methods, and programs
CN103593500A (en) Aircraft parameter mapping system and method based on supporting vector machine and multiple regression
CN113760765B (en) Code testing method and device, electronic equipment and storage medium
KR101567550B1 (en) Method for collecting and providing data in manufacturing process
WO2020065960A1 (en) Information processing device, control method, and program
CN114490583A (en) Data migration method and device, electronic equipment and storage medium
CN115470386A (en) Data storage method, data retrieval method, data storage device, data retrieval device and electronic equipment
CN117271445A (en) Log data processing method, device, server, storage medium and program product

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)