WO2023238044A1

WO2023238044A1 - System and method of generating test deviations

Info

Publication number: WO2023238044A1
Application number: PCT/IB2023/055844
Authority: WO
Inventors: Devon Dawn KELLY; Jennifer MEMMOTT; Hamza Hasan
Original assignee: Crispr Therapeutics Ag
Priority date: 2022-06-07
Filing date: 2023-06-07
Publication date: 2023-12-14

Abstract

Disclosed herein include systems, devices, and methods for standardizing clinical test data received from multiple databases or sources collected for a clinical trial and determining deviations in test samples collected or insufficient test samples collected for the clinical trial based on electronic data capture (EDC) visitation data.

Description

SYSTEM AND METHOD OF GENERATING TEST DEVIATIONS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/349,958, filed June 7, 2022, and U.S. Provisional Patent Application No. 63/393,580, filed July 29, 2022. The content of each of these related applications is incorporated herein by reference in its entirety for all purposes.

BACKGROUND

Field

[0002] The present disclosure generally relates to data processing. More particularly, the present disclosure relates to systems and methods of compiling data stored in a plurality of databases and generating a deviation report for the data.

Background

[0003] In a clinical trial, various tests of the clinical trial can be performed at different test sites (e.g., clinics), in different stages, and by different administrators (or providers or vendors). For example, a clinical trial can comprise administering a drug to a patient and testing the patient’s blood samples some days after the drug was administered. In this example, the drug can be administered at one clinic by one administrator, the blood samples can be collected at another clinic by another administrator, and testing of the blood samples can be performed at yet another clinic by yet another administrator. As such, various data gathered for the clinical trial can be documented and/or recorded differently by different administrators. Under conventional methods, users accessing data of a clinical trial generally need to manually review and compile data collected by different administrators and standardize the data under a common nomenclature. Further, in some cases, a nomenclature used in identifying various data of a clinical trial must conform to Food and Drug Administration (FDA) and/or Schedule of Assessments (SoA) reporting requirements or guidelines. As such, conventional methods of reviewing, compiling, and standardizing data from clinical trials are inadequate, inefficient, and cumbersome. For example, mistakes can be introduced when a user reviews and compiles data of a clinical trial, for example, for FDA reporting purposes.

SUMMARY

[0004] Disclosed herein include computer-implemented methods of determining insufficient test samples collected or collection (or deviations in actual samples collected relative to expected samples collected or deviations in actual sample collection relative to expected sample collection). In some embodiments, a computer-implemented method of determining insufficient test sample collection comprises: obtaining (or receiving), by a computing system, clinical test data (or test reports) collected for a clinical trial from a plurality of databases (or data sources, such as files), such as 2, 3, 4, 5, or more databases (or data sources). The clinical test data, for example, can be visualized as or represent data tables. Alternatively or additionally, the method can comprise: obtaining (or receiving), by a computing system, clinical test data (or test reports) collected for a clinical trial and stored in multiple data files. Alternatively or additionally, the method can comprise: obtaining (or receiving), by a computing system, data files storing clinical test data (or test reports) collected for a clinical trial, for example, from a plurality of databases (or data sources). The method can comprise: standardizing, by the computing system, the clinical test data to conform to a common data nomenclature (or standardized data nomenclature). The method can comprise: comparing, by the computing system, the standardized clinical test data to collection requirements. The method can comprise: generating, by the computing system, based on the comparison, a deviation report. The deviation report can include information relating to one or more insufficient test samples collected or collection for the clinical trial, including one or more deviations in test samples collected (actual test samples collected relative to the test samples expected to be collected), or insufficient test samples collected from a subject at each of one or more time points of the clinical trial. In some embodiments, the clinical test data is collected by at least two different administrators (e.g., 2, 3, 4, 5, or more) of the clinical trial.

[0005] In some embodiments, the clinical test data obtained from a first database and the clinical test data obtained from a second database have different numbers of data fields (for example, corresponding to columns of data tables). The clinical test data obtained from two databases (or data sources) can contain different numbers of data fields and/or entries (for example, corresponding to rows of data tables).

[0006] In some embodiments, the computer-implemented method further comprises: obtaining, by the computing system, conversion keys for standardizing the clinical test data to conform to the common data nomenclature. The method can include obtaining (or receiving) the conversion keys from one or more conversion key databases. The conversion keys for standardizing the clinical test data can be obtained from multiple databases (or data sources). The method can include receiving files comprising the conversion keys. The conversion keys obtained from a first database (or a first data source) can be of a first type (e.g., visit name conversion keys). The conversion keys obtained from a second database (or a second data source) can be of a second type (e.g., test name conversion keys). The conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases and the conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases can be different. The conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases can be specific for standardizing the clinical test data obtained from the first database. The conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases can be specific for standardizing the clinical test data obtained from the second database.

[0007] In some embodiments, a conversion key comprises a pair comprising a non-standardized value and a standardized value. For example, a conversion key can comprise a pair comprising a non-standardized data value of a data field and a corresponding standardized data value of the data field. For example, a conversion key can comprise a pair comprising a non-standardized label (or name) of a data field and a corresponding standardized label (or name) of the data field. There can be multiple types of conversion keys, such as 2, 3, 4, 5, or more types of conversion keys. Non-limiting exemplary types of conversion keys include visit name conversion keys and test name (or container name) conversion keys.

[0008] In some embodiments, obtaining the conversion keys comprises: obtaining, by the computing system, visit name conversion keys, for standardizing visit names in the clinical test data, from a visit name conversion key database (or data source). Obtaining the conversion keys can comprise: obtaining, by the computing system, test name (or container name) conversion keys, for standardizing test names in the clinical test data, from a test name conversion key database (or data source).
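By way of non-limiting illustration, the conversion keys of paragraphs [0007] and [0008] can be pictured as lookup tables that pair each non-standardized value with its standardized value. The following is a minimal sketch only, not the disclosed implementation. The field names (`visit`, `test`), the sample key values, and the function `standardize_record` are hypothetical and are introduced here purely for illustration.

```python
# Hypothetical conversion keys: non-standardized value -> standardized value.
VISIT_NAME_KEYS = {
    "D1": "Day 1",
    "Day 01": "Day 1",
    "SCRN": "Screening",
}
TEST_NAME_KEYS = {
    "PK blood": "PK",
    "Pharmacokinetics": "PK",
}


def standardize_record(record, visit_keys, test_keys):
    """Return a standardized copy of record and the values no key covered."""
    standardized = dict(record)
    unrecognized = []
    for field, keys in (("visit", visit_keys), ("test", test_keys)):
        value = record[field]
        if value in keys:
            # Known non-standardized value: replace it with its standard form.
            standardized[field] = keys[value]
        elif value not in keys.values():
            # Neither a known non-standardized value nor already standard.
            unrecognized.append((field, value))
    return standardized, unrecognized


record = {"subject": "001", "visit": "D1", "test": "PK blood"}
clean, missing = standardize_record(record, VISIT_NAME_KEYS, TEST_NAME_KEYS)
```

In this sketch, values that are absent from the keys are returned separately rather than silently passed through, which corresponds to the notification and request for additional conversion keys described in the summary.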

[0009] In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: standardizing, by the computing system, the clinical test data to conform to the common data nomenclature using conversion keys. In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: determining, by the computing system, one or more data values of one or more data fields of the clinical test data are not present in the conversion keys (or are not non-standardized values of the conversion keys, or are unrecognizable using the conversion keys). Determining the one or more data values of the one or more data fields of the clinical test data are not present in the conversion keys can comprise: determining, by the computing system, one or more data values of one or more data fields of the clinical test data are not present in the conversion keys using a machine learning model. The method can include: generating and/or displaying, by the computing system, a notification that the one or more data values are not present in the conversion keys. Alternatively or additionally, the method can include: requesting and/or receiving, by the computing system, additional conversion keys for standardizing the data values to conform to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing), by the computing system, the clinical test data to conform to the common data nomenclature using the conversion keys and the additional conversion keys. 
Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing), by the computing system, the clinical test data to conform to the common data nomenclature using the conversion keys and a machine learning model (e.g., based on similarity scores, including edit distances, such as the edit distance between a data value of a data field of the clinical test data that is not present in the conversion keys (or not present in the non-standardized values of the conversion keys) and any value of the conversion keys (or any non-standardized value of the conversion keys or any standardized value of the conversion keys)). In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: determining, by the computing system, using the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Determining data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature can comprise: determining, by the computing system, using a machine learning model and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing), by the computing system, the nonconforming data to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing), by the computing system, the nonconforming data using a machine learning model (e.g., a machine learning model based on similarity scores such as edit distances).
In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: determining, by the computing system, using a machine learning model (e.g., a machine learning model based on similarity scores such as edit distances) and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing), by the computing system, based on the machine learning model, the nonconforming data to the common data nomenclature. In some embodiments, a machine learning model is trained based on training data that includes labels for data fields and/or data values of data fields that conform to the common data nomenclature.

[0010] In some embodiments, the clinical test data obtained from a database and the corresponding standardized clinical test data comprise different numbers of data fields (e.g., corresponding to columns of a data table) and/or different numbers of data entries (e.g., corresponding to rows of a data table). The clinical test data obtained from a database can comprise more data fields and/or more data entries than the corresponding standardized clinical test data. The clinical test data obtained from a database can comprise fewer data fields and/or fewer data entries than the corresponding standardized clinical test data.
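As a non-limiting illustration of the edit-distance similarity scoring mentioned in paragraph [0009], the sketch below computes a Levenshtein distance and uses it to suggest a standardized value for a data value that is absent from the conversion keys. It shows only the similarity-scoring step and is not a trained machine learning model. The functions `edit_distance` and `closest_key`, the `max_distance` threshold of 2, and the sample keys are assumptions made for this example and are not specified in the disclosure.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def closest_key(value, conversion_keys, max_distance=2):
    """Return the standardized value of the nearest key, or None if none is close."""
    if not conversion_keys:
        return None
    best = min(
        conversion_keys,
        key=lambda known: edit_distance(value.lower(), known.lower()),
    )
    # Assumed threshold: reject matches that differ by too many edits.
    if edit_distance(value.lower(), best.lower()) <= max_distance:
        return conversion_keys[best]
    return None


# Hypothetical keys: known value -> standardized value.
keys = {"Screening": "Screening", "Day 1": "Day 1", "Day 28": "Day 28"}
suggestion = closest_key("Screning", keys)
```

A value with no sufficiently close key yields `None`, in which case a system along the lines described could fall back to notifying a user and requesting additional conversion keys.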

[0011] In some embodiments, standardizing the clinical test data to conform to the common data nomenclature further comprises: determining, by the computing system, an error in a data value of the clinical test data. Determining the error in the data value of the clinical test data can comprise: determining, by the computing system, an error in a data value of the clinical test data using one or more conversion keys and/or a machine learning model. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying, by the computing system, the data value containing the error to correct the error. The error can comprise a typographical error and/or a data format error.

[0012] In some embodiments, standardizing the clinical test data to conform to the common data nomenclature further comprises: generating, by the computing system, an output database. The output database can include a compilation of the clinical test data conforming to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can further comprise: generating, by the computing system, a count table. The count table can be a pivot table. The count table can include a count of each of one or more test samples collected from a subject at each of one or more time points of the clinical trial.
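The count table of paragraph [0012] can be sketched, purely for illustration, as a tally of standardized records grouped by subject, visit (time point), and test. The record layout, the sample data, and the function `build_count_table` below are hypothetical. A flat mapping keyed by (subject, visit, test) is used here as a simple stand-in for a pivot table.

```python
from collections import Counter

# Hypothetical standardized records: one entry per sample collected.
records = [
    {"subject": "001", "visit": "Day 1", "test": "PK"},
    {"subject": "001", "visit": "Day 1", "test": "PK"},
    {"subject": "001", "visit": "Day 1", "test": "Hematology"},
    {"subject": "002", "visit": "Day 1", "test": "PK"},
]


def build_count_table(records):
    """Count samples per (subject, visit, test), i.e. a flattened pivot table."""
    return Counter((r["subject"], r["visit"], r["test"]) for r in records)


count_table = build_count_table(records)
for (subject, visit, test), count in sorted(count_table.items()):
    print(subject, visit, test, count)
```

Because a `Counter` returns 0 for combinations that never occur, a sample that was expected but not collected can be looked up without special handling.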

[0013] In some embodiments, comparing the standardized clinical test data to the collection requirements comprises: obtaining, by the computing system, Electronic Data Collection (EDC) visitation data from a database (or data source). Comparing the standardized clinical test data to the collection requirements can comprise: determining at least one unrecognized visit or one visit with an unrecognized name in the EDC visitation data. An unrecognized visit can be a visit with a name not present in the requirement keys and/or the collection requirements. Comparing the standardized clinical test data to the collection requirements can comprise: generating and/or displaying, by the computing system, a notification regarding the at least one unrecognized visit in the EDC visitation data. Comparing the standardized clinical test data to the collection requirements can comprise: requesting and/or receiving, by the computing system, requirements for the unrecognized visit. Determining the at least one unrecognized visit in the EDC visitation data can comprise: determining, by the computing system, based on a machine learning model, the at least one unrecognized visit in the EDC visitation data. In some embodiments, a machine learning model is trained based on training data that includes recognized visit names or visit names that conform to the common data nomenclature.
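For illustration only, the determination of unrecognized visits in paragraph [0013] can be sketched as a lookup of each visit name in the EDC visitation data against the visit names for which collection requirements exist. The data layout, the sample values, and the function `find_unrecognized_visits` are hypothetical. This sketch uses exact name matching and does not model the machine learning variant.

```python
def find_unrecognized_visits(edc_visits, collection_requirements):
    """Return EDC visits whose names have no entry in the collection requirements."""
    return [v for v in edc_visits if v["visit"] not in collection_requirements]


# Hypothetical requirements: visit name -> {test name: samples expected}.
collection_requirements = {
    "Screening": {"Hematology": 1},
    "Day 1": {"PK": 2, "Hematology": 1},
}
# Hypothetical EDC visitation data.
edc_visits = [
    {"subject": "001", "visit": "Screening"},
    {"subject": "001", "visit": "Day 1"},
    {"subject": "001", "visit": "Unscheduled 1"},
]

unrecognized = find_unrecognized_visits(edc_visits, collection_requirements)
for visit in unrecognized:
    # Stand-in for the notification step; a real system might prompt a user.
    print(f"No collection requirements for visit {visit['visit']!r} "
          f"(subject {visit['subject']})")
```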

[0014] In some embodiments, the computer-implemented method further comprises: determining, by the computing system, collection requirements for each visit in the Electronic Data Collection (EDC) visitation data. Determining the collection requirements for each visit in the EDC visitation data can comprise: determining, by the computing system, the collection requirements for each visit in the EDC visitation data from a collection requirements table and a collection requirements key. The method can comprise: receiving the collection requirements table and the collection requirements key from at least one database (or at least one data source), such as 2, 3, 4, 5, or more databases. Receiving the collection requirements table and the collection requirements key can comprise: receiving the collection requirements table and the collection requirements key from one database (or data source). Receiving the collection requirements table and the collection requirements key can comprise: receiving the collection requirements table from a first database (or data source) and the collection requirements key from a second database (or data source).

[0015] In some embodiments, the computer-implemented method further comprises: obtaining, by the computing system, collection requirements, or a portion thereof, from a database (or data source). In some embodiments, comparing the standardized clinical test data to the collection requirements further comprises: determining, by the computing system, collection requirements for each visit of a subject in Electronic Data Collection (EDC) visitation data. Comparing the standardized clinical test data to the collection requirements can further comprise: comparing, by the computing system, test samples collected for the visit of the subject in the count table to the collection requirements for the visit of the subject to determine insufficient samples collected for the clinical trial.
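The comparison of paragraph [0015] and the resulting deviation report can be sketched, as a non-limiting example, by checking the count of each required test against the number of samples expected for each subject visit. The function `generate_deviation_report`, the report fields, and the sample data are hypothetical and serve only to illustrate the comparison under the assumption that requirements are expressed as a minimum sample count per test per visit.

```python
from collections import Counter


def generate_deviation_report(edc_visits, count_table, collection_requirements):
    """List each (subject, visit, test) with fewer samples collected than required."""
    report = []
    for subject, visit in edc_visits:
        for test, expected in collection_requirements.get(visit, {}).items():
            collected = count_table.get((subject, visit, test), 0)
            if collected < expected:
                report.append({
                    "subject": subject,
                    "visit": visit,
                    "test": test,
                    "expected": expected,
                    "collected": collected,
                })
    return report


# Hypothetical inputs: requirements per visit, visits from EDC data, count table.
requirements = {"Day 1": {"PK": 2, "Hematology": 1}}
edc_visits = [("001", "Day 1"), ("002", "Day 1")]
count_table = Counter({
    ("001", "Day 1", "PK"): 2,
    ("001", "Day 1", "Hematology"): 1,
    ("002", "Day 1", "PK"): 1,
})
report = generate_deviation_report(edc_visits, count_table, requirements)
```

In this sample, subject 001 meets every requirement, while subject 002 is reported twice: one PK sample short and no Hematology sample collected.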

[0016] Disclosed herein include systems of determining insufficient test samples collected or collection (or deviations in actual samples collected relative to expected samples collected or deviations in actual sample collection relative to expected sample collection). In some embodiments, a system of determining insufficient test samples collected or collection (or deviations in actual samples collected relative to expected samples collected or deviations in actual sample collection relative to expected sample collection) comprises: a processor. The system can comprise: a memory storing instructions. The instructions, when executed by the processor, cause the system to perform: obtaining clinical test data collected for a clinical trial from a plurality of databases (or data sources, such as files). The instructions, when executed by the processor, cause the system to perform: standardizing the clinical test data to conform to a common data nomenclature using conversion keys. The instructions, when executed by the processor, cause the system to perform: determining collection requirements. The instructions, when executed by the processor, cause the system to perform: comparing the standardized clinical test data to the collection requirements. The instructions, when executed by the processor, cause the system to perform: generating, based on the comparison, a deviation report, wherein the deviation report includes information relating to insufficient test samples collected for the clinical trial. In some embodiments, the clinical test data is collected by at least two different administrators (e.g., 2, 3, 4, 5, or more) of the clinical trial.

[0017] In some embodiments, the clinical test data obtained from a first database and the clinical test data obtained from a second database have different numbers of data fields (for example, corresponding to columns of data tables). The clinical test data obtained from two databases (or data sources) can contain different numbers of data fields and/or entries (for example, corresponding to rows of data tables).

[0018] In some embodiments, the instructions, when executed by the processor, cause the system to perform: obtaining the conversion keys, for standardizing the clinical test data to conform to the common data nomenclature, from one or more conversion key databases. In some embodiments, the conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases and the conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases are different. In some embodiments, a conversion key comprises a pair comprising a non-standardized value and a standardized value. For example, a conversion key can comprise a pair comprising a non-standardized data value of a data field and a corresponding standardized data value of the data field. For example, a conversion key can comprise a pair comprising a non-standardized label (or name) of a data field and a corresponding standardized label (or name) of the data field. There can be multiple types of conversion keys, such as 2, 3, 4, 5, or more types of conversion keys. Non-limiting exemplary types of conversion keys include visit name conversion keys and test name (or container name) conversion keys.

[0019] In some embodiments, obtaining the conversion keys comprises: obtaining visit name conversion keys, for standardizing visit names in the clinical test data, from a visit name conversion key database (or data source). Obtaining the conversion keys can comprise: obtaining test name (or container name) conversion keys, for standardizing test names in the clinical test data, from a test name conversion key database (or data source).

[0020] In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: standardizing the clinical test data to conform to the common data nomenclature using conversion keys. In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: determining one or more data values of one or more data fields of the clinical test data are not present in the conversion keys (or are not non-standardized values of the conversion keys, or are unrecognizable using the conversion keys). Determining the one or more data values of the one or more data fields of the clinical test data are not present in the conversion keys can comprise: determining one or more data values of one or more data fields of the clinical test data are not present in the conversion keys using a machine learning model. The instructions, when executed by the processor, cause the system to perform: generating and/or displaying a notification that the one or more data values are not present in the conversion keys. Alternatively or additionally, the instructions, when executed by the processor, cause the system to perform: requesting and/or receiving additional conversion keys for standardizing the data values to conform to the common data nomenclature.
Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing) the clinical test data to conform to the common data nomenclature using the conversion keys and the additional conversion keys. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing) the clinical test data to conform to the common data nomenclature using the conversion keys and a machine learning model (e.g., based on similarity scores, including edit distances, such as the edit distance between a data value of a data field of the clinical test data that is not present in the conversion keys (or not present in the non-standardized values of the conversion keys) and any value of the conversion keys (or any non-standardized value of the conversion keys or any standardized value of the conversion keys)). In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: determining, using the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Determining data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature can comprise: determining, using a machine learning model and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing) the nonconforming data to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing) the nonconforming data using a machine learning model (e.g., a machine learning model based on similarity scores such as edit distances).
In some embodiments, standardizing the clinical test data to conform to the common data nomenclature comprises: determining, using a machine learning model (e.g., a machine learning model based on similarity scores such as edit distances) and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying (or standardizing), based on the machine learning model, the nonconforming data to the common data nomenclature. In some embodiments, a machine learning model is trained based on training data that includes labels for data fields and/or data values of data fields that conform to the common data nomenclature.

[0021] In some embodiments, the clinical test data obtained from a database and the corresponding standardized clinical test data comprise different numbers of data fields (e.g., corresponding to columns of a data table) and/or different numbers of data entries (e.g., corresponding to rows of a data table). The clinical test data obtained from a database can comprise more data fields and/or more data entries than the corresponding standardized clinical test data. The clinical test data obtained from a database can comprise fewer data fields and/or fewer data entries than the corresponding standardized clinical test data.

[0022] In some embodiments, standardizing the clinical test data to conform to the common data nomenclature further comprises: determining, by the computing system, an error in a data value of the clinical test data. Determining the error in the data value of the clinical test data can comprise: determining, by the computing system, an error in a data value of the clinical test data using one or more conversion keys and/or a machine learning model. Standardizing the clinical test data to conform to the common data nomenclature can comprise: modifying, by the computing system, the data value containing the error to correct the error. The error can comprise a typographical error and/or a data format error.

[0023] In some embodiments, standardizing the clinical test data to conform to the common data nomenclature further comprises: generating, by the computing system, an output database. The output database can include a compilation of the clinical test data conforming to the common data nomenclature. Standardizing the clinical test data to conform to the common data nomenclature can further comprise: generating, by the computing system, a count table. The count table can be a pivot table. The count table can include a count of each of one or more test samples collected from a subject at each of one or more time points of the clinical trial.

[0024] In some embodiments, comparing the standardized clinical test data to the collection requirements comprises: obtaining Electronic Data Collection (EDC) visitation data from a database (or data source). Comparing the standardized clinical test data to the collection requirements can comprise: determining at least one unrecognized visit or one visit with an unrecognized name in the EDC visitation data. An unrecognized visit can be a visit with a name not present in the requirement keys and/or the collection requirements. Comparing the standardized clinical test data to the collection requirements can comprise: generating and/or displaying a notification regarding the at least one unrecognized visit in the EDC visitation data. Comparing the standardized clinical test data to the collection requirements can comprise: requesting and/or receiving requirements for the unrecognized visit. Determining the at least one unrecognized visit in the EDC visitation data can comprise: determining, based on a machine learning model, the at least one unrecognized visit in the EDC visitation data. In some embodiments, a machine learning model is trained based on training data that includes recognized visit names or visit names that conform to the common data nomenclature.

[0025] In some embodiments, determining the collection requirements comprises: determining collection requirements for each visit in the Electronic Data Collection (EDC) visitation data. Determining the collection requirements for each visit in the EDC visitation data can comprise: determining the collection requirements for each visit in the EDC visitation data from a collection requirements table and a collection requirements key. Determining the collection requirements can comprise: receiving the collection requirements table and the collection requirements key from at least one database (or data source), such as 2, 3, 4, 5, or more databases. Receiving the collection requirements table and the collection requirements key can comprise: receiving the collection requirements table and the collection requirements key from one database (or data source). Receiving the collection requirements table and the collection requirements key can comprise: receiving the collection requirements table from a first database (or data source) and the collection requirements key from a second database (or data source). In some embodiments, comparing the standardized clinical test data to the collection requirements further comprises: determining collection requirements for each visit of a subject in Electronic Data Collection (EDC) visitation data. Comparing the standardized clinical test data to the collection requirements can further comprise: comparing test samples collected for the visit of the subject in the count table to the collection requirements for the visit of the subject to determine insufficient samples collected for the clinical trial.

[0026] Disclosed herein include embodiments of non-transitory storage media. The non-transitory storage media can store machine instructions. The non-transitory storage media can be of a computing system. The machine instructions, when executed by a processor of a computing system, cause the computing system to perform any method disclosed herein. For example, the machine instructions, when executed by a processor of a computing system, can cause the system to perform: obtaining clinical test data from a plurality of databases. The machine instructions, when executed by a processor of a computing system, can cause the system to perform: standardizing the clinical test data to conform to a common data nomenclature. The machine instructions, when executed by a processor of a computing system, can cause the system to perform: comparing the standardized clinical test data to collection requirements. The machine instructions, when executed by a processor of a computing system, can cause the system to perform: generating, based on the comparison, a deviation report. The deviation report can include information relating to insufficient test samples collected for a clinical trial.

[0027] These and other features of the apparatuses, systems, methods, and non-transitory computer-readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the technologies are utilized, and the accompanying drawings of which:

[0029] FIGURE 1 illustrates a computing environment in which data of a clinical trial can be accessed to generate a data deviation report for the clinical trial, according to various embodiments of the present disclosure.

[0030] FIGURE 2A illustrates a deviation generator module, according to various embodiments of the present disclosure.

[0031] FIGURE 2B illustrates example datasets relating to a clinical trial collected by two different administrators (or providers or vendors), according to various embodiments of the present disclosure.

[0032] FIGURE 2C illustrates example conversion keys for datasets, respectively, according to various embodiments of the present disclosure.

[0033] FIGURE 2D illustrates an example output dataset generated by the data standardization module 202, according to various embodiments of the present disclosure.

[0034] FIGURE 2E illustrates an example table generated by the data standardization module 202, according to various embodiments of the present disclosure.

[0035] FIGURE 2F illustrates an example requirements table, according to various embodiments of the present disclosure.

[0036] FIGURE 2G illustrates an example requirements key, according to various embodiments of the present disclosure.

[0037] FIGURE 2H illustrates an example EDC eligibility screen report, according to various embodiments of the present disclosure.

[0038] FIGURE 2I illustrates an example EDC visit report, according to various embodiments of the present disclosure.

[0039] FIGURE 2J illustrates an example data deviation report, according to various embodiments of the present disclosure.

[0040] FIGURE 3A1 illustrates a data process flow diagram for a data standardization module, according to various embodiments of the present disclosure.

[0041] FIGURE 3A2 illustrates a data process flow diagram for a data standardization module, according to various embodiments of the present disclosure.

[0042] FIGURE 3B1 illustrates a data process flow diagram for a data comparison module, according to various embodiments of the present disclosure.

[0043] FIGURE 3B2 illustrates a data process flow diagram for a data comparison module, according to various embodiments of the present disclosure.

[0044] FIGURE 4 illustrates a computing component that includes one or more hardware processors and a machine-readable storage media storing a set of machine-readable/machine-executable instructions that, when executed, cause the hardware processor(s) to perform a method of generating a data deviation report, according to various embodiments of the present disclosure.

[0045] FIGURE 5 illustrates a block diagram of a computer system upon which any of various embodiments described herein may be implemented.

[0046] The figures depict various embodiments of the disclosed technology for purposes of illustration only, wherein the figures use like reference numerals to identify like elements. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated in the figures can be employed without departing from the principles of the disclosed technology described herein.

DETAILED DESCRIPTION

[0047] In general, in a clinical trial, various tests of the clinical trial can be performed at different test sites (e.g., clinics), in different stages, and by different administrators (or providers or vendors). For example, a clinical trial can comprise administering a drug to a patient and testing the patient’s blood samples some days after the drug was administered. In this example, the drug can be administered at one clinic by one administrator, the blood samples can be collected at another clinic by another administrator, and testing of the blood samples can be performed at yet another clinic by yet another administrator. As such, various data gathered for the clinical trial can be documented and/or recorded differently by different administrators. Under conventional methods, users accessing data of clinical trials generally need to manually review and compile the data so that the data is standardized across different administrators under a common nomenclature. Further, in some cases, a nomenclature used in identifying various data of a clinical trial must conform to Food and Drug Administration (FDA) and/or Schedule of Assessments (SoA) reporting requirements or guidelines. As such, conventional methods of reviewing and compiling data from clinical trials are inadequate, inefficient, and cumbersome. For example, a user reviewing data of a clinical trial can make mistakes while compiling the data. Better solutions are needed.

[0048] Described herein are technologies that address the problems described above. In various embodiments, the technologies can include a computing system. The computing system can be coupled to a plurality of databases over one or more networks, such as the Internet and/or intranets. In some cases, the computing system can be coupled to the plurality of databases over one or more data buses associated with the computing system. Through the plurality of databases, the computing system can access data of a clinical trial (e.g., clinical test data) and process the data so that the data can be easily digested by users accessing the data. The computing system can further process the data so that the data is compliant with various reporting guidelines set forth by FDA and/or SoA for clinical trials. In some embodiments, the computing system can process the data of the clinical trial to conform to a common data nomenclature. In this way, the data of the clinical trial can be standardized, under the common data nomenclature, across different administrators (or providers or vendors) of the clinical trial. Once the data is standardized, the computing system can aggregate and compile the data under the common data nomenclature based on one or more predefined rules compliant with the reporting guidelines. The computing system can then generate, based on the compiled data by applying the one or more predefined rules, a data deviation report indicating deviations from tests prescribed in the clinical trial. In this way, any test deficiencies of the clinical trial can be readily identified. As such, the technologies described herein can automate the laborious process of having to manually compile data of clinical trials and automate clinical trial reporting processes. These and other features of the technologies are described in further detail herein.

[0049] FIGURE 1 illustrates a computing environment 100 in which data of a clinical trial can be accessed to generate a data deviation report for the clinical trial, according to various embodiments of the present disclosure. As shown in FIGURE 1, in some embodiments, the computing environment 100 can include a computing system 120. The computing system 120 can be coupled to a plurality of databases 110-114 over one or more networks 130. The one or more networks 130 can be implemented using various conventional methods. For example, the computing system 120 can access data stored in the plurality of databases 110-114 over the Internet or an intranet. In some cases, the one or more networks 130 can be data buses. For example, in some embodiments, the computing system 120 can access data stored in the plurality of databases 110-114 over one or more data buses internal to the computing system 120. In such embodiments, the plurality of databases 110-114 may be a part of the computing system 120. For example, the plurality of databases 110-114 can be stored in a computer media storage of the computing system 120. Many variations are possible and contemplated.

[0050] In some embodiments, the plurality of databases 110-114 can store the data of the clinical trial. The data of the clinical trial can be associated with one or more administrators (or providers or vendors) of the clinical trial. For example, the database 110 can store test data associated with a first test prescribed by the clinical trial administered by a first administrator. The database 112 can store test data associated with a second test prescribed by the clinical trial administered by a second administrator. In some embodiments, at least one of the plurality of databases 110-114 can store translations to which the data of the clinical trial is to be converted under a common data nomenclature. In this regard, a common data nomenclature refers to a data notation in which data fields and underlying data values of the data fields are labeled and arranged under a common set of names or identifiers. For example, continuing from the example above, the database 114 can store translations to which data fields of the test data stored in the database 110 and data fields of the test data stored in the database 112 are to be converted under a common set of identifiers (e.g., data field names). In various embodiments, these translations can be referred to as conversion keys. The conversion keys can be used to convert or format data from multiple databases to a common data nomenclature. Alternatively or additionally, the conversion keys can be used to convert or format data from one database to data from another database, or vice versa. In general, data stored in the plurality of databases 110-114 can be of any suitable data type. For example, in some embodiments, the data stored in the plurality of databases 110-114 can be unstructured data, such as alphanumeric text files. In some embodiments, the data stored in the plurality of databases 110-114 can be structured data, such as tabularized data or tables. The tabularized data can be organized into columns and rows.
For example, in one implementation, the data stored in the plurality of databases 110-114 can be, or corresponds to, spreadsheet data comprising test results of a clinical trial administered by various administrators organized and arranged in rows and columns.

[0051] In some embodiments, the computing system 120 can be a general-purpose computing device, such as a computer, a tablet, or a mobile phone. In some embodiments, the computing system 120 can be a specialized computing device comprising specialized processors, such as custom application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). As shown in FIGURE 1, in some embodiments, the computing system 120 can include a deviation generator module 122. The deviation generator module 122 can be configured to standardize data notations of the data stored in the plurality of databases 110-114 into a common data nomenclature. In this way, the deviation generator module 122 can collect and aggregate the data stored in the plurality of databases 110-114 under the common data nomenclature in order to generate a data deviation report. For example, referring to FIGURE 1, assume that the databases 110 and 112 store test data of a participant of a clinical trial administered by two different test administrators under slightly different data notations (i.e., different data fields), and the database 114 stores conversion keys with which to convert the test data of the participant into a common data nomenclature. In this example, the deviation generator module 122 can obtain the test data of the participant from the databases 110 and 112 and the conversion keys from the database 114, and aggregate the test data based on the conversion keys to generate a data deviation report under the common data nomenclature. In this way, information of the data deviation report as outputted by the deviation generator module 122 is organized under the common data nomenclature and is in compliance with various reporting guidelines. As a result, the test data of the participant can be easily reviewed, digested, and reported, without having to go back and forth between the test data collected by the two different administrators.
The deviation generator module 122 will be discussed in further detail with reference to FIGURE 2A herein. The deviation generator module 122 can be implemented in a variety of ways. For example, in one implementation, the deviation generator module 122 can comprise machine-compatible codes stored in a non-transitory memory of the computing system 120 that, when executed, cause the computing system 120 to perform various functions of the deviation generator module 122 described herein. For instance, the machine-compatible codes may comprise a computing program (or application), a software script, a spreadsheet macro, etc. In another implementation, the deviation generator module 122 can comprise one or more logic blocks of one or more ASICs or FPGAs configured to perform various functions of the deviation generator module 122 described herein. Many implementations are possible and contemplated.

[0052] FIGURE 2A illustrates a deviation generator module 200, according to various embodiments of the present disclosure. In some embodiments, the deviation generator module 122 of FIGURE 1 can be implemented as the deviation generator module 200. As discussed with reference to FIGURE 1 above, the deviation generator module 200 can obtain data of a clinical trial and conversion keys with which to standardize the data from a plurality of databases (e.g., the plurality of databases 110-114 of FIGURE 1) and output the data, based on the conversion keys, under a common data nomenclature in a data deviation report. In this way, the data of the clinical trial, as captured in the data deviation report, can be easily digested, reviewed, and reported. As shown in FIGURE 2A, in some embodiments, the deviation generator module 200 can comprise a data standardization module 202, a data comparison module 204, and a data resolution module 206. Each of these modules will be described in further detail below.

[0053] The data standardization module 202 can be configured to standardize various data (or datasets) of a clinical trial in accordance with conversion keys for the data. The data standardization module 202 can obtain the data of the clinical trial from the plurality of databases over a network or a data bus (e.g., the network 130 of FIGURE 1). In general, the data of the clinical trial can be stored in any suitable format. For example, in some embodiments, the data of the clinical trial can be stored as unstructured data, such as alphanumeric text data. In other embodiments, the data of the clinical trial can be stored as structured data, such as tabularized data. In one particular embodiment, the data of the clinical trial can be stored as tables (e.g., spreadsheet data) comprising clinical test data arranged in rows and columns under various data fields. For example, consider FIGURE 2B. FIGURE 2B illustrates example datasets 220, 222 relating to a clinical trial collected by two different administrators, according to various embodiments of the present disclosure. As shown in FIGURE 2B, the dataset 220 can comprise data fields of “Protocol ID,” “Subject Num,” “Kit Num,” “Visit Name,” “Label Line,” “Visit Collection,” and “Facility.” Similarly, the dataset 222 can comprise data fields of “Study Code,” “Subject Code,” “Biomaterial Visit Name,” “Biomaterial Name,” “Sample Tracking Code,” “Collection Date,” and “Facility.” Underlying data stored under these data fields can represent various tracking information of tests performed for the clinical trial by each administrator. As shown in FIGURE 2B, because the tests of the clinical trial are performed by two different administrators, data collected by the administrators can be organized and identified in slightly different ways. 
For instance, data field “Protocol ID” of the dataset 220 and data field “Study Code” of the dataset 222 generally track the same information, namely an alphanumeric code that identifies a clinical trial, but, in this instance, the datasets 220, 222 have different labels (“Protocol ID” and “Study Code”) for the alphanumeric code. Similarly, data field “Subject Num” of the dataset 220 and data field “Subject Code” of the dataset 222 generally track the same information, namely an alphanumeric code that identifies a test of the clinical trial performed or administered, but the datasets 220, 222 have different labels for the alphanumeric code. Now going back to the example, the data standardization module 202 can obtain the dataset 220 and the dataset 222 and standardize the datasets 220, 222 under a common set of data fields (i.e., a common data nomenclature) so that the datasets 220, 222 can be represented by the same data notation. The data standardization module 202 can standardize the datasets 220, 222 based on conversion keys associated with the datasets 220, 222. In general, conversion keys are one or more tables comprising mappings between data fields of different datasets. For instance, a conversion key for the above example can be a mapping that correlates data field “Protocol ID” of the dataset 220 to data field “Study Code” of the dataset 222. In some embodiments, the conversion keys can include mappings from old data nomenclature to new data nomenclature within a dataset. For example, consider FIGURE 2C. FIGURE 2C illustrates example conversion keys 230, 232 for the datasets 220, 222, respectively, according to various embodiments of the present disclosure. As shown in FIGURE 2C, the conversion keys 230, 232 can comprise mappings from old data nomenclature (e.g., “Old Name”) to new data nomenclature (e.g., “New Name”).
For instance, if the dataset 220 includes any instances of “BM-Slide” under data field “Label Line,” the data standardization module 202, based on the conversion key 230, can convert the instances of “BM-Slide” to “BM CORE-SLD.” Similarly, if the dataset 222 includes any instances of “ANTI HLA A,” “ANTI HLA B,” or “ANTI HLA C,” under data field “Sample Tracking Code,” the data standardization module 202, based on the conversion key 232, can convert the instances of “ANTI HLA A,” “ANTI HLA B,” or “ANTI HLA C” to “ANTI HLA.” In some embodiments, as shown in FIGURE 2C, the conversion keys 230, 232 can include additional information such as on whom tests of the clinical trial were performed (e.g., “Parent/Child”) and whether to include or exclude particular mappings for data standardization (e.g., “Include/Exclude”) to be performed by the data standardization module 202. For example, when a mapping is marked with “Include,” the mapping is taken into account by the data standardization module 202 during the data standardization process. Likewise, when a mapping is marked with “Exclude,” the mapping is not taken into account by the data standardization module 202 during the data standardization process.
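By way of a non-limiting illustration, the application of conversion keys described above might be sketched as follows. The sketch assumes that each record is a dictionary of field names to values; the function name `standardize`, the field mapping for “Subject Num,” the sample record, and the “Exclude” flag on the last entry are hypothetical and chosen only to show the mechanism. For simplicity, value mappings are applied to every field rather than to one designated field.

```python
# Field-level mapping to a common nomenclature. "Protocol ID" -> "Study Code"
# follows the example in the text; the second entry is an assumption.
FIELD_MAP = {
    "Protocol ID": "Study Code",
    "Subject Num": "Subject Code",
}

# Value-level conversion key: (old name, new name, include/exclude flag).
CONVERSION_KEY = [
    ("BM-Slide", "BM CORE-SLD", "Include"),
    ("ANTI HLA A", "ANTI HLA", "Include"),
    ("ANTI HLA B", "ANTI HLA", "Include"),
    ("ANTI HLA C", "ANTI HLA", "Exclude"),  # excluded here only to show the flag
]


def standardize(record, field_map, conversion_key):
    """Rename fields, then rewrite values using only mappings marked 'Include'."""
    value_map = {old: new for old, new, flag in conversion_key if flag == "Include"}
    out = {}
    for field, value in record.items():
        out[field_map.get(field, field)] = value_map.get(value, value)
    return out


row = {"Protocol ID": "XXX-001", "Subject Num": "001", "Label Line": "BM-Slide"}
standardized = standardize(row, FIELD_MAP, CONVERSION_KEY)
print(standardized)
```

In this sketch, a mapping marked “Exclude” is simply left out of the lookup, so the corresponding value passes through unchanged.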

[0054] In some embodiments, in addition to performing data standardization, the data standardization module 202 can be further configured to perform data cleaning. Data cleaning, in general, can refer to a process in which data in datasets is modified or converted to a particular format or standard. For example, the data standardization module 202 can parse data of a clinical trial to determine whether there are typographical errors or other known errors in the data. As another example, referring to the datasets 220, 222 of FIGURE 2B, date format of data field “Collection Date” (i.e., “YYYY/MM/DD”) does not conform to date format of data field “Visit Collection” (i.e., “MM/DD/YYYY”). In this example, the data standardization module 202 can convert the date format of data field “Collection Date” to match the date format of data field “Visit Collection,” or vice versa.
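As a non-limiting sketch of the date-format cleaning step, the following example re-renders dates given in either of the two formats mentioned above into a single target format. The choice of “MM/DD/YYYY” as the target, and the names `KNOWN_FORMATS`, `TARGET_FORMAT`, and `normalize_date`, are assumptions made for illustration.

```python
from datetime import datetime

# Candidate input formats, tried in order; the target format is an arbitrary choice.
KNOWN_FORMATS = ("%Y/%m/%d", "%m/%d/%Y")
TARGET_FORMAT = "%m/%d/%Y"


def normalize_date(text):
    """Return the date re-rendered in TARGET_FORMAT, or raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {text!r}")


print(normalize_date("2022/06/07"))
print(normalize_date("06/07/2022"))
```

Values that match none of the known formats raise an error rather than being guessed at, so that they can be flagged for review.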

[0055] In some embodiments, once the data of the clinical trial are cleaned and standardized under a common data nomenclature, the data standardization module 202 can generate an output dataset that is a compilation of the data of the clinical trial under the common data nomenclature. In this way, the data of the clinical trial have the same data notation and formats. For example, FIGURE 2D illustrates an example output dataset 234 generated by the data standardization module 202, according to various embodiments of the present disclosure. In FIGURE 2D, the output dataset 234 represents a dataset that has been compiled and combined by the data standardization module 202 based on the datasets 220, 222 of FIGURE 2B and the conversion keys 230, 232 of FIGURE 2C. As shown in FIGURE 2D, data fields and data of the datasets 220, 222 have been standardized to a common data nomenclature based on the conversion keys 230, 232. In addition, differences in data formats (e.g., date formats) have been standardized to a common format as well.

[0056] In some embodiments, the data standardization module 202 can be further configured to generate a count table (e.g., a pivot table) that includes data relating to counts of test samples of each participant in the clinical trial. The data standardization module 202 can generate the table based on the data of the clinical trial stored in the plurality of databases. For example, FIGURE 2E illustrates an example table 236 generated by the data standardization module 202, according to various embodiments of the present disclosure. As shown in FIGURE 2E, the table 236 can include data fields of a count of expected test samples (“Expected Cnt”), a count of received test samples (“Received Cnt”), a count of missing test samples (“Missing Cnt”), a count of unknown test samples (“Unknown Cnt”), and a count of not drawn test samples (“Not Drawn Cnt”). Information relating to these counts can be useful when tracking overall progress of the clinical trial. In some cases, this information is needed for FDA and/or SoA reporting purposes. This table is later used by the data comparison module 204 in cross-referencing counts of tests performed with various collection requirements of the clinical trial.
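One possible way to build such a count table is sketched below, purely for illustration. The row layout, the field names “Visit,” “Sample,” and “Status,” the status values, and the function name `build_count_table` are assumptions and are not taken from the figures.

```python
from collections import Counter

# Hypothetical standardized rows; field names and status values are assumptions.
rows = [
    {"Subject Code": "001", "Visit": "Screening", "Sample": "BN2", "Status": "Received"},
    {"Subject Code": "001", "Visit": "D1 / Pre", "Sample": "BN2", "Status": "Received"},
    {"Subject Code": "001", "Visit": "D1 / Pre", "Sample": "BN2", "Status": "Missing"},
    {"Subject Code": "002", "Visit": "Screening", "Sample": "BN2", "Status": "Not Drawn"},
]


def build_count_table(rows):
    """Count samples per (subject, visit, sample) key, broken down by status."""
    table = {}
    for row in rows:
        key = (row["Subject Code"], row["Visit"], row["Sample"])
        table.setdefault(key, Counter())[row["Status"]] += 1
    return table


count_table = build_count_table(rows)
for key, counts in sorted(count_table.items()):
    print(key, dict(counts))
```

Because each entry is a `Counter`, a status that never occurred for a given key reads as zero, which is convenient when the counts are later compared against collection requirements.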

[0057] In some embodiments, the data standardization module 202 can include a machine learning model. The machine learning model can be trained to detect deviations in data nomenclature in the data of the clinical trial and correct these deviations. For example, referring to FIGURES 2B and 2C, the machine learning model can parse through data fields and underlying data values of the data fields in the datasets 220, 222 to determine whether there are deviations in the data fields and/or the underlying data from the conversion keys 230, 232. Once deviations are detected, the machine learning model can modify or update data fields and/or underlying data that have deviated in accordance with the conversion keys 230, 232. In some embodiments, the machine learning model can update conversion keys when new deviations in data nomenclature are detected. For example, continuing from the example above, if the deviations detected by the machine learning model are not recognized or not included in the conversion keys 230, 232, the machine learning model can update the conversion keys 230, 232 to include these deviations so that these deviations are “learned” to recognize future deviations of these types. In various embodiments, the machine learning model can be trained using a dataset annotated with correct and incorrect data labels for data fields and underlying data values of the data fields. In general, any suitable machine learning models can be implemented for detection of data nomenclature deviations. For example, in some embodiments, a classifier can be used to detect data nomenclature deviations. In other embodiments, a neural network can be used to detect data nomenclature deviations. Many variations are possible and contemplated.

[0058] The data comparison module 204 can be configured to map the output database generated by the data standardization module 202 to specific collection requirements of visits of the clinical trial and generate a data deviation report. The data comparison module 204 can map the output database to the collection requirements of visits based on a requirements table and a requirements key. In some embodiments, the requirements table can be tabularized data, for example, organized as spreadsheet data. The tabularized data can include data relating to requirements of test samples needed to be collected for each prescribed test in the clinical trial. The requirements table can include various collection requirements for the clinical trial. For example, consider FIGURE 2F. FIGURE 2F illustrates an example requirements table 238, according to various embodiments of the present disclosure. The requirements table 238 can include data fields of tests required in the clinical trial (“Req Container”), types of tests (“Timepoint”), required test count (“Req Cnt”), and threshold for important deviation (“Threshold For Important Deviation”). The types of tests can indicate a type of a test being administered. For example, as shown in FIGURE 2F, a first test required in a clinical trial may be a screening test (“Screening”), a second test required in the clinical trial may be a first dosage test or a pre-dosage test (“D1 / Pre”), and a third test required in the clinical trial may be a second dosage test (“D2”). The required test count can indicate a number of tests within each test or test type to be performed. For example, as shown in FIGURE 2F, the screening test is performed once, while the first dosage test and the second dosage test are performed four times each. The threshold for important deviation is used to classify a deviation as minor or important. In general, a minor deviation may refer to a non-critical and/or optional test sample collection.
In some cases, a minor deviation may refer to an insufficient number of aliquots. An important deviation, on the other hand, may refer to critical test sample collections that were missed and thus require follow-ups. For example, as shown in FIGURE 2F, a deviation threshold of 1 indicates a test sample from a critical test required for the clinical trial may be missed only once. In this example, all of the tests shown in FIGURE 2F are critical tests requiring collection of test samples. Applying this logic, the screening tests cannot be missed while the first dosage test and the second dosage test can only be missed once out of four required tests. In some embodiments, the requirements key can be tabularized data, for example, organized as spreadsheet data. The requirements key can indicate data columns of the requirements table from which to find or locate the tests of the clinical trial. For example, consider FIGURE 2G. FIGURE 2G illustrates an example requirements key 240, according to various embodiments of the present disclosure. In this example, the requirements key 240 is associated with the requirements table 238 of FIGURE 2F. The requirements key 240 indicates data columns in the requirements table 238 to locate collection requirements (e.g., test sample counts) for particular tests of the clinical trial. For example, names of required tests of the clinical trial (i.e., “ADD MM DA,” “BN2,” etc.) can be found in data column A in the requirements table 238, types of the required tests (e.g., “Screening,” “D1/Pre,” “D2”) can be found in data column B in the requirements table 238, counts of the required tests can be found in data column C in the requirements table 238, and threshold for important deviation for the required test can be found in data column D in the requirements table 238. In various embodiments, the requirements table and the requirements key can be obtained from the plurality of databases.
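The relationship between a requirements key and a requirements table might be sketched as follows, as a non-limiting example. The key maps each logical field to a spreadsheet column letter (A through D, as described above), and the table rows are stored positionally. The row values, the use of “BN2” for every row, and the names `column_index` and `load_requirements` are illustrative assumptions.

```python
# Requirements key: logical field -> spreadsheet column letter.
REQUIREMENTS_KEY = {
    "Req Container": "A",
    "Timepoint": "B",
    "Req Cnt": "C",
    "Threshold For Important Deviation": "D",
}

# Hypothetical rows of a requirements table, stored positionally as in a spreadsheet.
REQUIREMENTS_ROWS = [
    ["BN2", "Screening", 1, 1],
    ["BN2", "D1 / Pre", 4, 1],
    ["BN2", "D2", 4, 1],
]


def column_index(letter):
    """Convert a spreadsheet column letter ('A', 'B', ..., 'AA') to a 0-based index."""
    index = 0
    for char in letter.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def load_requirements(rows, key):
    """Return {(container, timepoint): (required count, important threshold)}."""
    col = {name: column_index(letter) for name, letter in key.items()}
    return {
        (r[col["Req Container"]], r[col["Timepoint"]]): (
            r[col["Req Cnt"]],
            r[col["Threshold For Important Deviation"]],
        )
        for r in rows
    }


requirements = load_requirements(REQUIREMENTS_ROWS, REQUIREMENTS_KEY)
print(requirements[("BN2", "D2")])
```

Keeping the column positions in a separate key means the requirements table can be rearranged without changing the lookup logic; only the key needs to be updated.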

[0059] In some embodiments, the data comparison module 204 can be configured to include or exclude certain data entries of the output database generated by the data standardization module 202. The data comparison module 204 can include or exclude the data entries based on an Electronic Data Collection (EDC) eligibility screen report. The EDC eligibility screen report can include information relating to patients screened for clinical trial eligibility. The data comparison module 204 can, based on the EDC eligibility screen report, exclude test samples collected from ineligible patients and only include test samples collected from eligible patients when generating the data deviation report. An example EDC eligibility screen report 242 is illustrated in FIGURE 2H. As shown in FIGURE 2H, patients “XXX-XXX-001” and “XXX-XXX-002” have met eligibility criteria to participate in the clinical trial, thus test samples from these patients are included in the data deviation report to be generated. Also shown in FIGURE 2H, patient “XXX-XXX-003” has not met the eligibility criteria to participate in the clinical trial, thus test samples from this patient are not included in the data deviation report to be generated. In various embodiments, the EDC eligibility screen report can be obtained from the plurality of databases.

[0060] In some embodiments, the data comparison module 204 can incorporate an EDC visit report when generating the data deviation report. The EDC visit report can include visitation data of patients visiting clinics as recorded by an EDC system. An example EDC visit report 244 is shown in FIGURE 2I. The EDC visit report 244 corresponds to a visit report of patient XXX-XX-001. As shown in FIGURE 2I, the patient XXX-XX-001 had visited test sites for tests D1, D10, D14 at different dates. This information can be incorporated into the data deviation report. In various embodiments, the EDC visit report can be obtained from the plurality of databases.
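The eligibility-based inclusion and exclusion of test samples described above could be sketched, by way of example only, as a simple filter. The dictionary layout of the eligibility screen report, the sample records, and the function name `filter_eligible` are assumptions; the handling of a subject absent from the report (dropped in this sketch) is likewise an assumption.

```python
# Hypothetical eligibility screen report: subject identifier -> eligibility flag.
eligibility = {
    "XXX-XXX-001": True,
    "XXX-XXX-002": True,
    "XXX-XXX-003": False,
}

samples = [
    {"Subject Code": "XXX-XXX-001", "Sample": "BN2"},
    {"Subject Code": "XXX-XXX-003", "Sample": "BN2"},
    {"Subject Code": "XXX-XXX-004", "Sample": "BN2"},  # not in the screen report
]


def filter_eligible(samples, eligibility):
    """Keep samples whose subject is marked eligible; unknown subjects are dropped."""
    return [s for s in samples if eligibility.get(s["Subject Code"], False)]


eligible_samples = filter_eligible(samples, eligibility)
print(eligible_samples)
```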

[0061] In some embodiments, the data comparison module 204 can execute a technique to generate the data deviation report. This technique can be based on one or more predefined rules. Pseudocode for the one or more predefined rules is summarized below:

If a required test sample is not associated with a data entry in the output database generated by the data standardization module 202, a received test sample count for the required test sample is assumed to be zero;

If received test sample > required test sample:
    No deviation output;

If (received test sample < required test sample) & (received test sample > threshold):
    If received test sample = 0 (Not in the output database):
        Output minor deviation {Sample Missing - not in report};
    If received test sample = 0:
        Output minor deviation {Sample Missing};
    If received test sample > 0:
        Output minor deviation {Insufficient Quantity of Samples Received};

If (received test sample < required test sample) & (received test sample < threshold):
    If received test sample = 0 (Not in the output database):
        Output important deviation {Sample Missing - not in report};
    If received test sample = 0:
        Output important deviation {Sample Missing};
    If received test sample > 0:
        Output important deviation {Insufficient Quantity of Samples Received};

If important threshold = 0:
    If received test sample = 0:
        Output minor deviation {Sample Missing};
    If received test sample = 1:
        Output minor deviation {Insufficient Quantity of Samples Received};

If important threshold = 1:
    If received test sample = 0:
        Output important deviation {Sample Missing};
    If received test sample = 1:
        Output minor deviation {Insufficient Quantity of Samples Received};
    If received test sample = 2:
        Output minor deviation {Insufficient Quantity of Samples Received};

If important threshold = 2:
    If received test sample = 0:
        Output important deviation {Sample Missing};
    If received test sample = 1:
        Output important deviation {Insufficient Quantity of Samples Received};
    If received test sample = 2:
        Output minor deviation {Insufficient Quantity of Samples Received};
    If received test sample = 3:
        Output minor deviation {Insufficient Quantity of Samples Received};

The data comparison module 204 can generate the data deviation report based on the pseudocode above using the output database generated by the data standardization module 202. An example data deviation report 246 is shown in FIGURE 2J. As shown in FIGURE 2J, the data deviation report 246 identifies tests of the clinical trial for which insufficient test sample quantities were received by various test administrators or for which test samples are missing. Data in the data deviation report 246 has been organized in accordance with FDA and/or SoA guidelines under a common data nomenclature.
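By way of non-limiting illustration, the predefined rules of paragraph [0061] can be sketched in Python as follows. The function name `classify_deviation` and its parameters are hypothetical and do not appear in the disclosure. The sketch assumes that a received count equal to the required count produces no deviation, and that a received count equal to the important threshold is classified as minor, as the enumerated threshold cases indicate.

```python
from typing import Optional


def classify_deviation(received: Optional[int], required: int,
                       important_threshold: int) -> Optional[str]:
    """Return a deviation label, or None when no deviation is output.

    `received` is None when the required test sample has no data entry in
    the output database; its count is then assumed to be zero.
    """
    in_report = received is not None
    count = received if in_report else 0

    # Assumption: meeting or exceeding the requirement yields no deviation.
    if count >= required:
        return None

    # Assumption: a count equal to the threshold is minor, per the
    # enumerated cases (e.g., threshold 1 and 1 sample received is minor).
    severity = "minor" if count >= important_threshold else "important"

    if count > 0:
        reason = "Insufficient Quantity of Samples Received"
    elif in_report:
        reason = "Sample Missing"
    else:
        reason = "Sample Missing - not in report"
    return f"{severity} deviation {{{reason}}}"
```

For example, under these assumptions, `classify_deviation(1, 3, 2)` yields an important deviation for an insufficient quantity, whereas `classify_deviation(2, 3, 2)` yields a minor deviation.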

[0062] In some embodiments, the data comparison module 204 can include a machine learning model. The machine learning model can be trained to detect deviations in data nomenclature in the EDC eligibility screen report and the EDC visit report and correct these deviations. For example, referring to FIGURES 2H and 2I, the machine learning model can parse through data fields and underlying data values of the data fields in the EDC eligibility screen report 242 and the EDC visit report 244 to determine whether there are deviations in the data fields and/or the underlying data. Once deviations are detected, the machine learning model can be retrained to recognize these deviations and fix them in accordance with collection requirements. In various embodiments, the machine learning model can be trained using a dataset annotated with correct and incorrect data labels for data fields and underlying data values of the data fields. In general, any suitable machine learning model can be implemented for detection of data nomenclature deviations. For example, in some embodiments, a classifier can be used to detect data nomenclature deviations. In other embodiments, a neural network can be used to detect data nomenclature deviations. Many variations are possible and contemplated.

[0063] The data reconciliation module 206 can be configured to generate a resolution report that incorporates resolutions from previously generated data deviation reports. The resolution report can include differences between the data deviation report generated by the data comparison module 204 and the previously generated data deviation reports, and can incorporate the previous resolutions in resolving these differences.

[0064] FIGURE 3A1 illustrates a data process flow diagram 300 for a data standardization module, according to various embodiments of the present disclosure. Data standardization is also referred to herein as phase 1. In some embodiments, the data process flow diagram 300 can depict a data flow of the data standardization module 202 of FIGURE 2A. As shown in FIGURE 3A1, the data process flow diagram 300 can include a machine learning model 308 coupled to a conversion keys database 302 and clinical trial databases 304, 306. The machine learning model 308 can access data stored in the conversion keys database 302 and the clinical trial databases 304, 306 over one or more networks or data buses. The clinical trial databases 304, 306 can store clinical test data of a clinical trial. The conversion keys database 302 can store various alphanumeric mappings needed to convert data fields and underlying data values of the data fields of the clinical test data to a common data nomenclature. The machine learning model 308 can parse the data fields and the underlying data (e.g., “Data Parsing 310”) to determine whether alphanumeric codes associated with the data fields and the underlying data correspond to the alphanumeric mappings stored in the conversion keys database 302 (e.g., “Recognized? 312”). If the machine learning model 308 does not recognize alphanumeric codes of data fields or underlying data values of the data fields, the machine learning model 308 can update the conversion keys database 302 to include the unrecognized alphanumeric codes (e.g., “No” branch). In such cases, the machine learning model 308 can be retrained to recognize the unrecognized alphanumeric codes.
After the machine learning model 308 determines that the alphanumeric codes of the data fields and the underlying data correspond to the alphanumeric mappings (e.g., “Yes” branch), the data fields and the underlying data are then cleaned to correct any typographic errors and/or data formatting errors (e.g., “Data Cleaning 314”). Once cleaned, the data fields and the underlying data are converted to conform to the common data nomenclature and are compiled in accordance with the common data nomenclature to standardize reporting of the underlying data (e.g., “Data Compilation 316”). After data compilation is complete, test samples collected from the tests of the clinical trial can be counted and the counts appended to the compiled data (e.g., “Determine Sample Counts 318”). Based on the test sample counts, the data standardization module 202 can generate a count table, e.g., a pivot table (e.g., “Generate Count Table 320”). Particular test samples to be excluded from the compiled data are removed from the compiled data (e.g., “Remove Excluded Data 322”). Finally, a cumulative data report (i.e., the output database generated by the data standardization module 202 of FIGURE 2A) is generated and outputted (e.g., “Generate Cumulative Data Report 324”).
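As a minimal sketch of the sample counting and exclusion steps described above, the following Python example tallies compiled data entries into a count table. The function name `build_count_table` and the entry fields `subject`, `visit`, and `test` are hypothetical. For brevity the sketch drops excluded tests before counting, which is an assumption about ordering rather than the disclosed sequence.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

CountKey = Tuple[str, str, str]  # (subject, visit, test), all hypothetical fields


def build_count_table(entries: Iterable[Mapping[str, str]],
                      excluded_tests: frozenset = frozenset()) -> Dict[CountKey, int]:
    """Count compiled entries per (subject, visit, test), skipping excluded tests."""
    counts: Counter = Counter()
    for entry in entries:
        if entry["test"] in excluded_tests:
            continue  # corresponds loosely to "Remove Excluded Data"
        counts[(entry["subject"], entry["visit"], entry["test"])] += 1
    return dict(counts)
```

A dictionary keyed by subject, visit, and test plays the role of a pivot table here; a spreadsheet or data-frame pivot would serve equally well.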

[0065] FIGURE 3A2 illustrates a data process flow diagram for a data standardization module, according to various embodiments of the present disclosure. Data standardization is also referred to herein as phase 1. A database can store one or more data files received from an administrator (or provider or vendor) of a clinical trial. There can be multiple databases, each storing one or more data files received from a different administrator (or provider or vendor) of the clinical trial. The ICON database can include one or more data files of clinical test data received from the ICON administrator of the clinical trial. A data file can, for example, have a file name: Containers Rcvd-Not Rcvd DDMMYYYY. This data file can be a report received from ICON regularly, for example, weekly. The LabMatrix database can include one or more data files received from the LabMatrix administrator of the clinical trial. In some embodiments, this database can include one or more data files received from administrators (or providers or vendors) of the clinical trial other than the ICON administrator. Table 1A shows an exemplary data file of clinical test data (e.g., an ICON data file of samples collected or tested). Table 1B shows another exemplary data file of clinical test data (e.g., a LabMatrix data file of samples collected or tested). The first row includes the labels (or names) of data fields. Other rows show exemplary data values of the data fields.

Table 1A. An exemplary data file of clinical test data (e.g., an ICON data file). Row numbering is shown for reference purposes only.

Table 1B. Another exemplary data file of clinical test data (e.g., a LabMatrix data file)

[0066] One or more conversion key files can store conversion keys. A conversion key file can have the name: Conversion Keys VI. This file can be used to drive the transformation of the ICON data file and the LabMatrix data file. This file can contain one tab for ICON test name (or container name) conversion keys and one tab for LabMatrix test name (or container name) conversion keys. This file can contain one tab for ICON visit name conversion keys. This file can contain one tab for LabMatrix visit name conversion keys. The file specifies whether an ICON container name should be included or excluded in the output file, and maps the original ICON container name to the new container name (translations). Many of the translations are set to ‘flatten’ the original container name, or to make it more easily recognizable (some of the original container names differ by a single character, so it can be hard to know for sure which sample is being viewed). This conversion key file can be updated easily to accommodate new containers, or to change how current containers are renamed. In some embodiments, one file can include both a tab of test name (or container name) conversion keys and another tab of visit name conversion keys for data files received from each administrator. For example, there can be one file with one tab of ICON test name (or container name) conversion keys and one tab of ICON visit name conversion keys. There can be another file with one tab of LabMatrix test name (or container name) conversion keys and one tab of LabMatrix visit name conversion keys. Tables 2A-2D show exemplary conversion keys. The first rows include labels (or names) of data fields. Other rows include data values of the data fields.

[0067] In some embodiments, if any unrecognized test (or test name or sample) is detected, a message box will prompt a user to update the test name conversion keys and provide the user with a list of the unrecognized samples. The test name conversion keys can then be updated. If any unrecognized visit (or visit name) is detected, a message box will prompt a user to update the visit name conversion keys and provide the user with a list of the unrecognized visits. If no unrecognized test (or test name or sample) or visit is detected, the data process can process the data, for example, in the background. When the processing completes, a notification can be provided and two output files (e.g., Tables 3 and 4) can be saved.
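The conversion key lookup and the handling of unrecognized names described above can be illustrated with the following Python sketch. The class `ConversionKey`, the function `apply_test_name_keys`, and the example container names used with them are hypothetical. The sketch returns the unrecognized names to the caller instead of displaying a message box.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ConversionKey:
    """One row of a hypothetical test name (container name) conversion tab."""
    new_name: str   # translated ("flattened") container name
    include: bool   # whether the container is included in the output file


def apply_test_name_keys(container_names: Iterable[str],
                         keys: Dict[str, ConversionKey]) -> Tuple[List[str], List[str]]:
    """Translate container names; collect names absent from the conversion keys."""
    converted: List[str] = []
    unrecognized: List[str] = []
    for name in container_names:
        key = keys.get(name)
        if key is None:
            if name not in unrecognized:
                unrecognized.append(name)  # to be reported to the user
        elif key.include:
            converted.append(key.new_name)
    return converted, unrecognized
```

In such a sketch, processing would continue only when the returned list of unrecognized names is empty; otherwise the user would be asked to update the conversion keys.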

Table 2A. Exemplary visit name conversion keys (e.g., LabMatrix visit name conversion keys)

Table 2B. Exemplary test name (or container name) conversion keys (e.g., LabMatrix test name conversion keys)

Table 2C. Exemplary visit name conversion keys (e.g., ICON visit name conversion keys)

Table 2D. Exemplary test name (or container name) conversion keys (e.g., ICON test name conversion keys)

[0068] One output file of the process flow can include those tests (or containers) that are included in the report. The output file can, for example, have the name: Output data DDMMYYYY. The file can combine the data from the ICON data file and the LabMatrix data file. Table 3 shows an exemplary output file of included clinical test data (or included tests or samples).

Table 3. An exemplary data file of included clinical test data. Row numbering is shown for reference purposes only.

[0069] Another output file of the process flow can combine data files from multiple administrators (or providers or vendors) of the clinical trial. Table 4 shows an exemplary count table. The output file can, for example, have the name: Output counts DDMMYYYY. The output file can include a table, e.g., a pivot table, which shows the status of the samples:

[0070] “Not Drawn” is used when a sample is marked as not drawn in the ICON LIMS system, typically because a site informed ICON that it did not collect this sample for a patient visit.

[0071] “Received” is used for any sample that has a received date in the ICON LIMS system, i.e., any received sample.

[0072] “Expected” is used when a sample has not yet been received by ICON but has not passed the ‘Expected Receipt Date’ for the sample in the ICON LIMS system. The ‘Expected Receipt Date’ in the ICON database is configured based on the expected shipping schedule for each sample type (e.g., sites are asked to batch ship cytokine samples monthly). Accordingly, ICON has a threshold set up in the ICON database of a maximum of 30 days that these samples could be held at a site. If a subject’s visit is received and these samples are not received, but it is less than 30 days from visit collection, ICON would consider the samples ‘Expected.’

[0073] “Missing” is used when a sample has not yet been received at ICON and has passed the ‘Expected Receipt Date’ in the ICON LIMS system. Using the same example as above, if that 30 day period had passed from the visit collection date and the samples had not been received by ICON, ICON would consider the samples ‘Missing.’

[0074] “Unknown” is used for everything else and so should be fairly uncommon, as it would only be used in specific unusual circumstances, e.g., where the sample was unlabeled when received at ICON and the site was queried; the sample was then relabeled as the correct sample and passed for testing.
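The status definitions above can be summarized, by way of example only, in the following Python sketch. The function name `sample_status` and its parameters are hypothetical. Two points are assumptions of the sketch: the order in which the conditions are checked, and the treatment of a sample with no expected receipt date as “Unknown”. The sketch also treats the expected receipt date itself as not yet passed.

```python
from datetime import date
from typing import Optional


def sample_status(not_drawn: bool,
                  received_date: Optional[date],
                  expected_receipt_date: Optional[date],
                  today: date) -> str:
    """Assign one of the count table statuses to a sample."""
    if not_drawn:
        return "Not Drawn"      # site reported the sample was not collected
    if received_date is not None:
        return "Received"       # any sample with a received date
    if expected_receipt_date is None:
        return "Unknown"        # assumption: no basis for Expected/Missing
    if today <= expected_receipt_date:
        return "Expected"       # expected receipt date has not passed
    return "Missing"            # expected receipt date has passed
```

For a sample type held at site for at most 30 days, the expected receipt date in such a sketch could be taken as the visit collection date plus 30 days.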

Table 4. Exemplary count table.

[0075] FIGURE 3B1 illustrates a data process flow diagram 350 for a data comparison module, according to various embodiments of the present disclosure. Data comparison is also referred to herein as phase 2. In some embodiments, the data process flow diagram 350 can depict a data flow of the data comparison module 204 of FIGURE 2A. As shown in FIGURE 3B1, the data process flow diagram 350 can include a machine learning model 358 coupled to a requirement table and key database 352, an EDC eligibility screening database 354, and an EDC visits database 356. Further, the machine learning model 358 can receive a cumulative data report 380 generated by a data standardization module (e.g., the data standardization module 202 of FIGURE 2A). The machine learning model 358 can access, over one or more networks or data buses, data stored in the requirement table and key database 352, the EDC eligibility screening database 354, and the EDC visits database 356, and compile or append the data with the cumulative data report 380. The requirement table and key database 352 stores data relating to collection requirements of test samples needed to be collected for each prescribed test in a clinical trial and keys with which to find or locate the collection requirements. The EDC eligibility screening database 354 stores data relating to patients screened for clinical trial eligibility. The EDC visits database 356 stores visitation data of patients visiting clinics as recorded by an EDC system. The machine learning model 358 can parse data fields and underlying data values of the data fields in the data relating to patients screened for clinical trial eligibility and the visitation data of patients (e.g., “Data Parsing 360”) to determine whether alphanumeric codes associated with data fields and underlying data values of the data fields are correct (e.g., “Recognized? 362”).
If the machine learning model 358 does not recognize alphanumeric codes of data fields or underlying data values of the data fields, the machine learning model 358 can be retrained to recognize the unrecognized alphanumeric codes (e.g., “No” branch). After the machine learning model 358 determines that the alphanumeric codes of the data fields and the underlying data are correct (e.g., “Yes” branch), the visitation data of patients are appended to the cumulative data report 380. Based on the data relating to patients screened for clinical trial eligibility, data entries in the cumulative data report 380 that correspond to patients ineligible for the clinical trial are removed from the cumulative data report 380 (e.g., “Remove Excluded Data 364”). The remaining data entries of the cumulative data report 380 are compared with the collection requirements of test samples needed to be collected for each prescribed test (e.g., “Data Comparison 366”) and cross-referenced with a table, such as a pivot table (e.g., the pivot table 236 of FIGURE 2E), to determine a count of test samples received. This count of test samples is then compared with a required number of test samples and an importance threshold of test samples provided in the cumulative data report 380 (e.g., “Data Cross-Reference 368”). From this point on, a technique to generate a data deviation report can be executed (e.g., “Generate Deviation”). This technique to generate the data deviation report has been discussed above. After execution of the technique, the data deviation report is generated (e.g., “Generate Deviation Report 372”) for review and submission.
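The steps of appending visitation data and removing entries for ineligible patients can be sketched in Python as follows. The function name `prepare_for_comparison` and the field names `subject`, `visit`, and `visit_date` are hypothetical. The sketch assumes that a subject absent from the eligibility data is treated as ineligible, which the disclosure does not specify.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def prepare_for_comparison(entries: Iterable[Mapping[str, str]],
                           eligibility: Mapping[str, bool],
                           visit_dates: Mapping[Tuple[str, str], str]) -> List[Dict[str, object]]:
    """Drop entries of ineligible subjects and append EDC visit dates to the rest."""
    prepared: List[Dict[str, object]] = []
    for entry in entries:
        # Assumption: subjects missing from the eligibility data are excluded.
        if not eligibility.get(entry["subject"], False):
            continue
        row: Dict[str, object] = dict(entry)
        # The visit date is None when the EDC visit report has no matching visit.
        row["visit_date"] = visit_dates.get((entry["subject"], entry["visit"]))
        prepared.append(row)
    return prepared
```

The resulting rows would then be compared against the collection requirements for each visit.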

[0076] FIGURE 3B2 illustrates a data process flow diagram for a data comparison module, according to various embodiments of the present disclosure. Data comparison is also referred to herein as phase 2. One input file is the count table (e.g., the pivot table) generated in phase 1. The count table can be a cleaned table of the combined data from the ICON containers received vs not received data file (e.g., a report) and the LabMatrix data file (e.g., a report) that is outputted from phase 1.

[0077] One input file is an electronic data capture (EDC) visitation report. The report can contain all of the visits that have been recorded in the EDC. The requirements for every visit that has occurred can be analyzed. The Visit Date from this report can be used as the Occurred Date for potential deviations (also referred to herein as central lab deviations or unaccounted samples). Table 5 shows an exemplary EDC visitation report.

Table 5. An exemplary EDC visitation report.

[0078] One input file is an EDC inclusion and exclusion criteria report. This report contains information for each subject that was screened for eligibility. The report can be used to exclude potential deviations (also referred to herein as central lab deviations or unaccounted samples) associated with ineligible/screen fail patients from the potential deviation list (also referred to herein as the potential central lab deviation list or potential central lab unaccounted samples list). Table 6 shows an exemplary EDC inclusion and exclusion criteria report.

Table 6. An exemplary EDC inclusion and exclusion criteria report.

[0079] One input file includes the requirements (also referred to herein as phase 2 requirements). The requirements can be the requirements for each visit that is recorded in the EDC. For each EDC visit, there is a list of samples (of tests) with the corresponding visit name from the report or data file of one administrator (e.g., the ICON report), the required number of samples for that visit, and a threshold for important deviation that is used to classify minor vs important potential deviations. Table 7 shows exemplary requirements.

[0080] An input file is requirement keys (also referred to herein as phase 2 requirements keys). The requirement keys can be used to assign requirements (e.g., phase 2 requirements) for each visit that is recorded in the EDC visitation report.
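One possible way to represent the requirements and the requirement keys is sketched below in Python. The class `Requirement`, the function `requirements_for_visits`, and the two-level lookup (EDC visit name to requirement key, requirement key to list of requirements) are assumptions made for illustration; the actual layout of Tables 7 and 8 may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Requirement:
    """One required sample for a visit (hypothetical field names)."""
    sample: str
    lab_visit_name: str        # visit name as used in the administrator's report
    required: int              # required number of samples for the visit
    important_threshold: int   # threshold separating minor from important


def requirements_for_visits(edc_visits: Iterable[str],
                            requirement_keys: Dict[str, str],
                            requirements: Dict[str, List[Requirement]]
                            ) -> Tuple[Dict[str, List[Requirement]], List[str]]:
    """Assign requirements to each EDC visit; collect visits with no requirements."""
    assigned: Dict[str, List[Requirement]] = {}
    unrecognized: List[str] = []
    for visit in edc_visits:
        key = requirement_keys.get(visit)
        if key is None or key not in requirements:
            unrecognized.append(visit)  # candidate for a user notification
        else:
            assigned[visit] = requirements[key]
    return assigned, unrecognized
```

Visits returned as unrecognized would need requirements supplied before potential deviations could be determined for them.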

Table 7. Exemplary requirements. Row numbering and column labeling are shown for reference purposes only.

Table 8. Exemplary requirement keys.

[0081] An output file can be a deviations list (also referred to herein as a potential deviations list or phase 2 potential deviations list). The deviation list is a list of potential central lab deviations or unaccounted samples, for example, in the standard protocol deviation log template. In some embodiments, all input reports from multiple administrators have been analyzed to generate the deviation list. The deviation list can be a comprehensive list of potential central lab deviations or unaccounted samples that will be ready for pre-adjudications, adjudications, and follow-ups. In some embodiments, only the data from one administrator (e.g., the ICON tests received vs not received data file) has been analyzed. In that case, the list may include samples received by other administrators as potential central lab deviations or unaccounted samples. In phase 3, data from other administrators can be incorporated, and potential central lab deviations or unaccounted samples will be removed if they were received by another vendor. After phase 3, the list can be a comprehensive list of potential central lab deviations or unaccounted samples that will be ready for pre-adjudications, adjudications, and follow-ups. Table 9 shows an exemplary deviation list.
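The phase 3 removal of potential deviations for samples received by another vendor can be sketched as a simple filter. The function name `reconcile` and the matching on a (subject, visit, sample) triple are assumptions for illustration only; the disclosure does not specify how received samples are matched to potential deviations.

```python
from typing import Dict, Iterable, List, Set, Tuple

SampleKey = Tuple[str, str, str]  # (subject, visit, sample), hypothetical fields


def reconcile(potential_deviations: Iterable[Dict[str, str]],
              received_elsewhere: Set[SampleKey]) -> List[Dict[str, str]]:
    """Keep only potential deviations not accounted for by another vendor."""
    return [
        deviation for deviation in potential_deviations
        if (deviation["subject"], deviation["visit"], deviation["sample"])
        not in received_elsewhere
    ]
```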

Table 9. Exemplary deviation list. Row numbering is shown for reference purposes only.

[0082] FIGURE 4 illustrates a computing component 400 that includes one or more hardware processors 402 and a machine-readable storage media 404 storing a set of machine-readable/machine-executable instructions that, when executed, cause the one or more hardware processors 402 to determine insufficiencies or deviations in tests performed or samples collected (or deviations in actual samples collected relative to expected samples collected or deviations in actual sample collection relative to expected sample collection). Determining insufficiencies or deviations in tests performed or samples collected can be referred to herein as central lab accounting. The set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to generate a data deviation report, according to various embodiments of the present disclosure. The computing component 400 may be, for example, the computing system 500 of FIGURE 5. The hardware processors 402 may include, for example, the processor(s) 504 of FIGURE 5 or any other processing unit described herein. The machine-readable storage media 404 may include the main memory 506, the read-only memory (ROM) 508, the storage 510 of FIGURE 5, and/or any other suitable machine-readable storage media described herein.

[0083] At block 406, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform obtaining (or receiving) clinical test data (or test reports) collected for a clinical trial from a plurality of databases (or data sources, such as files), such as 2, 3, 4, 5, or more databases (or data sources). See Tables 1A and 1B for exemplary clinical test data. The clinical test data can, for example, be visualized as or represent data tables. Alternatively or additionally, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform obtaining (or receiving) clinical test data (or test reports) collected for a clinical trial and stored in multiple data files. Alternatively or additionally, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to obtain (or receive) data files storing clinical test data (or test reports) collected for a clinical trial, for example, from a plurality of databases (or data sources). In some embodiments, the clinical test data is collected by at least two different administrators (e.g., 2, 3, 4, 5, or more) of the clinical trial.

[0084] The clinical test data obtained from a first database and the clinical test data obtained from a second database can have different numbers of data fields (for example, corresponding to columns of data tables), such as 5, 10, 15, 20, 30, 40, 50, or more, data fields. The clinical test data obtained from a first database and the clinical test data obtained from a second database can have different numbers of data entries (for example, corresponding to rows of data tables), such as 100, 500, 1000, 5000, 10000, 50000, or more, data entries. The clinical test data obtained from two databases (or data sources) can contain different numbers of data fields and/or entries (for example, corresponding to rows of data tables).

[0085] The clinical test data obtained from a database and the corresponding standardized clinical test data can comprise different numbers of data fields (e.g., corresponding to columns of a data table), such as 5, 10, 15, 20, 30, 40, 50, or more, data fields. The clinical test data obtained from a database and the corresponding standardized clinical test data can comprise different numbers of data entries (e.g., corresponding to rows of a data table), such as 100, 500, 1000, 5000, 10000, 50000, or more, data entries. The clinical test data obtained from a database can comprise more data fields and/or more data entries than the corresponding standardized clinical test data. The clinical test data obtained from a database can comprise fewer data fields and/or fewer data entries than the corresponding standardized clinical test data.

[0086] At block 408, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform standardizing the clinical test data to a data nomenclature, such as to conform to a data nomenclature or to conform to a common data nomenclature (or standardized data nomenclature). Standardizing the clinical test data can comprise generating standardized clinical test data with the data nomenclature.

[0087] Standardizing the clinical test data to the data nomenclature can comprise standardizing the clinical test data to the data nomenclature using conversion keys. See Tables 2A-2D for exemplary conversion keys. The number of conversion keys can be, for example, 10, 20, 30, 50, 100, or more conversion keys. There can be multiple types of conversion keys, such as 2, 3, 4, 5, or more types of conversion keys. Non-limiting exemplary types of conversion keys include visit name conversion keys and test name (or container name) conversion keys.

[0088] A conversion key can, for example, comprise a pair comprising a non-standardized value and a standardized value. For example, a conversion key can comprise a pair comprising a non-standardized data value of a data field and a corresponding standardized data value of the data field. For example, a conversion key can comprise a pair comprising a non-standardized label (or name) of a data field and a corresponding standardized label (or name) of the data field.

[0089] The set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform obtaining conversion keys for standardizing the clinical test data to a data nomenclature. The set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform obtaining (or receiving) the conversion keys from one or more conversion key databases.

[0090] The conversion keys for standardizing the clinical test data can be obtained from multiple databases (or data sources). The method can include receiving files comprising the conversion keys. The conversion keys obtained from a first database (or a first data source) can be of a first type (e.g., visit name conversion keys). The conversion keys obtained from a second database (or a second data source) can be of a second type (e.g., test name conversion keys). The conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases and the conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases can be different. The conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases can be specific for standardizing the clinical test data obtained from the first database. The conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases can be specific for standardizing the clinical test data obtained from the second database.

[0091] Obtaining the conversion keys can comprise obtaining visit name conversion keys, for standardizing visit names in the clinical test data, from a visit name conversion key database (or data source). Obtaining the conversion keys can comprise obtaining test name (or container name) conversion keys, for standardizing test names in the clinical test data, from a test name conversion key database (or data source).

[0092] In some embodiments, standardizing the clinical test data to the data nomenclature comprises determining one or more data values of one or more data fields of the clinical test data are not present in the conversion keys (or are not non-standardized values of the conversion keys, or are unrecognizable using the conversion keys). The number of data values of a data field of the clinical test data not present in the conversion keys can be, for example, 1, 2, 5, 10, 20, 30, 40, 50, or more data values. The number of data fields of the clinical test data with data values not present in the conversion keys can be, for example, 1, 2, 5, 10, 20, 30, 40, 50, or more data fields. Determining the one or more data values of the one or more data fields of the clinical test data are not present in the conversion keys can comprise determining one or more data values of one or more data fields of the clinical test data are not present in the conversion keys using a machine learning model. The set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform generating and/or displaying a notification that the one or more data values are not present in the conversion keys. Alternatively or additionally, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform requesting and/or receiving additional conversion keys for standardizing the data values to the data nomenclature. Standardizing the clinical test data to the data nomenclature can comprise modifying (or standardizing) the clinical test data to the data nomenclature using the conversion keys and the additional conversion keys.
Standardizing the clinical test data to the data nomenclature can comprise modifying (or standardizing) the clinical test data to the data nomenclature using the conversion keys and a machine learning model (e.g., based on similarity scores, including edit distances, such as the edit distance between a data value of a data field of the clinical test data that is not present in the conversion keys (or not present in the nonstandardized values of the conversion keys) and any value of the conversion keys (or any nonstandardized values of the conversion keys or any standardized values of the conversion keys)). A machine learning model of the present disclosure can be trained based on training data that includes labels for data fields and/or data values of data fields that conform to the common data nomenclature.
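To illustrate the use of edit distances as similarity scores, the following Python sketch computes the Levenshtein edit distance and proposes the closest known conversion key value for an unrecognized data value. This is a plain rule-based matcher, not a trained machine learning model, and is offered only as one simple instance of similarity-based matching. The function names, the case-insensitive comparison, and the `max_distance` cutoff of 2 are assumptions of the sketch.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_key(value: str, known_values: Iterable[str],
                max_distance: int = 2) -> Optional[str]:
    """Return the known value nearest to `value`, or None if none is close enough."""
    best: Optional[str] = None
    best_distance = max_distance + 1
    for known in known_values:
        distance = edit_distance(value.lower(), known.lower())
        if distance < best_distance:
            best, best_distance = known, distance
    return best
```

A value for which `closest_key` returns None would remain unrecognized and could be reported to the user so that additional conversion keys can be supplied.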

[0093] In some embodiments, standardizing the clinical test data to the data nomenclature comprises determining, using the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Determining data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature can comprise determining, using a machine learning model and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Standardizing the clinical test data to the data nomenclature can comprise modifying (or standardizing) the nonconforming data to the common data nomenclature. Standardizing the clinical test data to the data nomenclature can comprise modifying (or standardizing) using a machine learning model (e.g., a machine learning model based on similarity scores such as edit distances).

[0094] In some embodiments, standardizing the clinical test data to the data nomenclature comprises determining, using a machine learning model (e.g., a machine learning model based on similarity scores such as edit distances) and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature. Standardizing the clinical test data to the data nomenclature can comprise modifying (or standardizing), based on the machine learning model, the nonconforming data to the common data nomenclature.

[0095] In some embodiments, standardizing the clinical test data to the data nomenclature further comprises determining an error in a data value of the clinical test data. Determining the error in the data value of the clinical test data can comprise determining an error in a data value of the clinical test data using one or more conversion keys and/or a machine learning model. Standardizing the clinical test data to the data nomenclature can comprise modifying the data value containing the error to correct the error. The error can comprise a typographical error and/or a data format error.
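As one hedged illustration of correcting a data format error, the sketch below normalizes collection dates written in several formats to ISO 8601. The list of accepted formats and the function name are assumptions made for this example only; in particular, reading `06/07/2022` as month/day/year is an assumed convention that a real study would need to confirm per data source.

```python
from datetime import datetime
from typing import Optional

# Assumed set of date formats seen across data sources (hypothetical).
KNOWN_FORMATS = ("%Y-%m-%d", "%d-%b-%Y", "%m/%d/%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Values that return `None` are not silently altered in this sketch; they can be flagged for review instead.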

[0096] In some embodiments, standardizing the clinical test data to the data nomenclature further comprises generating an output (e.g., an output file) which can be, for example, stored in an output database. The output database (or file) can include a compilation of the clinical test data conforming to the common data nomenclature. The output file can be referred to herein as a cumulative data file or report. Standardizing the clinical test data to the data nomenclature can further comprise generating a count table. The count table can be a pivot table. The count table can include a count of each of one or more test samples collected from a subject at each of one or more time points of the clinical trial.
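A count table of the kind described above could, for example, be built as a pivot keyed by subject and visit with one column per test. The record field names (`subject_id`, `visit`, `test`) in this sketch are hypothetical and assume the records have already been standardized.

```python
from collections import defaultdict


def build_count_table(records):
    """Pivot standardized records into {(subject_id, visit): {test: count}}."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[(rec["subject_id"], rec["visit"])][rec["test"]] += 1
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {row: dict(cols) for row, cols in table.items()}
```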

[0097] At block 410, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform comparing the standardized clinical test data to collection requirements. Comparing the standardized clinical test data to the collection requirements can comprise obtaining Electronic Data Collection (EDC) visitation data from a database (or data source). Comparing the standardized clinical test data to the collection requirements can comprise determining at least one unrecognized visit (such as 10, 50, 100, 500, 1000, or 5000 unrecognized visits) or visit with an unrecognized name in the EDC visitation data. An unrecognized visit can be a visit with a name not present in the requirement keys and/or the collection requirements. Comparing the standardized clinical test data to the collection requirements can comprise generating and/or displaying a notification regarding the at least one unrecognized visit in the EDC visitation data. Comparing the standardized clinical test data to the collection requirements can comprise requesting and/or receiving requirements for the unrecognized visit. Determining the at least one unrecognized visit in the EDC visitation data can comprise determining, based on a machine learning model, the at least one unrecognized visit in the EDC visitation data.
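The detection of unrecognized visits might be sketched as below. This example is an assumption-laden illustration, not the disclosed implementation: it uses the `difflib` similarity ratio from the Python standard library as the similarity score rather than an edit distance, and the function name and `cutoff` value are hypothetical.

```python
import difflib


def find_unrecognized_visits(edc_visits, known_visits, cutoff=0.8):
    """Map each unrecognized visit name to its closest known name, or None.

    known_visits is a list of visit names present in the requirement keys.
    """
    known = set(known_visits)
    report = {}
    for name in edc_visits:
        if name in known or name in report:
            continue  # recognized, or already reported
        matches = difflib.get_close_matches(name, known_visits, n=1, cutoff=cutoff)
        report[name] = matches[0] if matches else None
    return report
```

Each entry in the returned mapping could drive a notification; an entry with a suggested name is a likely typographical variant, while an entry with `None` may call for requesting requirements for a new visit.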

[0098] A machine learning model can be, for example, trained based on training data that includes recognized visit names or visit names that conform to the common data nomenclature. A machine learning model can be based on, for example, similarity scores, such as edit distances.

[0099] Comparing the standardized clinical test data to the collection requirements can comprise obtaining Electronic Data Collection (EDC) visitation data from a database (or data source). Comparing the standardized clinical test data to the collection requirements can comprise determining at least one unrecognized visit or one visit with an unrecognized name in the EDC visitation data. An unrecognized visit can be a visit with a name not present in the requirement keys and/or the collection requirements. Comparing the standardized clinical test data to the collection requirements can comprise generating and/or displaying a notification regarding the at least one unrecognized visit in the EDC visitation data. Comparing the standardized clinical test data to the collection requirements can comprise requesting and/or receiving requirements for the unrecognized visit. Determining the at least one unrecognized visit in the EDC visitation data can comprise determining, based on a machine learning model, the at least one unrecognized visit in the EDC visitation data. A machine learning model can be, for example, trained based on training data that includes recognized visit names or visit names that conform to the common data nomenclature. A machine learning model can be based on, for example, similarity scores, such as edit distances.

[0100] In some embodiments, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform determining collection requirements for each visit in the Electronic Data Collection (EDC) visitation data. Determining the collection requirements for each visit in the EDC visitation data can comprise determining the collection requirements for each visit in the EDC visitation data from a collection requirements table and a collection requirements key. The method can comprise receiving the collection requirements table and the collection requirement key from at least one database (or at least one data source), such as 2, 3, 4, 5, or more, databases. Receiving the collection requirements table and the collection requirement key can comprise receiving the collection requirements table and the collection requirement key from one database (or data source). Receiving the collection requirements table and the collection requirement key can comprise receiving the collection requirements table from a first database (or data source) and the collection requirement key from a second database (or data source). In some embodiments, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform obtaining collection requirements, or a portion thereof, from a database (or data source).
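The disclosure does not specify the layout of the collection requirements table or the collection requirements key. Purely as an assumed arrangement for illustration, the sketch below treats the key as a mapping from visit name to a requirement code and the table as a mapping from requirement code to the number of samples required per test; all names are hypothetical.

```python
def requirements_for_visits(visits, requirement_key, requirements_table):
    """Resolve per-visit requirements; return (resolved, unresolved_visits).

    requirement_key:    {visit name: requirement code}          (assumed layout)
    requirements_table: {requirement code: {test: required n}}  (assumed layout)
    """
    resolved, unresolved = {}, []
    for visit in visits:
        code = requirement_key.get(visit)
        if code is None or code not in requirements_table:
            unresolved.append(visit)  # no requirements known for this visit
        else:
            resolved[visit] = dict(requirements_table[code])
    return resolved, unresolved
```

Under this assumption the key and the table can be received from one data source or from two different data sources without changing the lookup.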

[0101] In some embodiments, comparing the standardized clinical test data to the collection requirements comprises determining collection requirements for each visit of a subject in Electronic Data Collection (EDC) visitation data. Comparing the standardized clinical test data to the collection requirements further can comprise comparing test samples collected for the visit of the subject in the count table to the collection requirements for the visit of the subject to determine insufficient samples collected for the clinical trial.
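The comparison of collected samples against per-visit requirements could be sketched as follows. The data shapes are assumptions for this example: the count table maps (subject, visit) to per-test counts, the requirements map each visit to required per-test counts, and the EDC visitation data is a list of (subject, visit) pairs.

```python
def find_insufficient(count_table, visit_requirements, edc_visits):
    """List each (subject, visit, test) with fewer samples than required."""
    deviations = []
    for subject_id, visit in edc_visits:
        collected = count_table.get((subject_id, visit), {})
        for test, required in visit_requirements.get(visit, {}).items():
            actual = collected.get(test, 0)  # a missing entry counts as zero
            if actual < required:
                deviations.append({
                    "subject_id": subject_id,
                    "visit": visit,
                    "test": test,
                    "expected": required,
                    "actual": actual,
                })
    return deviations
```

Because the loop is driven by the EDC visits rather than by the count table, a visit that occurred but produced no vendor records at all is still reported in this sketch.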

[0102] At block 412, the set of machine-readable/machine-executable instructions, when executed, can cause the one or more hardware processors 402 to perform generating based on the comparison, a deviation report or list. The deviation report can include information relating to one or more insufficient tests performed or samples collected (or collection) for the clinical trial, including one or more deviations in tests performed or samples collected (actual tests performed or samples collected relative to the tests expected to be performed or samples expected to be collected), or insufficient tests or samples collected from a subject at each of one or more time points of the clinical trial.
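One possible rendering of a deviation report or list is a delimited text file with one row per deviation, showing actual against expected collections. The column names and the CSV format in this sketch are assumptions chosen for illustration; the disclosure does not prescribe a report layout.

```python
import csv
import io

# Hypothetical report columns.
FIELDS = ["subject_id", "visit", "test", "expected", "actual", "shortfall"]


def render_deviation_report(deviations):
    """Render deviations as CSV text, sorted by subject, visit, and test."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    ordered = sorted(deviations, key=lambda d: (d["subject_id"], d["visit"], d["test"]))
    for d in ordered:
        # shortfall = samples expected minus samples actually collected
        writer.writerow({**d, "shortfall": d["expected"] - d["actual"]})
    return buf.getvalue()
```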

[0103] In some embodiments, the methods disclosed herein can contribute to understanding of subject health. One or more of the following can be generated or achieved: (1) Generate summary of collections for key subjects (e.g. responders or by cohort/dose level (DL)). (2) Review of site performance metrics in tests performed or sample collections. (3) Facilitate reporting on the protocol deviation log. In some embodiments, the methods disclosed herein can assist in data cleaning. For example, the methods can identify incorrect meta-data (e.g. subject ID, visit) at lab vendor. The methods can identify incorrect EDC entries. In some embodiments, the method can help preserve sample stability and timely testing. For example, the methods can identify samples still needing shipment from site. The methods can identify missed collections that may be ‘caught up’ on.

[0104] The techniques described herein, for example, are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.

[0105] FIGURE 5 is a block diagram that illustrates a computer system 500 upon which any of various embodiments described herein may be implemented. The computer system 500 includes a bus 502 or other communication mechanism for communicating information, and one or more hardware processors 504 coupled with bus 502 for processing information. A description that a device performs a task is intended to mean that one or more of the hardware processor(s) 504 performs the task.

[0106] The computer system 500 also includes a main memory 506, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.

[0107] The computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 502 for storing information and instructions.

[0108] The computer system 500 may be coupled via bus 502 to output device(s) 512, such as a cathode ray tube (CRT) or LCD display (or touch screen), for displaying information to a computer user. Input device(s) 514, including alphanumeric and other keys, are coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516. The computer system 500 also includes a communication interface 518 coupled to bus 502.

[0109] Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense, that is as “including, but not limited to.” Recitation of numeric ranges of values throughout the specification is intended to serve as a shorthand notation of referring individually to each separate value falling within the range inclusive of the values defining the range, and each separate value is incorporated in the specification as if it were individually recited herein. Additionally, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. The phrases “at least one of,” “at least one selected from the group of,” or “at least one selected from the group consisting of,” and the like are to be interpreted in the disjunctive (e.g., not to be interpreted as at least one of A and at least one of B).

[0110] Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be in some instances. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

[0111] A component being implemented as another component may be construed as the component being operated in a same or similar manner as another component, and/or comprising same or similar features, characteristics, and parameters as another component.

Claims

WHAT IS CLAIMED IS:

1. A computer-implemented method comprising: obtaining, by a computing system, clinical test data collected for a clinical trial from a plurality of databases; standardizing, by the computing system, the clinical test data to conform to a common data nomenclature; comparing, by the computing system, the standardized clinical test data to collection requirements; and generating, by the computing system, based on the comparison, a deviation report, wherein the deviation report includes information relating to insufficient test samples collected for the clinical trial.

2. The computer-implemented method of claim 1, wherein the clinical test data obtained from a first database and the clinical test data obtained from a second database have different numbers of data fields.

3. The computer-implemented method of any one of claims 1-2, further comprising: obtaining, by the computing system, conversion keys, for standardizing the clinical test data to conform to the common data nomenclature, from one or more conversion key databases.

4. The computer-implemented method of claim 3, wherein the conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases and the conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases are different.

5. The computer-implemented method of any one of claims 2-4, wherein a conversion key comprises a pair comprising a non-standardized value and a standardized value.

6. The computer-implemented method of any one of claims 2-5, wherein obtaining the conversion keys comprises: obtaining, by the computing system, visit name conversion keys, for standardizing visit names in the clinical test data, and test name conversion keys, for standardizing test names in the clinical test data, from a visit name conversion key database and a test name conversion key database.

7. The computer-implemented method of any one of claims 1-6, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: standardizing, by the computing system, the clinical test data to conform to the common data nomenclature using conversion keys.

8. The computer-implemented method of any one of claims 1-6, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: determining, by the computing system, one or more data values of one or more data fields of the clinical test data are not present in the conversion keys, optionally wherein determining the one or more data values of the one or more data fields of the clinical test data are not present in the conversion keys comprises using a first machine learning model; and generating and/or displaying, by the computing system, a notification that the one or more data values are not present in the conversion keys and/or requesting and/or receiving, by the computing system, additional conversion keys for standardizing the data values to conform to the common data nomenclature.

9. The computer-implemented method of claim 8, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: modifying, by the computing system, the clinical test data to conform to the common data nomenclature using the conversion keys and the additional conversion keys.

10. The computer-implemented method of any one of claims 1-6, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: determining, by the computing system, using the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature, optionally wherein determining, using the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature comprises: determining, by the computing system, using a machine learning model and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature; and modifying, by the computing system, using a first machine learning model, the nonconforming data to the common data nomenclature, optionally wherein modifying the nonconforming data to the common data nomenclature comprises: modifying, by the computing system, using a first machine learning model, the nonconforming data to the common data nomenclature.

11. The computer-implemented method of any one of claims 1-6, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: determining, by the computing system, using a first machine learning model and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature; and modifying, by the computing system, based on the first machine learning model, the nonconforming data to the common data nomenclature.

12. The computer-implemented method of any one of claims 10-11, wherein the first machine learning model is trained based on training data that includes labels for data fields or data values of the data fields that conform to the common data nomenclature.

13. The computer-implemented method of any one of claims 1-12, wherein the clinical test data obtained from a database and the corresponding standardized clinical test data comprise different numbers of data fields and/or different numbers of data entries, optionally wherein the clinical test data obtained from a database comprise more data fields and/or more data entries than the corresponding standardized clinical test data.

14. The computer-implemented method of any one of claims 1-13, wherein standardizing the clinical test data to conform to the common data nomenclature further comprises: determining, by the computing system, an error in a data value of the clinical test data; and modifying, by the computing system, the data value containing the error to correct the error.

15. The computer-implemented method of claim 14, wherein the error comprises a typographical error and/or a data format error.

16. The computer-implemented method of any one of claims 1-15, wherein standardizing the clinical test data to conform to the common data nomenclature further comprises: generating, by the computing system, an output database, wherein the output database includes a compilation of the clinical test data conforming to the common data nomenclature; and generating, by the computing system, a count table, wherein the count table includes a count of each of one or more test samples collected from a subject at each of one or more time points of the clinical trial.

17. The computer-implemented method of any one of claims 1-16, wherein comparing the standardized clinical test data to the collection requirements comprises: obtaining, by the computing system, Electronic Data Collection (EDC) visitation data from a database; determining at least one unrecognized visit in the EDC visitation data; and generating and/or displaying, by the computing system, a notification regarding the at least one unrecognized visit in the EDC visitation data, and/or requesting and/or receiving, by the computing system, requirements for the unrecognized visit.

18. The computer-implemented method of claim 17, wherein determining the at least one unrecognized visit in the EDC visitation data comprises: determining, by the computing system, based on a second machine learning model, the at least one unrecognized visit in the EDC visitation data.

19. The computer-implemented method of any one of claims 1-18, further comprising: determining, by the computing system, collection requirements for each visit in the Electronic Data Collection (EDC) visitation data.

20. The computer-implemented method of claim 19, wherein determining the collection requirements for each visit in the EDC visitation data comprises: determining, by the computing system, the collection requirements for each visit in the EDC visitation data from a collection requirements table and a collection requirements key, optionally wherein the method comprises: receiving the collection requirements table and the collection requirement key from at least one database, optionally wherein receiving the collection requirements table and the collection requirement key comprises: receiving the collection requirements table and the collection requirement key from one database, and/or optionally wherein receiving the collection requirements table and the collection requirement key comprises: receiving the collection requirements table and the collection requirement key from two databases.

21. The computer-implemented method of any one of claims 1-20, further comprising: obtaining, by the computing system, collection requirements, or a portion thereof, from a database.

22. The computer-implemented method of any one of claims 1-21, wherein comparing the standardized clinical test data to the collection requirements further comprises: determining, by the computing system, collection requirements for each visit of a subject in Electronic Data Collection (EDC) visitation data; and comparing, by the computing system, test samples collected for the visit of the subject in the count table to the collection requirements for the visit of the subject to determine insufficient samples collected for the clinical trial.

23. The computer-implemented method of any one of claims 1-22, wherein the clinical test data is collected by at least two different administrators of the clinical trial.

24. A system comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the system to perform: obtaining clinical test data collected for a clinical trial from a plurality of databases; standardizing the clinical test data to conform to a common data nomenclature using conversion keys; determining collection requirements; comparing the standardized clinical test data to the collection requirements; and generating, based on the comparison, a deviation report, wherein the deviation report includes information relating to insufficient test samples collected for the clinical trial.

25. The system of claim 24, wherein the clinical test data obtained from a first database and the clinical test data obtained from a second database have different numbers of data fields.

26. The system of any one of claims 24-25, wherein the instructions, when executed by the processor, cause the system to perform: obtaining the conversion keys, for standardizing the clinical test data to conform to the common data nomenclature, from one or more conversion key databases.

27. The system of claim 26, wherein the conversion keys for standardizing the clinical test data obtained from a first database of the plurality of databases and the conversion keys for standardizing the clinical test data obtained from a second database of the plurality of databases are different.

28. The system of any one of claims 25-27, wherein a conversion key of the conversion keys comprises a pair comprising a non-standardized value and a standardized value.

29. The system of any one of claims 25-28, wherein obtaining the conversion keys comprises: obtaining visit name conversion keys, for standardizing visit names in the clinical test data, and test name conversion keys, for standardizing test names in the clinical test data, from a visit name conversion key database and a test name conversion key database.

30. The system of any one of claims 24-29, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: determining one or more data values of one or more data fields of the clinical test data are not present in the conversion keys; and generating and/or displaying a notification that the one or more data values are not present in the conversion keys and/or requesting and/or receiving additional conversion keys for standardizing the data values to conform to the common data nomenclature.

31. The system of claim 30, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: modifying the clinical test data to conform to the common data nomenclature using the conversion keys and the additional conversion keys.

32. The system of any one of claims 24-29, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: determining using the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature; and modifying using a first machine learning model, the nonconforming data to the common data nomenclature.

33. The system of any one of claims 24-29, wherein standardizing the clinical test data to conform to the common data nomenclature comprises: determining using a first machine learning model and the conversion keys, data values of one or more data fields of the clinical test data that do not conform to the common data nomenclature; and modifying based on the first machine learning model, the nonconforming data to the common data nomenclature.

34. The system of any one of claims 32-33, wherein the first machine learning model is trained based on training data that includes labels for data fields or data values of the data fields that conform to the common data nomenclature.

35. The system of any one of claims 24-34, wherein the clinical test data obtained from a database and the corresponding standardized clinical test data comprise different numbers of data fields and/or different numbers of data entries, optionally wherein the clinical test data obtained from a database comprise more data fields and/or more data entries than the corresponding standardized clinical test data.

36. The system of any one of claims 24-35, wherein standardizing the clinical test data to conform to the common data nomenclature further comprises: determining an error in a data value of the clinical test data; and modifying the data value containing the error to correct the error.

37. The system of claim 36, wherein the error comprises a typographical error and/or a data format error.

38. The system of any one of claims 24-37, wherein the instructions, when executed by the processor, cause the system to perform: generating an output database, wherein the output database includes a compilation of the clinical test data conforming to the common data nomenclature; and generating a count table, wherein the count table includes a count of each of one or more test samples collected from a subject at each of one or more time points of the clinical trial.

39. The system of any one of claims 24-38, wherein comparing the standardized clinical test data to the collection requirements comprises: obtaining Electronic Data Collection (EDC) visitation data from a database; determining at least one unrecognized visit in the EDC visitation data; and generating and/or displaying a notification regarding the at least one unrecognized visit in the EDC visitation data, and/or requesting and/or receiving requirements for the unrecognized visit.

40. The system of claim 39, wherein determining the at least one unrecognized visit in the EDC visitation data comprises: determining based on a second machine learning model, the at least one unrecognized visit in the EDC visitation data.

41. The system of any one of claims 24-40, wherein determining the collection requirements comprises: determining collection requirements for each visit in the Electronic Data Collection (EDC) visitation data.

42. The system of claim 41, wherein determining the collection requirements for each visit in the EDC visitation data comprises: determining the collection requirements for each visit in the EDC visitation data from a collection requirements table and a collection requirements key, optionally wherein determining the collection requirements comprises: receiving the collection requirements table and the collection requirement key from at least one database, optionally wherein receiving the collection requirements table and the collection requirement key comprises: receiving the collection requirements table and the collection requirement key from one database, and/or optionally wherein receiving the collection requirements table and the collection requirement key comprises: receiving the collection requirements table and the collection requirement key from two databases.

43. The system of any one of claims 24-42, wherein comparing the standardized clinical test data to the collection requirements further comprises: determining collection requirements for each visit of a subject in Electronic Data Collection (EDC) visitation data; and comparing test samples collected for the visit of the subject in the count table to the collection requirements for the visit of the subject to determine insufficient samples collected for the clinical trial.

44. The system of any one of claims 24-43, wherein the clinical test data is collected by at least two different administrators of the clinical trial.

45. A non-transitory storage medium storing machine instructions that, when executed by a processor of a computing system, cause the computing system to perform: obtaining clinical test data from a plurality of databases; standardizing the clinical test data to conform to a common data nomenclature; comparing the standardized clinical test data to collection requirements; and generating, based on the comparison, a deviation report, wherein the deviation report includes information relating to insufficient test samples collected for a clinical trial.