WO2016049187A1

WO2016049187A1 - Systems, methods, and software for processing a question relative to one or more of a plurality of population research databases

Info

Publication number: WO2016049187A1
Application number: PCT/US2015/051724
Authority: WO
Inventors: Robert Rosen
Original assignee: Lincolnpeak
Priority date: 2014-09-23
Filing date: 2015-09-23
Publication date: 2016-03-31

Abstract

Systems, methods, and software for processing a question relative to one or more of a plurality of population research databases. Aspects of the disclosure allow researchers to find appropriate data sets or data providers in an efficient manner by enabling researchers to identify characteristics of proprietary data without requiring curators of such data to reveal the data itself. In this way, legal concerns can be assuaged and data-sharing agreements can be avoided until appropriate data sets or data providers have already been identified, thus minimizing the cost and maximizing the benefit of obtaining such data-sharing agreements. In some embodiments, such data-sharing agreements may be automatedly arranged and research questions may be automatedly and comprehensively answered.

Description

SYSTEMS, METHODS, AND SOFTWARE FOR PROCESSING A QUESTION RELATIVE TO ONE OR MORE OF A PLURALITY OF POPULATION RESEARCH DATABASES

RELATED APPLICATION DATA

[0001] This application claims the benefit of priority of U.S. Provisional Patent Application Serial No. 62/054,026, filed on September 23, 2014, and titled "NETWORK OF NETWORK SERVICES," which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

[0002] The present invention generally relates to the field of information technology. In particular, the present invention is directed to systems, methods, and software for processing a question relative to one or more of a plurality of population research databases.

BACKGROUND

[0003] Approximately eighty percent of clinical research studies fail because researchers cannot locate appropriate data sets or data providers. Often, this is so because electronic health record (EHR) systems do not use semantically equivalent meanings of terms and, thus, cannot be easily searched to identify relevant populations for particular research studies. Further, due to HIPAA and other legal and social concerns, curators of health records are extremely reluctant to release data to researchers without first having data-sharing agreements in place, which can be complicated and expensive to arrange. Accordingly, new systems, methods, and software are needed to allow researchers to find appropriate data sets or data providers in a more time- and cost-effective manner.

SUMMARY OF THE DISCLOSURE

[0004] In one implementation, a method of processing a question relative to one or more of a plurality of distinct and separate population research databases using semantics embodied in one or more smart concepts is provided. The method includes: receiving, from a requestor, a question comprising one or more smart concepts, the one or more smart concepts defined as a function of information in at least one of the plurality of population research databases; identifying one or more population research databases that contain information meeting a threshold, the threshold corresponding to minimum required types and amounts of information usable for generating a response to the question; determining a relevance quotient for each population research database identified by the identifying as a function of the amount of information usable for generating a response to the question in each respective population research database; and providing, to the requestor, an identification of one or more population research databases identified in the identifying step and the relevance quotient determined for the one or more population research databases in the determining step, or information derived therefrom.

[0005] In another implementation, a machine-readable storage medium containing machine-executable instructions for performing a method of processing a question relative to one or more of a plurality of distinct and separate population research databases using semantics embodied in one or more smart concepts is provided. The method may be executed in a question processing system and the machine-executable instructions may comprise machine-executable instructions for performing any one or more of the methods and/or functionalities disclosed herein.

[0006] In still another implementation, an apparatus comprising a processor and memory is provided. The memory may contain computer executable instructions, which, when executed, cause the processor to execute any one or more of the methods and/or functionalities disclosed herein.

[0007] In yet another implementation, a question processing system is provided. The system may comprise a database interface and software for controlling the database interface, the software being designed and configured to perform any one or more of the methods and/or functionalities disclosed herein.

[0008] These and other aspects and features of non-limiting embodiments of the present invention will become apparent to those skilled in the art upon review of the following description of specific non-limiting embodiments of the invention in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] For the purpose of illustrating the invention, the drawings show aspects of one or more embodiments of the invention. However, it should be understood that the present invention is not limited to the precise arrangements and instrumentalities shown in the drawings, wherein:

FIG. 1 is a flow diagram illustrating a method of processing a question relative to one or more of a plurality of population research databases;

FIG. 2 is a high-level block diagram illustrating an exemplary question processing system that may be used to implement the method of FIG. 1;

FIG. 3 is a flow diagram illustrating a method of processing concepts within various data sources;

FIG. 4 is a representative screenshot depicting various aspects of an exemplary question creating interface implemented in accordance with aspects of the invention showing various semantic concepts;

FIG. 5 is a flow diagram illustrating a method of generating an estimated prevalence for various data sources;

FIG. 6 is a representative screenshot depicting various aspects of an exemplary question creating interface implemented in accordance with aspects of the invention showing how a prevalence can be generated prior to posing a question to data sources;

FIG. 7 is a representative screenshot depicting various aspects of an exemplary question creating interface implemented in accordance with aspects of the invention showing a map and estimated populations of potential data sources;

FIG. 8 is a flow diagram illustrating a further method of processing a question relative to one or more of a plurality of population research databases that may be implemented using the system of FIG. 2; and

FIG. 9 is a block diagram of a computing system that can be used to implement any one or more of the methodologies disclosed herein and any one or more portions thereof.

DETAILED DESCRIPTION

[0010] At a high level, aspects of the present disclosure are directed to systems, methods, and software for processing a question relative to one or more of a plurality of population research databases. Aspects of the disclosure allow researchers to find appropriate data sets or data providers in an efficient manner by enabling researchers to identify characteristics of proprietary data without requiring curators of such data to reveal the data itself. In this way, legal concerns can be assuaged and data-sharing agreements can be avoided until appropriate data sets or data providers have already been identified, thus minimizing the cost and maximizing the benefit of obtaining such data-sharing agreements because such agreements only need to be arranged for data sets or data providers that have already been established as being relevant to a researcher's particular study.

[0011] Referring now to the drawings, FIG. 1 illustrates an exemplary method 100 of processing a question relative to one or more of a plurality of population research databases. Method 100 may be implemented in an apparatus, such as in exemplary database interface 200 within question processing system 204 of FIG. 2, using a computing system, such as computing system 900 of FIG. 9 or a network of such or similar computing systems (e.g., a wide-area network, a global network (such as the Internet), and/or a local area network, among others), that is generally: 1) programmed with instructions for performing steps of a method of the present disclosure; 2) capable of receiving and/or storing data necessary to execute such steps; and 3) capable of providing any user interface that may be needed for a user to interact with the database interface, including setting the system up for question processing and reviewing any responses produced, among other things. Those skilled in the art will readily appreciate that aspects of the present disclosure can be implemented with and/or within any one or more of numerous devices, ranging from self-contained devices, such as a smartphone, tablet, computer, laptop computer, desktop computer, server, or webserver, to a network of two or more of any of these devices. Fundamentally, there is no limitation on the physical construct of the question processing system, as long as it can provide one or more of the features and functionality described herein. In some embodiments, depending on specific implementation, one or more steps of method 100 and/or any other method(s) incorporating features/functionality disclosed herein may be implemented substantially in real-time, enabling, in a fraction of a second, identification of appropriate data sets or data providers that until now would typically have taken months to years. FIGS. 2 and 9, described more fully below, illustrate an exemplary question processing system 204 and computer system 900, respectively, that can be used to implement various steps of method 100 and/or any other method incorporating features/functionality disclosed herein.

[0012] Prior to describing exemplary method 100, parts of question processing system 204 will first be described to provide context for method 100. Referring to FIG. 2, system 204 may include a database interface 200 for processing questions, which may comprise software 208 and memory 212. Memory 212 may represent any part or the entirety of the memory used by database interface 200 in providing its functionality. Depending upon the particular implementation at issue, memory 212 may be volatile memory, such as primary storage memory (e.g., random-access memory (RAM) or cache memory, etc.), non-volatile memory, such as secondary storage memory (e.g., a magnetic drive, optical drive, etc.), and any combination thereof and in any number of memory devices. Those skilled in the art will readily understand the types of memory(ies) needed for memory 212 for any particular instantiation of a database interface implemented in accordance with the present disclosure.

[0013] Software 208 may include a question processor 216 that researchers may manipulate or access, for example via email or other appropriate means, such as an appropriate graphical user interface, which may be communicatively coupled with database interface 200 and provided directly via software 208, e.g., via optional user interface 220, or indirectly through a separate and/or third party website, such as in one or more question portals 224, among others, in order to provide the software with a question 228, which may comprise a structured or unstructured query, application, or request of any type for information. In some embodiments, question portal 224 may be implemented within database interface 200 and/or the database interface may be implemented within the question portal, as appropriate. For example, researchers may access or utilize question processor 216 via another system or apparatus (e.g., a home computer connected to the Internet) or directly via one or more user input devices (e.g., keyboard, mouse, etc.) associated with database interface 200.

[0014] Questions 228 provided by researchers may include one or more smart concepts, described further hereinbelow, which the present inventors have developed in order to enable appropriate semantically and clinically meaningful interrogation of two or more population research databases 232(1) to 232(N), such as "Population Research Database 1" 232(1), "Population Research Database 2" 232(2), "Population Research Database 3" 232(3), and up to any number of third parties (designated by "Population Research Database N" 232(N)). Generally, population research databases are highly denormalized compared to highly normalized databases typically used, for example, for single patient records, as much of the information stored in population research databases is extracted from normalized databases and reorganized to optimize for reporting, research, and/or analysis. For example, in a population research database, details of an individual, such as their name, may be stored in one record and an index may be created pointing to that record; a separate table for diagnosis information may then be provided in connection with the index such that the index can establish correlations between particular individual records and individual diagnoses in the table. By contrast, in a highly normalized database, e.g., one structured using third normal form, data is organized to permit rapid updating, e.g., by inserting and deleting, and, accordingly, data is typically distributed across many parts of the database.

[0015] It is noted that while various components of software are described herein, these descriptions are not intended to imply that any particular configuration of the corresponding software code is required. For example, various software components described herein should not be construed to be required to be embodied in a discrete set of code independent of other software code, such as that for software 208. Rather, components of software described herein are merely used as a convenient way to refer to underlying functionality.

[0016] After a researcher provides question 228 to software 208 of database interface 200, the database interface may translate and/or distribute that question to two or more population research databases 232(1) to 232(N) in order to identify one or more of such databases that contain information relevant to the question and return an appropriate response 236 to the researcher, as described further below. In some embodiments, one or more questions 240 and/or responses 244 may be stored in memory 212, such as in cases where a question has been posed but one or more population research databases 232(1) to 232(N) are offline and the question needs to be stored for later use when the databases come back online or in cases where a researcher poses a question and an appropriate response is generated but the researcher is unavailable (e.g., the particular question portal 224 used by that researcher or the researcher's email server is offline). In some embodiments, question 228 may be a translated version of a question posed at question portal 224, which may then be translated again one or more times such that each of population research databases 232(1) to 232(N) can process the question and/or be appropriately accessed by database interface 200. These translations can occur at question portal 224, in question processor 216, and/or in software controlling one or more population research databases 232(1) to 232(N), as appropriate.

[0017] For the sake of completeness, it is noted that the unlabeled arrows in FIG. 2 represent temporary and/or permanent data connections that enable data communication between various components of question processing system 204. These connections may be implemented in the form of, for example, data buses, Internet connections, local network connections, and/or any other connections between electronic devices or portions of one or more devices.

[0018] With the context of question processing system 204 established and referring again to FIG. 1, and also FIG. 2, method 100 may begin at step 105, at which question 228 comprising one or more smart concepts, the one or more smart concepts defined as a function of information in at least one of the plurality of population research databases, is received from a requestor (such as, but not limited to, a researcher). A concept, in one sense, is a formalization of medical meaning with respect to large population epidemiological studies; examples include age, sex, race or ethnicity, diagnosis, procedure, etc. Concepts may describe, e.g., the state of a patient at a point in time, the activities that occur in an encounter between a provider and a patient, and the activities and costs that occur between a patient, provider, and a payer, among others. Based upon the present inventors' review of major common data models (CDMs) used in industry, a canonical set of concepts {C_1...C_n} was defined that represents the key concepts in use across all major CDMs, which can be adjusted and/or expanded as required. Within each CDM, the information related to these concepts is stored in a different fashion. For example, in CDM_1, the "sex" concept might be stored in a demographic record in a "sex" field and encoded as one of {'1','2',...}, where '1' means "female" and '2' means "male," while in CDM_2, the "sex" concept might be stored in an encounter record in a "sex" field and encoded as one of {'F','M',...}, where 'F' means "female" and 'M' means "male." Translations like those described above can be used to reconcile these differences. For example, a common data format may be used by database interface 200 to represent the "sex" concept, optionally including a comprehensive list of each distinct possible semantic meaning, and "male" may be represented by a hexadecimal value of '000A', while "female" may be represented by a hexadecimal value of '0010'. Translations between the native data format for "sex" of CDM_1 and CDM_2 and the common data format of database interface 200 may involve converting between the native and common data types as necessary to interface with databases using different CDMs. Accordingly, if database interface 200 needs to ask a question involving a "sex" concept of "female" of a database using CDM_1, it may convert its common representation of "female", i.e., '0010', to the native CDM_1 representation, i.e., '1', such that the database can understand and properly process the question. Similarly, when database interface 200 receives a response from a database using CDM_2 that includes a "sex" concept of "male", the database interface may convert the CDM_2 native data format, i.e., 'M', to the common data format of the database interface, i.e., '000A'. Similar translations can be performed between question portal 224 and/or user interface 220 and database interface 200, as necessary and appropriate for the database interface to understand and properly process questions 228 and provide responses 236 to researchers.
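The translation between native CDM encodings and the common data format described above can be sketched as a pair of table lookups. This is a minimal illustration only: the code values come from the "sex" example in the text, while the table layout and the names `COMMON`, `NATIVE`, `to_native`, and `to_common` are assumptions introduced here, not part of the disclosure.

```python
# Common-format codes for the "sex" concept, as given in the example above.
COMMON = {"male": "000A", "female": "0010"}

# Native encodings per common data model (hypothetical table layout).
NATIVE = {
    "CDM_1": {"female": "1", "male": "2"},
    "CDM_2": {"female": "F", "male": "M"},
}


def to_native(cdm: str, common_code: str) -> str:
    """Convert a common-format code to the native code of the given CDM."""
    meaning = next(m for m, code in COMMON.items() if code == common_code)
    return NATIVE[cdm][meaning]


def to_common(cdm: str, native_code: str) -> str:
    """Convert a native code of the given CDM to the common-format code."""
    meaning = next(m for m, code in NATIVE[cdm].items() if code == native_code)
    return COMMON[meaning]


# Asking CDM_1 about "female", then reading "male" back from CDM_2.
outgoing = to_native("CDM_1", COMMON["female"])
incoming = to_common("CDM_2", "M")
```

Routing every conversion through a shared semantic meaning keeps the number of tables linear in the number of CDMs rather than requiring one table per pair of CDMs.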

[0019] After reconciling each CDM's concepts, e.g., CDM_1 concepts {C_11...C_1n} => canonical concepts {C_1...C_n} and CDM_2 concepts {C_21...C_2n} => canonical concepts {C_1...C_n}, standard terms and values are obtained but not necessarily standard meanings. For example, in certain CDMs, the "sex" concept may have semantic values of {Male, Female, Unknown}, where "Male" and "Female" are well understood but "Unknown" may mean "anything else." In other CDMs, the "sex" concept may have semantic values of {Male, Female, Ambiguous, Other, Unknown}. Again, "Male" and "Female" are well understood, but "Ambiguous" may mean "transgender," "Other" may mean "not male, not female, and not ambiguous, but sex noted," and "Unknown" may mean "sex not noted." A consequence of this is that if a researcher wants to use "Male" and "Female" as values for the "sex" concept in a question posed to data providers using CDMs that use equivalent semantic values, they can do so. However, if a researcher wants to use "Ambiguous" or "Other" as values for the "sex" concept in a question, then that question cannot be posed meaningfully to any data provider that uses a CDM in which those semantic meanings are not supported. Similarly, if a researcher wants to use "Unknown" as a value for the "sex" concept in a question, such a question can be posed to CDMs that do not use equivalent semantic values, but the results may be unreliable because "Unknown" as a population within certain CDMs may represent a population subset of "Unknown" as a population within a CDM that uses different semantic values. Depending upon the researcher's research protocol and particular question, this unreliability may or may not be acceptable. However, in order to codify these distinctions and subtleties, the present inventors have, in one embodiment of the invention, developed "smart concepts" based on machine rules for about fifty canonical concepts in order to enable system 204 to properly handle interactions with researchers and enable other functionality described hereinbelow.

[0020] In the context of population research databases like population research databases 232(1) to 232(N), CDMs may govern how data is stored. For example, although this condition is not required to practice various aspects of the present disclosure, assume there is a one-to-one relationship between the set of data models {DM_1...DM_n} and the set of population research databases {PRB_1...PRB_n}. For each concept C_i there is a series of concept values C_ij that represent the states for that concept (for example, "male" and "female" for sex). Given one concept C_i and its implementation in two data models DM_1 and DM_2, the relationship between the semantic implementation of C_i in data models DM_1 and DM_2, or C_i(DM_1) <=> C_i(DM_2), can be categorized as one of the following semantic conditions: equality, intersection, or distinctness.

[0021] In the case of equality between the semantic implementation of C_i in data models DM_1 and DM_2, C_ij(DM_1) = C_ij(DM_2). Each of the concept values C_ij in the two data models DM_1 and DM_2 is or can be normalized as semantically equal so that questions using this concept in these data models can return populations with the same characterization for this concept. For example, if the concept "sex" in DM_1 is defined by the concept values "male" and "female" and the concept "sex" in DM_2 is defined by the states "male" and "female," then all the concept values for sex are semantically equal in the two data models. Likewise, if the concept values for "sex" in DM_1 are "male," "female," and "unknown" and the concept values for sex in DM_2 are "male," "female," "unknown," and "other," then the concept "sex" can be normalized between DM_1 and DM_2 by equating the concept values for "male" and "female" and equating "unknown" in DM_1 with "unknown" and "other" in DM_2 such that, for example, a search for the "sex" concept of "unknown" will return all data with "unknown" sex from DM_1 and all data with "unknown" or "other" sex from DM_2.

[0022] In the case of an intersection between the semantic interpretations of C_i in data models DM_1 and DM_2, C_ij(DM_1) ∩ C_ij(DM_2). In this case, only some concept values C_ij can be used in both DM_1 and DM_2. For example, the "male" and "female" concept values may be equivalent between both data models, but an "ambiguous" concept value for "sex" in DM_1 may not be used at all in DM_2 and an "unknown" concept value for "sex" in DM_1 may be equivalent to "unknown and other" in DM_2. In this example, any question using the concept "sex" may be used against both data models DM_1 and DM_2 so long as the question posed by the researcher uses "sex" concept values "male," "female," or "unknown." However, if the question posed by the researcher uses the "sex" concept value "ambiguous," then DM_2 may be flagged as not available and removed from the potential list of data sources that can respond to or otherwise provide relevant data in connection with the researcher's question.

[0023] In the case of distinctness between the semantic interpretations of C_i in data models DM_1 and DM_2, C_i(DM_1) ≠ C_i(DM_2). In this case, the concept does not exist at all in either DM_1 or DM_2, and so any question that uses C_i cannot be used with any data model DM_j that does not support the concepts covered by C_i. For example, if a researcher creates a question that uses the concept "enrollment date" and that concept is not supported in some data models, then any data source using data models that do not support that concept is not available to respond to or otherwise provide data in connection with the question. Note that each of the operations above (equality, intersection, distinctness) creates semantic equivalence or the closest possible fit and allows the researcher to understand and control the semantic relationships between concepts implemented in different data models.
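The three semantic conditions described above (equality, intersection, distinctness) can be illustrated with a small set comparison. This sketch assumes the concept values of each data model have already been normalized to shared labels (for example, "other" in DM_2 folded into "unknown"), and it treats a concept that is missing from either model as distinct. The function name `classify` and the string labels it returns are hypothetical.

```python
def classify(values_dm1: set, values_dm2: set) -> str:
    """Classify how one concept's normalized values relate across two data models."""
    if not values_dm1 or not values_dm2:
        # The concept is unsupported in at least one model.
        return "distinct"
    if values_dm1 == values_dm2:
        return "equality"
    if values_dm1 & values_dm2:
        # Only the shared values can be used in a question posed to both models.
        return "intersection"
    return "distinct"


# "sex" with identical value sets in both models.
same = classify({"male", "female"}, {"male", "female"})

# "ambiguous" exists only in DM_1, so only part of the value set is shared.
partial = classify({"male", "female", "ambiguous", "unknown"},
                   {"male", "female", "unknown"})

# "enrollment date" is not supported at all in the second model.
missing = classify({"enrollment date"}, set())
```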

[0024] As shown in FIG. 3, a concept processing method 300, which may be performed by any one or more appropriate components of the systems disclosed herein, such as question processor 216, may begin at step 305 with an evaluation of whether all data sources (population research databases {PRB_1...PRB_n}) are processed. If they are not all processed, then, at step 310, the next data source is selected. At step 315, the data model DM_j associated with the data source may be selected, and then, at step 320, it may be determined whether all values C_ij for the concept in question have been processed (in some embodiments, all processing for one concept may be performed as it is dragged on the screen in a question creating interface). If so, then method 300 may return to step 305 and a next data source may be selected; otherwise, at step 325, a next concept value C_ij may be selected. At step 330, it is determined whether the current concept value exists in data model DM_j. If so, then method 300 may proceed to step 335, at which the concept value C_ij may be marked as equivalent, after which the method may return to step 320 to determine whether any other values for the concept in question have yet to be processed. If the current concept value does not exist in data model DM_j, then method 300 may proceed to step 340, at which semantic equivalence rules may be analyzed to determine whether a rule exists that can be applied to coerce equivalence. If there is an appropriate equivalence rule, then method 300 may proceed to step 345, the rule may be applied, and then the method may return to step 320. If there is no existing rule, then, at step 350, the data source associated with DM_j may be marked as not available for use with this concept value C_ij and method 300 may return to step 320. If at step 305 it is determined that all data sources have been processed, then method 300 is complete, as indicated by step 355.
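The control flow of method 300 can be sketched as two nested loops, with step numbers noted in comments. The data structures below are assumptions made for illustration: each data model is reduced to the set of values it supports for one concept, and equivalence rules are a dictionary keyed by (data model, requested value). The rule mapping "other" to "unknown" in DM_1 is hypothetical, loosely following the normalization example given for the equality case.

```python
def process_concept(requested_values, data_models, rules):
    """Resolve each requested concept value against every data model.

    Returns {model: {value: set of native values, or None if unavailable}}.
    """
    result = {}
    for model, supported in data_models.items():       # steps 305-315
        mapping = {}
        for value in requested_values:                  # steps 320-325
            if value in supported:                      # step 330
                mapping[value] = {value}                # step 335: equivalent
            elif (model, value) in rules:               # step 340: rule exists?
                mapping[value] = rules[(model, value)]  # step 345: apply rule
            else:
                mapping[value] = None                   # step 350: not available
        result[model] = mapping
    return result                                       # step 355


# Hypothetical value sets for the "sex" concept in two data models.
models = {
    "DM_1": {"male", "female", "ambiguous", "unknown"},
    "DM_2": {"male", "female", "unknown", "other"},
}
# Hypothetical equivalence rule: "other" is coerced to "unknown" in DM_1.
rules = {("DM_1", "other"): {"unknown"}}

resolved = process_concept(["male", "ambiguous", "other"], models, rules)
```

In this example, DM_2 ends up with no mapping for "ambiguous", which corresponds to the data source being marked as not available for that concept value.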

[0025] As shown in FIG. 4 in the context of a question creating interface, which may be provided in connection with, e.g., user interface 220 and/or question portal 224, a researcher may wish to use the concept value "unknown" for the concept "sex." Per method 300 and the description of smart concepts provided above, question processor 216 may interpret "unknown" as "unknown" directly in the Sentinel CDM but interpret "unknown" as an inclusive set of "unknown" and "no information" in the PCORI CDM. Note also that the interface allows the researcher to review and customize the smart concept logic if desired.

[0026] As described above, a requestor may provide a question to software 208 of database interface 200 via user interface 220 and/or question portal 224. After receiving a question, at step 110, database interface 200 may utilize question processor 216 to identify one or more population research databases that contain information meeting a threshold, the threshold corresponding to minimum required types and/or amounts of information usable for generating a response to the question. Because the researcher uses the set of canonical concepts {C_1...C_n} as part of the "language" by which a question is formed, question processor 216 can map the canonical concepts and semantic values to perform the following analysis: for each population research database 232(1) to 232(N), for each concept used in the question, and for each semantic value of each concept used in the question, if the semantic value exists in the database and is congruent with the question, add the data source to a set of potential responders {R_1...R_n}.

[0027] Next, it is beneficial to determine which databases have clinically relevant data, because there is a cost both in terms of the time required to make social arrangements to gain access to data in databases (contracts, conformance, etc.) and in terms of the actual cost of acquiring the data. In order to make such a determination, at step 115, database interface 200 may again utilize question processor 216 to determine a relevance quotient for each population research database identified by said identifying as a function of the amount of information usable for generating a response to the question in each respective population research database. To determine such a relevance quotient, the following analysis can be performed. For each potential responder {R_1...R_n}, and for any covariance (i.e., statistically significant covariance between any two concepts) CV_mn available in the set of canonical concepts {C_1...C_n} used in a question, store the covariance CV_mn in the covariance set CV{ } and remove canonical concepts C_m and C_n from the set of canonical concepts C{ } for which a prevalence needs to be found. Then, for each of the remaining canonical concepts {C_1...C_n}, find the statistical prevalence (i.e., population prevalence) of each concept in the underlying population and add it to the statistical prevalence set SP{ }. With TP defined as the total population of a given database, EP defined as the total estimated population available to meet the criteria described in the set of concepts C{C_1...C_n} used to form a given question, and assuming a set of statistically significant covariances CV{CV_1...CV_n} and prevalences PV{PV_1...PV_n}, where every concept C appears once in at least one of CV{CV_1...CV_n} or PV{PV_1...PV_n}, the calculation specified in Equation 1, below, can be performed to determine EP, which can be considered a relevance quotient, although those of ordinary skill in the art will recognize after reading this disclosure in its entirety that various alternative calculations can be used to produce relevance quotients having different uses and/or meanings. For example, EP for any given database may be divided by the summation of all TP for the database for which it was computed to produce a relevance quotient indicating the percentage of individuals accounted for by that database that meet the criteria used to form a given question; additionally or alternatively, EP for any given database may be divided by the summation of all TP for all databases identified as potential responders in order to produce a relevance quotient indicating the percentage of individuals accounted for by the given database that meet the criteria used to form a given question out of all individuals accounted for by all databases identified as potential responders.
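Equation 1 itself is not reproduced in this text. Judging from the description of method 500, which multiplies all covariances, multiplies all prevalences, and scales the result by the total population of the data source, it plausibly takes the following form. This is a reconstruction rather than the original equation, and the upper bounds a and b (the number of covariance terms and prevalence terms, respectively) are assumed notation.

```latex
EP = TP \times \prod_{j=1}^{a} CV_j \times \prod_{k=1}^{b} PV_k
```

Under this reading, the two alternative relevance quotients described above would be EP divided by TP for a single database, and EP divided by the sum of TP over all potential responders.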

[0028] As shown in FIG. 5, a method 500 of generating an estimated prevalence for various data sources, which may be performed by any one or more appropriate components of the systems disclosed herein, such as question processor 216, may begin with step 505, at which it may be determined whether all data sources (population research databases or {PRB_1...PRB_n}) are processed. If they are not all processed, then, at step 510, the next data source may be selected. At step 515, it may be determined whether all smart concepts {C_1...C_n} associated with the question submitted by the researcher have been processed. If not, then, at step 520, the next smart concept C_j may be selected and, at step 525, the current data source PRB_i may be interrogated or otherwise analyzed (e.g., via a look-up table) to determine whether it supports this smart concept C_j. If PRB_i does not support C_j, then at step 530, PRB_i may be marked as unavailable. However, if PRB_i does support C_j, then, at step 535, it may be determined whether there is a covariance CV_j in {CV_1...CV_n} that involves C_j (there can be one and only one in the set). If there is a covariance CV_j in {CV_1...CV_n} that characterizes C_j, then, at step 540, CV_j may be stored as the representative term to be used for C_j in the final calculation; otherwise, at step 545, the smart concept prevalence PV_j may be stored in {PV_1...PV_n} as the representative term to be used for C_j in the final calculation. Method 500 may then return to step 515 and continue processing until all smart concepts {C_1...C_n} have been processed against data source PRB_i, after which method 500 may return to step 505 and iterate until all data sources {PRB_1...PRB_n} have been processed.
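The per-concept loop of steps 515 through 545 can be sketched in Python as shown below. This is a minimal illustration, not the disclosed implementation: the `DataSource` class, the `select_terms` function, and the rule that a data source "supports" a concept when it holds a prevalence entry for it are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical summary of one population research database (PRB_i)."""
    name: str
    total_population: int
    # concept name -> prevalence PV of that concept in this source (assumed layout)
    prevalence: dict = field(default_factory=dict)
    # frozenset of two concept names -> covariance CV for that pair (assumed layout)
    covariance: dict = field(default_factory=dict)


def select_terms(source, concepts):
    """Choose the representative term for each concept (steps 515 to 545).

    Returns (covariances, prevalences), or None when the source does not
    support one of the concepts and is therefore marked unavailable.
    """
    asked = set(concepts)
    covariances = {}
    prevalences = {}
    for concept in concepts:                       # steps 515 and 520
        if concept not in source.prevalence:       # step 525: support check
            return None                            # step 530: unavailable
        # Step 535: look for a covariance pair that involves this concept
        # and whose other member is also part of the question.
        pair = next(
            (p for p in source.covariance if concept in p and p <= asked),
            None,
        )
        if pair is not None:
            # Step 540: keyed by pair, so a shared covariance is stored once.
            covariances[pair] = source.covariance[pair]
        else:
            # Step 545: fall back to the concept's own prevalence.
            prevalences[concept] = source.prevalence[concept]
    return covariances, prevalences
```

Keying the covariance dictionary by the concept pair means that both concepts of a pair resolve to the same stored term, which mirrors the removal of C_m and C_n from the prevalence set described in paragraph [0027].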

[0029] Once all data sources {PRB_1...PRB_n} have been processed, method 500 may proceed to step 550 to determine whether any data source PRB_i has been marked as available. If not, method 500 may proceed to step 585 and terminate with no estimated prevalence calculated. However, if one or more data sources are available, method 500 may proceed to step 555 and, for each available data source PRB_i, proceed through steps 565, 570, 575, and 580 by multiplying all the covariances (∏_{i=1}^{n} CV_i) at step 565, multiplying all the prevalences (∏_{j=1}^{n} PV_j) at step 570, multiplying the result of steps 565 and 570 by the total population of PRB_i at step 575, and storing the result of the calculation of step 575 at step 580. Method 500 may then return to step 555 to determine whether all available data sources PRB_i have been processed, and, if so, the method may terminate.

[0030] As shown in FIG. 6, the researcher has created the question: "how many males aged 10 to 50 have HIV and do not have Long QT syndrome?" A method like method 500 may provide estimated prevalence values or relevance quotients in connection with the question before it is actually posed to or otherwise analyzed in the context of one or more data sources. As shown in FIG. 6, such a method has been used to produce values that are then displayed on the question creating interface: 270 possible data sources with an average population of 20 patients per data source meet the question criteria. As shown in FIG. 7, a geolocation map of the data sources may be provided in connection with a question creating interface, as well as a table of names, contract status relative to the researcher (e.g., does the researcher have permission to receive a full answer to the question or is a contract required), and the number of estimated patients for each data source.
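The multiplication of steps 565 through 580 and a summary of the kind displayed in FIG. 6 can be sketched as follows. The function names `estimated_population` and `summarize`, and the choice to report a simple count and arithmetic mean, are assumptions for illustration; the disclosure does not specify how the displayed values are aggregated.

```python
from math import prod


def estimated_population(total_population, covariances, prevalences):
    """Steps 565 to 575: multiply the stored covariances, the stored
    prevalences, and the total population of one available data source."""
    # prod() of an empty sequence is 1, so a missing term set is neutral.
    return total_population * prod(covariances) * prod(prevalences)


def summarize(estimates):
    """Return (number of data sources, average estimated patients per source).

    `estimates` maps a data source name to the value stored at step 580.
    """
    count = len(estimates)
    average = sum(estimates.values()) / count if count else 0.0
    return count, average


# Example with made-up numbers for two hypothetical data sources.
estimates = {
    "source_a": estimated_population(1000, [0.5], [0.25]),
    "source_b": estimated_population(400, [], [0.5, 0.5]),
}
count, average = summarize(estimates)
```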

[0031] Once estimated prevalence values and/or relevance quotients are determined for one or more population research databases 232(1) to 232(N), at step 120 of method 100, database interface 200 may provide an identification of one or more population research databases identified at step 110 and one or more relevance quotients, or information derived therefrom, determined in connection with the one or more population research databases at step 115 to the requestor as a response 236. Response 236 may in some embodiments comprise a full answer to question 228, such as a full report of relevant data in one or more population research databases, optionally including details of individual patient data records and/or statistical reports. In other embodiments, response 236 may provide a preliminary answer, which can be provided via appropriate arrangements prior to a researcher having gained permission and/or established contractual arrangements to more directly access data in one or more particular population research databases.

For example, as shown in FIG. 6, a preliminary answer may be provided via a question creating interface in the form of a number of possible data sources and/or an average population of patients per data source that meet the question criteria. Such a preliminary answer may be provided automatedly and/or in real-time as a question is being created or after user selection of an appropriate user interface element, such as a button or hyperlink. As discussed above, such a preliminary answer may merely provide a requestor with an identification of one or more population research databases identified at step 110 and one or more relevance quotients, or information derived therefrom, determined in connection with the one or more population research databases at step 115.

This enables the requestor to determine one or more databases for which they would like to pursue data-sharing agreements, which can, in some embodiments, be automatedly arranged by one or more components of question processing system 204. Once such agreements are in place, database interface 200 may, as shown in method 800 of FIG. 8: receive an identification of at least one population research database upon which the requester wishes to process the question at step 805, which may be provided by a researcher via a question creating interface like that of FIG. 6; process the question or cause the question to be processed relative to the at least one population research database as a function of the one or more smart concepts to produce a question result, which may comprise a more comprehensive answer to the question than the preliminary answer discussed above, the question result comprising one or more smart concepts at step 810; and provide the question result to the requestor as a response 236 at step 815. The question result produced by method 800 may comprise, for example, a full report of relevant data in one or more population research databases based on one or more smart concepts, optionally including details of individual patient data records and/or one or more statistical reports. In this way, researchers can identify relevant databases via a preliminary answer and optionally also obtain comprehensive answers to their questions using a single interface, e.g. , question portal 224 and/or user interface 220.
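The distinction between a preliminary answer and the question result of method 800 can be sketched as follows. This is a hedged illustration only: `SourceEstimate`, `preliminary_answer`, `process_question`, and the `run_query` callback are hypothetical names, and refusing to process a source without an agreement is one possible policy rather than a requirement stated in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class SourceEstimate:
    """Hypothetical per-source record behind a preliminary answer."""
    name: str
    estimated_patients: float
    agreement_in_place: bool


def preliminary_answer(estimates):
    """Identify data sources and estimates only; no record-level data."""
    return [
        {
            "source": e.name,
            "estimated_patients": e.estimated_patients,
            "contract_required": not e.agreement_in_place,
        }
        for e in estimates
    ]


def process_question(selected, estimates, run_query):
    """Sketch of method 800: receive the selected sources (step 805),
    process the question against each one (step 810), and return the
    results for delivery to the requestor (step 815)."""
    by_name = {e.name: e for e in estimates}
    results = {}
    for name in selected:
        estimate = by_name.get(name)
        if estimate is None or not estimate.agreement_in_place:
            # Assumed policy: full processing only once an agreement exists.
            raise PermissionError(f"no data-sharing agreement for {name!r}")
        results[name] = run_query(name)
    return results
```

A caller would first show the output of `preliminary_answer` to the researcher, then pass the researcher's selection to `process_question` along with a function that actually executes the question against the chosen database.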

[0032] It is to be noted that any one or more of the aspects and embodiments described herein may be conveniently implemented using one or more machines (e.g. , one or more computing devices that are utilized as a user computing device for an electronic document, one or more server devices, such as a document server, etc.) programmed according to the teachings of the present specification, as will be apparent to those of ordinary skill in the computer art. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those of ordinary skill in the software art. Aspects and implementations discussed above employing software and/or software modules may also include appropriate hardware for assisting in the implementation of the machine executable instructions of the software and/or software module.

[0033] Such software may be a computer program product that employs a machine-readable storage medium. A machine-readable storage medium may be any medium that is capable of storing and/or encoding a sequence of instructions for execution by a machine (e.g., a computing device) and that causes the machine to perform any one of the methodologies and/or embodiments described herein. Examples of a machine-readable storage medium include, but are not limited to, a magnetic disk, an optical disc (e.g., CD, CD-R, DVD, DVD-R, etc.), a magneto-optical disk, a read-only memory "ROM" device, a random access memory "RAM" device, a magnetic card, an optical card, a solid-state memory device, an EPROM, an EEPROM, and any combinations thereof. A machine-readable medium, as used herein, is intended to include a single medium as well as a collection of physically separate media, such as, for example, a collection of compact discs or one or more hard disk drives in combination with a computer memory. As used herein, a machine-readable storage medium does not include transitory forms of signal transmission.

[0034] Such software may also include information (e.g., data) carried as a data signal on a data carrier, such as a carrier wave. For example, machine-executable information may be included as a data-carrying signal embodied in a data carrier in which the signal encodes a sequence of instructions, or portion thereof, for execution by a machine (e.g., a computing device) and any related information (e.g., data structures and data) that causes the machine to perform any one of the methodologies and/or embodiments described herein.

[0035] Examples of a computing device include, but are not limited to, an electronic book reading device, a computer workstation, a terminal computer, a server computer, a handheld device (e.g. , a tablet computer, a smartphone, etc.), a web appliance, a network router, a network switch, a network bridge, any machine capable of executing a sequence of instructions that specify an action to be taken by that machine, and any combinations thereof. In one example, a computing device may include and/or be included in a kiosk.

[0036] FIG. 9 shows a diagrammatic representation of one embodiment of a computing device in the exemplary form of a computer system 900 within which a set of instructions for causing a control system, such as the question processing system of FIG. 2, to perform any one or more of the aspects and/or methodologies of the present disclosure may be executed. It is also contemplated that multiple computing devices may be utilized to implement a specially configured set of instructions for causing one or more of the devices to perform any one or more of the aspects and/or methodologies of the present disclosure. Computer system 900 includes a processor 904 and a memory 908 that communicate with each other, and with other components, via a bus 912. Bus 912 may include any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures.

[0037] Memory 908 may include various components (e.g., machine-readable media) including, but not limited to, a random access memory component, a read only component, and any combinations thereof. In one example, a basic input/output system 916 (BIOS), including basic routines that help to transfer information between elements within computer system 900, such as during start-up, may be stored in memory 908. Memory 908 may also include (e.g., stored on one or more machine-readable media) instructions (e.g., software) 920 embodying any one or more of the aspects and/or methodologies of the present disclosure. In another example, memory 908 may further include any number of program modules including, but not limited to, an operating system, one or more application programs, other program modules, program data, and any combinations thereof.

[0038] Computer system 900 may also include a storage device 924. Examples of a storage device (e.g. , storage device 924) include, but are not limited to, a hard disk drive, a magnetic disk drive, an optical disc drive in combination with an optical medium, a solid-state memory device, and any combinations thereof. Storage device 924 may be connected to bus 912 by an appropriate interface (not shown). Example interfaces include, but are not limited to, SCSI, advanced technology attachment (ATA), serial ATA, universal serial bus (USB), IEEE 1394 (FIREWIRE), and any combinations thereof. In one example, storage device 924 (or one or more components thereof) may be removably interfaced with computer system 900 (e.g. , via an external port connector (not shown)). Particularly, storage device 924 and an associated machine-readable medium 928 may provide nonvolatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for computer system 900. In one example, software 920 may reside, completely or partially, within machine-readable medium 928. In another example, software 920 may reside, completely or partially, within processor 904.

[0039] Computer system 900 may also include an input device 932. In one example, a user of computer system 900 may enter commands and/or other information into computer system 900 via input device 932. Examples of an input device 932 include, but are not limited to, an alpha-numeric input device (e.g. , a keyboard), a pointing device, a joystick, a gamepad, an audio input device (e.g. , a microphone, a voice response system, etc.), a cursor control device (e.g. , a mouse), a touchpad, an optical scanner, a video capture device (e.g. , a still camera, a video camera), a touchscreen, and any combinations thereof. Input device 932 may be interfaced to bus 912 via any of a variety of interfaces (not shown) including, but not limited to, a serial interface, a parallel interface, a game port, a USB interface, a FIREWIRE interface, a direct interface to bus 912, and any combinations thereof. Input device 932 may include a touch screen interface that may be a part of or separate from display 936, discussed further below. Input device 932 may be utilized as a user selection device for selecting one or more graphical representations in a graphical interface as described above.

[0040] A user may also input commands and/or other information to computer system 900 via storage device 924 (e.g. , a removable disk drive, a flash drive, etc.) and/or network interface device 940. A network interface device, such as network interface device 940, may be utilized for connecting computer system 900 to one or more of a variety of networks, such as network 944, and one or more remote devices 948 connected thereto. Examples of a network interface device include, but are not limited to, a network interface card (e.g. , a mobile network interface card, a LAN card), a modem, and any combination thereof. Examples of a network include, but are not limited to, a wide area network (e.g. , the Internet, an enterprise network), a local area network (e.g. , a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a data network associated with a telephone/voice provider (e.g. , a mobile communications provider data and/or voice network), a direct connection between two computing devices, and any combinations thereof. A network, such as network 944, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.

Information (e.g., data, software 920, etc.) may be communicated to and/or from computer system 900 via network interface device 940.

[0041] Computer system 900 may further include a video display adapter 952 for communicating a displayable image to a display device, such as display device 936. Examples of a display device include, but are not limited to, a liquid crystal display (LCD), a cathode ray tube (CRT), a plasma display, a light emitting diode (LED) display, and any combinations thereof.

Display adapter 952 and display device 936 may be utilized in combination with processor 904 to provide graphical representations of aspects of the present disclosure. In addition to a display device, computer system 900 may include one or more other peripheral output devices including, but not limited to, an audio speaker, a printer, and any combinations thereof. Such peripheral output devices may be connected to bus 912 via a peripheral interface 956. Examples of a peripheral interface include, but are not limited to, a serial port, a USB connection, a FIREWIRE connection, a parallel connection, and any combinations thereof.

[0042] The foregoing has been a detailed description of illustrative embodiments of the invention. Various modifications and additions can be made without departing from the spirit and scope of this invention. Features of each of the various embodiments described above may be combined with features of other described embodiments as appropriate in order to provide a multiplicity of feature combinations in associated new embodiments. Furthermore, while the foregoing describes a number of separate embodiments, what has been described herein is merely illustrative of the application of the principles of the present invention. Additionally, although particular methods herein may be illustrated and/or described as being performed in a specific order, the ordering is highly variable within ordinary skill to achieve methods, systems, and software according to the present disclosure. Accordingly, this description is meant to be taken only by way of example, and not to otherwise limit the scope of this invention.

[0043] The present disclosure provides a number of solutions, many of which are necessarily rooted in computer technology, in order to overcome various problems extant in the art, many of which arise specifically in the realm of accessing information in disparate data sources. Exemplary embodiments have been disclosed above and illustrated in the accompanying drawings. It will be understood by those skilled in the art that various changes, omissions and additions may be made to that which is specifically disclosed herein without departing from the spirit and scope of the present invention.

Claims

What is claimed is:

1. A method of processing a question relative to one or more of a plurality of distinct and separate population research databases using semantics embodied in one or more smart concepts, the method comprising:

receiving, from a requestor, a question comprising one or more smart concepts, the one or more smart concepts defined as a function of information in at least one of the plurality of population research databases;

identifying one or more population research databases that contain information meeting a threshold, the threshold corresponding to minimum required types and amounts of information usable for generating a response to the question;

determining a relevance quotient for each population research database identified by said identifying as a function of the amount of information usable for generating a response to the question in each respective population research database; and

providing, to the requestor, an identification of one or more population research databases identified in said identifying step and the relevance quotient determined for the one or more population research databases in said determining step, or information derived therefrom.

2. A method according to claim 1, further comprising:

receiving, from the requestor, an identification of at least one population research database upon which the requester wishes to process the question;

processing the question or causing the question to be processed relative to the at least one population research database as a function of the one or more smart concepts to produce a question result, the question result comprising one or more smart concepts; and

providing the question result to the requestor.

3. A method according to claim 1 or claim 2, wherein at least one of the one or more smart concepts is defined as a function of information in at least two of the plurality of population research databases.

4. A method according to claim 1 or any of the preceding claims, wherein at least one of the one or more smart concepts is defined as a function of at least two pieces of information in the at least one of the plurality of population research databases, the two pieces of information having differing semantic meanings.

5. A method according to claim 1 or any of the preceding claims, wherein the relevance quotient is determined as a function of at least one statistical prevalence determined as a function of at least one of said one or more smart concepts in each population research database identified by said identifying.

6. A method according to claim 5 or any of the preceding claims, wherein the relevance quotient is determined as a function of at least one statistical covariance determined as a function of at least two of said smart concepts in each population research database identified by said identifying.

7. A method according to claim 1 or any of the preceding claims, wherein said providing includes displaying information on a display device via a video display adapter.

8. A method according to claim 1 or any of the preceding claims, wherein said providing includes emailing information.

9. A machine-readable storage medium containing machine-executable instructions for performing a method of processing a question relative to one or more of a plurality of distinct and separate population research databases using semantics embodied in one or more smart concepts, the method being executed in a question processing system, said machine-executable instructions comprising machine-executable instructions for performing the method of any of claims 1 to 8.

10. An apparatus comprising a processor and memory, the memory containing computer executable instructions, which, when executed, cause the processor to execute a method according to any of claims 1 to 8.

11. A question processing system, comprising:

a database interface; and

software for controlling the database interface, the software being designed and configured to perform the method of any of claims 1 to 8.
