CN116134546A - Method and system for efficient sample mixing for diagnostic testing

Info

Publication number: CN116134546A
Application number: CN202180054373.5A
Authority: CN
Inventors: 吉列尔莫·约瑟·希曼; 斯里提·苏尼尔; 亚什·苏迪尔·帕蒂尔
Original assignee: Weipu Co
Current assignee: Weipu Co
Priority date: 2020-07-02
Filing date: 2021-06-30
Publication date: 2023-05-16
Also published as: EP4176451A1; WO2022006246A1; AU2021300319A1; EP4176451A4; CA3187387A1

Abstract

The present disclosure provides methods for performing or directing the mixing of a plurality of body samples. In one aspect, a method may include: (a) obtaining a plurality of health data, contact tracing data, location data, movement data, or any combination thereof, associated with a plurality of subjects; and (b) processing the plurality of health data, contact tracing data, location data, movement data, or any combination thereof with a trained computer algorithm to assign at least some individual subjects of the plurality of subjects to a pool among a plurality of pools, wherein the number of pools in the plurality of pools is less than the number of subjects in the plurality of subjects.

Description

Method and system for efficient sample mixing for diagnostic testing

Cross reference

The present application claims the benefit of U.S. provisional patent application No. 63/047,630, filed on July 2, 2020, which provisional patent application is incorporated herein by reference in its entirety.

SUMMARY

Methods and systems for performing or directing the mixing (pooling) of multiple body samples are provided. Body samples, health data, contact tracing data, location data, and/or movement data may be collected from multiple subjects (e.g., patients), and trained computer algorithms may be used to efficiently perform or direct the mixing of body samples into multiple sample pools for diagnostic tests. Such efficient sample mixing may be useful for frequent and extensive diagnostic testing for infectious diseases in a population, which may be essential for containing and mitigating infectious diseases, particularly in the case of an epidemic outbreak (e.g., COVID-19).

In one aspect, the present disclosure provides a method comprising: (a) obtaining a plurality of health data, contact tracing data, location data, movement data, or any combination thereof, associated with a plurality of subjects; and (b) processing the plurality of health data, contact tracing data, location data, movement data, or any combination thereof with a trained computer algorithm to assign at least some individual subjects of the plurality of subjects to a pool of a plurality of pools, wherein the number of pools of the plurality of pools is less than the number of subjects of the plurality of subjects.

In some embodiments, the method further comprises outputting an electronic recommendation to create, for each of at least two given pools of the plurality of pools, a mixed sample (pooled sample) by combining body samples or portions of body samples obtained from subjects in the at least two given pools. In some embodiments, the method further comprises creating, for each of at least two given pools of the plurality of pools, a mixed sample by combining body samples or portions of body samples obtained from subjects in the at least two given pools. In some embodiments, the method further comprises obtaining a body sample or portion of a body sample from a plurality of subjects.

In some embodiments, the body sample is solely selected from the group consisting of: nasopharyngeal swab, oropharyngeal swab, blood, serum, plasma, vitreous, sputum, urine, stool, tears, sweat, saliva, semen, mucosal secretion (mucosal excretions), mucus, spinal fluid, cerebrospinal fluid (CSF), pleural fluid, peritoneal fluid, amniotic fluid, lymph fluid, ocular swab (eye swab), cheek swab, vaginal swab, cervical swab, rectal swab, cells and tissues.

In some embodiments, the method further comprises isolating nucleic acids from the body sample, and for a given pool of the plurality of pools, creating a mixed sample by combining at least some of the nucleic acids isolated from the body samples obtained from the subjects in the given pool. In some embodiments, the method further comprises enriching the nucleic acids for a plurality of genomic regions. In some embodiments, the method further comprises amplifying at least some of the nucleic acids. In some embodiments, the amplification comprises selective amplification. In some embodiments, the amplification comprises universal amplification. In some embodiments, enriching at least some of the nucleic acids for the plurality of genomic regions comprises contacting the nucleic acids with a plurality of probes, each of the plurality of probes having sequence complementarity to at least a portion of one of the plurality of genomic regions. In some embodiments, the plurality of genomic regions comprises genomic regions associated with a disease or disorder. In some embodiments, the disease or disorder comprises coronavirus disease 2019 (COVID-19), human immunodeficiency virus (HIV), or malaria. In some embodiments, the disease or disorder comprises COVID-19.

In some embodiments, the method further comprises performing a plurality of diagnostic tests on the plurality of mixed samples to obtain a plurality of diagnostic results associated with the plurality of mixed samples. In some embodiments, the plurality of diagnostic tests are configured to detect the presence or absence of a disease or disorder based on analyzing at least the plurality of mixed samples. In some embodiments, the disease or disorder comprises coronavirus disease 2019 (COVID-19), human immunodeficiency virus (HIV), or malaria. In some embodiments, the disease or disorder comprises COVID-19.

In some embodiments, the method further comprises, for a given pool of the plurality of pools, determining that the disease or disorder is absent in each individual subject in the given pool when no disease or disorder is detected based on analyzing the mixed sample corresponding to the given pool. In some embodiments, the method further comprises, for a given pool of the plurality of pools, when the presence of a disease or disorder is detected based on analyzing the mixed sample corresponding to the given pool, testing each individual subject in the given pool for the disease or disorder. In some embodiments, the method further comprises, for a given pool of the plurality of pools, when the presence of a disease or disorder is detected based on analyzing the mixed sample corresponding to the given pool, testing each of a plurality of sub-pools of the given pool for the disease or disorder.
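As a non-limiting illustration, the follow-up logic described above may be sketched as a recursive procedure: a negative pooled result clears every subject in the pool, while a positive pooled result triggers testing of sub-pools, down to individual subjects. The function name `resolve_pool`, the `test_pool` callable, and the `n_subpools` parameter are hypothetical and are not part of the disclosure; the sketch also assumes an idealized test with no false positives or false negatives.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def resolve_pool(
    subjects: Sequence[str],
    test_pool: Callable[[Sequence[str]], bool],
    n_subpools: int = 2,
) -> Tuple[Dict[str, bool], int]:
    """Return (per-subject result, number of tests used) for one pool.

    `test_pool` is a hypothetical stand-in for a diagnostic test run on the
    mixed sample of the given subjects; it returns True when the disease or
    disorder is detected.
    """
    tests_used = 1
    if not test_pool(subjects):
        # Negative mixed sample: every subject in the pool is reported negative.
        return {s: False for s in subjects}, tests_used
    if len(subjects) == 1:
        # A positive "pool" of one subject is an individual positive result.
        return {subjects[0]: True}, tests_used

    # Positive mixed sample: split into sub-pools and test each of them.
    size = -(-len(subjects) // n_subpools)  # ceiling division
    results: Dict[str, bool] = {}
    for start in range(0, len(subjects), size):
        sub_results, sub_tests = resolve_pool(
            subjects[start:start + size], test_pool, n_subpools
        )
        results.update(sub_results)
        tests_used += sub_tests
    return results, tests_used


# Hypothetical example: eight subjects, one of whom is infected.
infected = {"s3"}
pool: List[str] = [f"s{i}" for i in range(8)]
results, tests_used = resolve_pool(pool, lambda members: any(m in infected for m in members))
```

Setting `n_subpools` equal to the pool size reduces the sketch to the simpler variant in which every subject of a positive pool is tested individually.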

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a clinical sensitivity (clinical sensitivity) of at least about 50%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 70%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 90%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%.

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a clinical specificity (clinical specificity) of at least about 50%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical specificity of at least about 70%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical specificity of at least about 90%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a clinical specificity of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%.

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 50%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 70%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 90%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%.

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 50%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 70%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 90%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%.
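For reference, clinical sensitivity, clinical specificity, PPV, and NPV are conventionally computed from the counts of true positives, false positives, true negatives, and false negatives. The following is a minimal sketch of those standard definitions; the class and function names are hypothetical, and the counts in the example are invented for illustration rather than taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # diseased subjects reported positive
    fp: int  # healthy subjects reported positive
    tn: int  # healthy subjects reported negative
    fn: int  # diseased subjects reported negative


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions; raises ZeroDivisionError if a denominator is 0."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative counts only (100 diseased and 100 healthy subjects).
metrics = diagnostic_metrics(ConfusionCounts(tp=80, fp=4, tn=96, fn=20))
```

Note that PPV and NPV depend on the prevalence in the tested population, whereas sensitivity and specificity do not.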

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.60. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.70. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.80. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.90. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.50, at least about 0.55, at least about 0.60, at least about 0.65, at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.85, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, or at least about 0.99.
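Assuming that the AUC above refers to the area under the receiver operating characteristic (ROC) curve, it may be computed as the probability that a randomly chosen diseased subject receives a higher score than a randomly chosen healthy subject, with ties counted as one half. The sketch below uses that pairwise definition; the function name `roc_auc` and the example scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores_positive: Sequence[float], scores_negative: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_positive) * len(scores_negative))


# Illustrative scores only: 8 of the 9 pairs are ranked correctly.
example_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a sketch; a rank-based computation would typically be used for large cohorts.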

In some embodiments, the plurality of health data, contact tracing data, location data, movement data, or any combination thereof associated with the plurality of subjects includes de-identified data. In some embodiments, the plurality of health data comprises: diagnosis of a disease or disorder, prognosis of a disease or disorder, risk of having a disease or disorder, history of treatment of a disease or disorder, history of past treatment of a disease or disorder, history of prescribed medications, history of prescribed medical devices, age, height, weight, sex, smoking status, one or more symptoms, and one or more vital signs. In some embodiments, the one or more vital signs include one or more of the following: heart rate, heart rate variability, blood pressure, respiratory rate, blood oxygen concentration (SpO₂), carbon dioxide concentration in respiratory gases, hormone levels, sweat analysis, blood glucose, body temperature, impedance, conductivity, capacitance, resistivity, electromyography, galvanic skin response, nerve signals, electroencephalography, electrocardiography, immunological markers, and other physiological measurements.

In some embodiments, the trained computer algorithm comprises a trained machine learning classifier. In some embodiments, the trained machine learning classifier includes an algorithm selected from the group consisting of: support vector machines, neural networks, random forests, linear regression, logistic regression, Bayesian classifiers, boosting classifiers, gradient boosting algorithms, adaptive boosting (AdaBoost) algorithms, and extreme gradient boosting (XGBoost) algorithms.

In some embodiments, the method further comprises: processing the health data, contact tracing data, location data, or movement data of the individual subjects using a trained computer algorithm to determine an expected prevalence of the disease or disorder, and assigning individual subjects of the plurality of subjects to pools of the plurality of pools based at least in part on the determined expected prevalence of the disease or disorder. In some embodiments, the method further comprises assigning individual subjects of the plurality of subjects to pools of the plurality of pools when the determined expected prevalence of the disease or disorder is less than a predetermined prevalence threshold. In some embodiments, the predetermined prevalence threshold is about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50%. In some embodiments, the method further comprises: determining a maximum pool size based on the determined expected prevalence of the disease or disorder, and assigning individual subjects of the plurality of subjects to pools among the plurality of pools based on the maximum pool size. In some embodiments, the method further comprises determining an expected severity of symptoms in the plurality of subjects. In some embodiments, the expected severity of symptoms includes mild symptoms, moderate symptoms, or severe symptoms.
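By way of a non-limiting sketch, the threshold-based assignment described above may look as follows. It assumes that a trained computer algorithm has already produced, for each subject, an expected prevalence between 0 and 1; the function name `assign_pools`, the subject identifiers, and the choice to group subjects in order of increasing expected prevalence are hypothetical design choices rather than features of the disclosure.

```python
from typing import Dict, List, Tuple


def assign_pools(
    expected_prevalence: Dict[str, float],
    prevalence_threshold: float,
    max_pool_size: int,
) -> Tuple[List[List[str]], List[str]]:
    """Split subjects into pools and a list of subjects to test individually.

    Subjects whose expected prevalence is below the threshold are pooled;
    all other subjects are left for individual testing.
    """
    if max_pool_size < 1:
        raise ValueError("max_pool_size must be at least 1")

    eligible = [s for s, p in expected_prevalence.items() if p < prevalence_threshold]
    individual = [s for s, p in expected_prevalence.items() if p >= prevalence_threshold]

    # Assumption: grouping the lowest-risk subjects together keeps most pools negative.
    eligible.sort(key=lambda s: expected_prevalence[s])
    pools = [
        eligible[i:i + max_pool_size]
        for i in range(0, len(eligible), max_pool_size)
    ]
    return pools, individual


# Illustrative values only.
predicted = {"a": 0.01, "b": 0.30, "c": 0.02, "d": 0.04, "e": 0.03}
pools, individual = assign_pools(predicted, prevalence_threshold=0.10, max_pool_size=2)
```

In this example five subjects are covered by two pooled tests plus one individual test, so the number of pools is smaller than the number of subjects.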

In some embodiments, the number of pools in the plurality of pools is reduced by at least 50% relative to the number of subjects in the plurality of subjects. In some embodiments, the number of pools in the plurality of pools is reduced by at least 100% relative to the number of subjects in the plurality of subjects. In some embodiments, the number of pools in the plurality of pools is reduced by at least 200% relative to the number of subjects in the plurality of subjects. In some embodiments, the number of pools in the plurality of pools is reduced by at least 300% relative to the number of subjects in the plurality of subjects. In some embodiments, the number of pools in the plurality of pools is reduced by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 110%, at least about 120%, at least about 130%, at least about 140%, at least about 150%, at least about 160%, at least about 170%, at least about 180%, at least about 190%, at least about 200%, at least about 210%, at least about 220%, at least about 230%, at least about 240%, at least about 250%, at least about 260%, at least about 270%, at least about 280%, at least about 290%, at least about 300%, at least about 310%, at least about 320%, at least about 330%, at least about 340%, at least about 350%, at least about 360%, at least about 370%, at least about 380%, at least about 390%, at least about 400%, or greater than about 400% relative to the number of subjects in the plurality of subjects.

In some embodiments, the method further comprises administering a therapeutically effective dose of a treatment for the disease or disorder to at least a subset of the plurality of subjects, based on the presence or absence of the disease or disorder detected in the plurality of subjects.

In another aspect, the present disclosure provides a computer system comprising: a database and one or more computer processors operatively coupled to the database, the database configured to store a plurality of health data, contact tracing data, location data, movement data, or any combination thereof, associated with a plurality of subjects; wherein the one or more computer processors are individually or collectively programmed to: process the plurality of health data, contact tracing data, location data, movement data, or any combination thereof with a trained computer algorithm to assign at least some individual subjects of the plurality of subjects to a pool of a plurality of pools, wherein the number of pools of the plurality of pools is less than the number of subjects of the plurality of subjects.

Another aspect of the present disclosure provides a non-transitory computer-readable medium containing machine-executable code that, when executed by one or more computer processors, implements any of the methods described above or elsewhere herein.

Another aspect of the present disclosure provides a system comprising one or more computer processors and a computer memory coupled to the computer processors. The computer memory includes machine executable code that, when executed by one or more computer processors, implements any of the methods described above or elsewhere herein.

Further aspects and advantages of the present disclosure will become apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments and its several details are capable of modification in various obvious respects, all without departing from the present disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.

Incorporated by reference

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. If publications and patents or patent applications incorporated by reference contradict the disclosure contained in this specification, this specification is intended to supersede and/or take precedence over any such conflicting material.

Brief Description of Drawings

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and the accompanying drawings (also referred to herein as "Figure" and "Fig."), wherein:

Fig. 1A illustrates an example method for performing sample mixing of multiple clinical samples for a diagnostic test.

Fig. 1B illustrates a general workflow for performing sample mixing of a plurality of clinical samples for diagnostic testing, including data entry, consuming and integrating data, performing methods for sample mixing, recommending a mixing protocol (pooling protocol), and performing diagnostic testing on a sample pool.

Fig. 1C shows a workflow for sample processing and testing.

Fig. 1D illustrates an example of a system process diagram of the methods and systems of the present disclosure.

Fig. 1E illustrates an example of a user activity diagram of the methods and systems of the present disclosure.

Fig. 2A shows an example of sample mixing of multiple clinical samples for diagnostic testing. If a sample pool comprising a plurality of individual clinical samples receives a negative clinical test result, all individual clinical samples in the sample pool are taken to have a negative clinical test result as well. Conversely, if a sample pool comprising a plurality of individual clinical samples receives a positive clinical test result, the individual clinical samples may be tested individually, or the sample pool may be further subdivided into smaller subsets, and the sample mixing process may be repeated.

Fig. 2B illustrates an example of how the methods and systems of the present disclosure advantageously use sample mixing. For example, about 2 out of every 10 tests currently return a positive result; machine learning can learn the patterns that characterize the 8 out of 10 patients who test negative. Using the methods and systems of the present disclosure, predictive analysis and machine learning may be applied to identify patients with a lower expected prevalence. Multiple predicted-negative samples may be mixed together to create a sample pool, and a single diagnostic test may be performed on the sample pool.

Fig. 3 illustrates a computer system programmed or otherwise configured to implement the methods provided herein.

Fig. 4A shows the number of people who can be tested with 1000 diagnostic tests using Boston and Cambridge cohort information (left 3 columns), clinical symptom information (middle 3 columns), and contact tracing information (right 3 columns). In each set of 3 columns, the number of people who can be tested with 1000 diagnostic tests is indicated for no sample mixing (left), simple sample mixing (middle), and intelligent sample mixing based on the methods and systems of the present disclosure (right).

Fig. 4B illustrates that rules for diagnostic testing may be redefined using intelligent sample mixing based on the methods and systems of the present disclosure.

Fig. 5A shows the percentage of test savings achieved using simple sample mixing (gray) and using intelligent sample mixing based on the methods and systems of the present disclosure (purple), together with the upper limit achievable using intelligent sample mixing (dashed line).

Fig. 5B shows the percentage increase in diagnostic testing capacity achieved using simple sample mixing (gray) and using intelligent sample mixing based on the methods and systems of the present disclosure (purple), together with the upper limit achievable using intelligent sample mixing (dashed line).

Fig. 6 shows the relative number of diagnostic tests required versus sample pool size (ranging from 1 sample per pool to 20 samples per pool) at prevalence (PR) values of 1% (light blue), 5% (orange), 10% (gray), 20% (yellow), and 30% (dark blue).
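The relationship between prevalence, pool size, and the relative number of tests can be sketched with the classical two-stage pooling estimate: one test per pool, plus one test per member whenever the pool is positive. The disclosure does not state which model underlies the figure, so the formula below is an assumption; it further assumes a perfect test and independent infection status among subjects. The function names are hypothetical.

```python
def relative_tests(prevalence: float, pool_size: int) -> float:
    """Expected tests per subject under two-stage pooling (1.0 = no pooling)."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # individual testing
    # 1/k for the pooled test, plus a retest of each member with the
    # probability that the pool contains at least one positive sample.
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + prob_pool_positive


def best_pool_size(prevalence: float, max_size: int = 20) -> int:
    """Pool size in 1..max_size with the fewest expected tests per subject."""
    return min(range(1, max_size + 1), key=lambda k: relative_tests(prevalence, k))


# Prevalence values matching those listed for the figure.
table = {p: best_pool_size(p) for p in (0.01, 0.05, 0.10, 0.20, 0.30)}
```

Under these assumptions, the number of subjects that a fixed budget of tests can cover is the budget divided by `relative_tests(prevalence, pool_size)`, and the benefit of pooling shrinks as prevalence rises.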

Fig. 7 shows that, using the methods and systems of the present disclosure, an optimal pool size of 4 samples per pool was selected, achieving a 40% reduction in the use of diagnostic test kits together with 80% clinical sensitivity and 96% clinical specificity.

Detailed Description

The term "nucleic acid" or "polynucleotide" as used herein generally refers to a molecule comprising one or more nucleic acid subunits or nucleotides. The nucleic acid may comprise one or more nucleotides selected from adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U) or variants thereof. Nucleotides generally include nucleosides and at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more phosphate (PO₃) groups. Nucleotides may include nucleobases, pentoses (ribose or deoxyribose), and one or more phosphate groups, alone or in combination.

Ribonucleotides are nucleotides in which the sugar is ribose. Deoxyribonucleotides are nucleotides in which the sugar is deoxyribose. The nucleotide may be a nucleoside monophosphate or a nucleoside polyphosphate. The nucleotide may be a deoxyribonucleoside polyphosphate, such as, for example, a deoxyribonucleoside triphosphate (dNTP), which may be selected from deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), deoxyuridine triphosphate (dUTP), and deoxythymidine triphosphate (dTTP), and which may include a detectable label, such as a luminescent label or marker (e.g., a fluorophore). A nucleotide may comprise any subunit that may be incorporated into a growing nucleic acid strand. Such subunits may be A, C, G, T or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or a variant thereof) or a pyrimidine (i.e., C, T or U, or a variant thereof). In some examples, the nucleic acid is deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or a derivative or variant thereof. The nucleic acid may be single-stranded or double-stranded. The nucleic acid molecules may be linear, curved, or circular, or any combination thereof.

As used herein, the terms "nucleic acid molecule," "nucleic acid sequence," "nucleic acid fragment," "oligonucleotide," and "polynucleotide" generally refer to polynucleotides that may have various lengths, such as deoxyribonucleotides or ribonucleotides (RNA), or analogs thereof. The nucleic acid molecule may have a length of at least about 5 bases, 10 bases, 20 bases, 30 bases, 40 bases, 50 bases, 60 bases, 70 bases, 80 bases, 90 bases, 100 bases, 110 bases, 120 bases, 130 bases, 140 bases, 150 bases, 160 bases, 170 bases, 180 bases, 190 bases, 200 bases, 300 bases, 400 bases, 500 bases, 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, or 50 kb, or it may have any number of bases between any two of the above values. Oligonucleotides generally consist of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) replaces thymine (T) when the polynucleotide is RNA). Thus, the terms "nucleic acid molecule," "nucleic acid sequence," "nucleic acid fragment," "oligonucleotide," and "polynucleotide" are at least partially intended to be alphabetical representations of polynucleotide molecules. Alternatively, these terms may be applied to the polynucleotide molecule itself. Such alphabetical representations may be entered into a database in a computer with a central processing unit and/or used for bioinformatic applications such as functional genomics and homology searching. The oligonucleotides may include one or more non-standard nucleotides, nucleotide analogs, and/or modified nucleotides.

As used herein, the term "sample" generally refers to a biological sample. Examples of biological samples include nucleic acid molecules, amino acids, polypeptides, proteins, carbohydrates, fats, or viruses. In one example, the biological sample is a nucleic acid sample comprising one or more nucleic acid molecules. The nucleic acid molecule may be cell-free, such as cell-free DNA (cfDNA) or cell-free RNA (cfRNA). The nucleic acid molecules can be derived from a variety of sources, including human, mammalian, non-human mammalian, simian, monkey, chimpanzee, reptile, amphibian, or avian sources. In addition, the sample may be extracted from a variety of animal fluids containing cell-free sequences, including, but not limited to, body fluid samples such as blood, serum, plasma, vitreous, sputum, urine, tears, sweat, saliva, semen, mucosal secretions, mucus, spinal fluid, cerebrospinal fluid (CSF), pleural fluid, peritoneal fluid, amniotic fluid, lymph, and the like. Cell-free polynucleotides (e.g., cfDNA) may be of fetal origin (via fluid obtained from a pregnant subject), or may originate from the subject's own tissue.

As used herein, the term "subject" generally refers to an individual having a biological sample being processed or analyzed. The subject may be an animal or a plant. The subject may be a mammal, such as a human, dog, cat, horse, pig or rodent. The subject may be a patient, for example, who has or is suspected of having a disease, such as one or more cancers (e.g., brain, breast, cervical, colorectal, endometrial, esophageal, gastric, hepatobiliary, leukemia, liver, lung, lymphoma, ovarian, pancreatic, skin, urinary tract cancer), one or more infectious diseases, one or more genetic diseases, or one or more tumors, or any combination thereof. For subjects having or suspected of having one or more tumors, the tumor may be of one or more types.

As used herein, the term "whole blood" generally refers to a blood sample that has not been separated (e.g., by centrifugation) into sub-components. Whole blood of the blood sample may comprise cfDNA and/or germline DNA. Whole blood DNA (which may include cfDNA and/or germline DNA) may be extracted from a blood sample. Whole blood DNA sequencing reads (which may include cfDNA sequencing reads and/or germline DNA sequencing reads) may be obtained from whole blood DNA.

Diagnostic tests for a disease or disorder in a subject (e.g., a patient) may be limited or scarce. In some cases, diagnostic tests may be reserved for symptomatic or high-risk subjects; this may be undesirable because asymptomatic subjects may remain infectious and continue to transmit infectious diseases, such as viral (e.g., COVID-19 or HIV), bacterial, or parasitic (e.g., malaria) diseases. Thus, frequent and extensive diagnostic testing for infectious diseases throughout the population may be necessary for containment and mitigation, particularly in the case of epidemic outbreaks.

Methods and systems for mixing multiple body samples are provided. Body samples, health data, contact tracing data, location data, and/or movement data may be collected from multiple subjects (e.g., patients), and trained computer algorithms may be used to efficiently mix the body samples into multiple sample pools for diagnostic testing. This efficient sample mixing can be used for frequent and extensive diagnostic testing for infectious diseases in the population, which may be necessary for containment and mitigation, especially in the case of epidemic outbreaks (e.g., COVID-19, HIV, or malaria).

In some embodiments, the method further comprises outputting an electronic recommendation to create, for each of at least two given pools of the plurality of pools, a mixed sample by combining body samples or portions of body samples obtained from subjects in the at least two given pools. In some embodiments, the method further comprises creating, for each of at least two given pools of the plurality of pools, a mixed sample by combining body samples or portions of body samples obtained from subjects in the at least two given pools. In some embodiments, the method further comprises obtaining a body sample or portion of a body sample from a plurality of subjects.

In some embodiments, the body sample is selected from the group consisting of: nasopharyngeal swab, oropharyngeal swab, blood, serum, plasma, vitreous, sputum, urine, stool, tears, sweat, saliva, semen, mucosal secretions, mucus, spinal fluid, cerebrospinal fluid (CSF), pleural fluid, peritoneal fluid, amniotic fluid, lymph, ocular swab, cheek swab, vaginal swab, cervical swab, rectal swab, cells, and tissues.

In some embodiments, the method further comprises isolating nucleic acids from the body samples and, for a given pool of the plurality of pools, creating a mixed sample by combining at least some of the nucleic acids isolated from the body samples obtained from the subjects in the given pool. In some embodiments, the method further comprises enriching the nucleic acids for a plurality of genomic regions. In some embodiments, the method further comprises amplifying at least some of the nucleic acids. In some embodiments, the amplifying comprises selective amplification. In some embodiments, the amplifying comprises universal amplification. In some embodiments, enriching the nucleic acids for the plurality of genomic regions comprises contacting the nucleic acids with a plurality of probes, each of the plurality of probes having sequence complementarity to at least a portion of one genomic region of the plurality of genomic regions. In some embodiments, the plurality of genomic regions comprises genomic regions associated with a disease or disorder. In some embodiments, the disease or disorder comprises coronavirus disease 2019 (COVID-19), human immunodeficiency virus (HIV), or malaria. In some embodiments, the disease or disorder comprises COVID-19.

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 50%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 70%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 90%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical sensitivity of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%.

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical specificity of at least about 50%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical specificity of at least about 70%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in a plurality of subjects with a clinical specificity of at least about 90%. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with a clinical specificity of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100%.

In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.60. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects using an area under the curve (AUC) of at least about 0.70. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.80. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.90. In some embodiments, the method further comprises detecting the presence or absence of a disease or disorder in the plurality of subjects with an area under the curve (AUC) of at least about 0.50, at least about 0.55, at least about 0.60, at least about 0.65, at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.85, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, or at least about 0.99.
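By way of illustration only, the performance measures recited above may be computed from paired ground-truth labels and test outputs. The following is a minimal sketch in Python; the function names and the example data are hypothetical and are not part of the disclosure. The AUC is computed here as the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative subject, with ties counted as one half.

```python
def confusion_counts(truth, predicted):
    """Return (TP, TN, FP, FN) for paired boolean-like labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def sensitivity(truth, predicted):
    """Fraction of truly positive subjects that test positive."""
    tp, _, _, fn = confusion_counts(truth, predicted)
    return tp / (tp + fn)


def specificity(truth, predicted):
    """Fraction of truly negative subjects that test negative."""
    _, tn, fp, _ = confusion_counts(truth, predicted)
    return tn / (tn + fp)


def roc_auc(truth, scores):
    """AUC of the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example data: 3 positive and 5 negative subjects.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.7]
```

In this hypothetical data set, two of three positive subjects and four of five negative subjects are called correctly, so a threshold such as "at least about 70%" would be met for specificity but not for sensitivity.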

In some embodiments, the plurality of health data comprises: diagnosis of a disease or disorder, prognosis of a disease or disorder, risk of having a disease or disorder, history of treatment of a disease or disorder, history of past treatment of a disease or disorder, history of prescribed medications, history of prescribed medical devices, age, height, weight, sex, smoking status, one or more symptoms, and one or more vital signs. In some embodiments, the one or more vital signs include one or more of the following: heart rate, heart rate variability, blood pressure, respiratory rate, blood oxygen concentration (SpO₂), carbon dioxide concentration in respiratory gases, hormone levels, sweat analysis, blood glucose, body temperature, impedance, conductivity, capacitance, resistivity, electromyography, galvanic skin response, nerve signals, electroencephalography, electrocardiography, immunological markers, and other physiological measurements.

In some embodiments, the method further comprises: processing the health data, contact tracking data, location data, or movement data of the individual subjects using the trained computer algorithm to determine an expected prevalence of the disease or disorder, and assigning individual subjects of the plurality of subjects to pools of the plurality of pools based at least in part on the determined expected prevalence of the disease or disorder. In some embodiments, the method further comprises assigning individual subjects of the plurality of subjects to pools of the plurality of pools when the determined expected prevalence of the disease or disorder is less than a predetermined prevalence threshold. In some embodiments, the predetermined prevalence threshold is about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50%. In some embodiments, the method further comprises: determining a maximum pool size based on the determined expected prevalence of the disease or disorder, and assigning individual subjects of the plurality of subjects to pools among the plurality of pools based on the maximum pool size.
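One way a pool size may be derived from an expected prevalence is sketched below. The disclosure does not specify a formula; this sketch assumes a conventional two-stage (Dorfman-style) protocol, in which each pool of n samples is tested once and every member of a positive pool is then retested individually, so that the expected number of tests per subject is 1/n + 1 - (1 - p)^n for prevalence p. The function names, the size cap of 32, and the default threshold of 10% are hypothetical.

```python
def expected_tests_per_subject(prevalence, pool_size):
    """Expected tests per subject under an assumed two-stage protocol."""
    if pool_size == 1:
        return 1.0  # individual testing: exactly one test per subject
    # One pooled test shared by pool_size subjects, plus one retest per
    # subject whenever the pool contains at least one positive sample.
    return 1.0 / pool_size + 1.0 - (1.0 - prevalence) ** pool_size


def max_pool_size(prevalence, size_cap=32):
    """Pool size (up to size_cap) that minimizes expected tests per subject."""
    return min(range(1, size_cap + 1),
               key=lambda n: expected_tests_per_subject(prevalence, n))


def should_pool(prevalence, threshold=0.10):
    """Pool only when expected prevalence is below the chosen threshold."""
    return prevalence < threshold


for p in (0.01, 0.05, 0.50):
    print(p, should_pool(p), max_pool_size(p))
```

Under these assumptions, a lower expected prevalence supports a larger pool, while at a prevalence of 50% the function returns a pool size of 1, that is, individual testing.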

FIG. 1A illustrates an example method 100 for performing sample mixing of multiple clinical samples for a diagnostic test. The method 100 may include obtaining a plurality of body samples from a plurality of subjects (e.g., operation 102). In some embodiments, the body sample comprises: nasopharyngeal swab, oropharyngeal swab, blood, serum, plasma, vitreous, sputum, urine, stool, tears, sweat, saliva, semen, mucosal secretion, mucus, spinal fluid, cerebrospinal fluid (CSF), pleural fluid, peritoneal fluid, amniotic fluid, lymph, ocular swab, cheek swab, vaginal swab, cervical swab, rectal swab, cell, or tissue. Next, the method 100 may include obtaining a plurality of health data, contact tracking data, location data, movement data, or any combination thereof, associated with the plurality of subjects (e.g., operation 104). In some cases, any data herein (e.g., health data, contact tracking data, location data, or movement data associated with a plurality of subjects) includes de-identified data. In some embodiments, the plurality of health data comprises: diagnosis of a disease or disorder, prognosis of a disease or disorder, risk of having a disease or disorder, history of treatment of a disease or disorder, history of past treatment of a disease or disorder, history of prescribed medications, history of prescribed medical devices, age, height, weight, sex, smoking status, one or more symptoms, and one or more vital signs. In some embodiments, the one or more vital signs include one or more of the following: heart rate, heart rate variability, blood pressure, respiratory rate, blood oxygen concentration (SpO₂), carbon dioxide concentration in respiratory gases, hormone levels, sweat analysis, blood glucose, body temperature, impedance, conductivity, capacitance, resistivity, electromyography, galvanic skin response, nerve signals, electroencephalography, electrocardiography, immunological markers, and other physiological measurements.
In some embodiments, the plurality of contact tracking data, location data, or movement data includes data associated with the subject's environment, location, movement, or daily schedule. For example, for a delivery warehouse whose employees are undergoing diagnostic testing, intelligent mixing may be performed based on the employees' delivery routes, shift schedules, contact tracking information, and the like. In some embodiments, the method 100 is applied to gene sequencing of samples, such that sample mixing is performed where mutation rates are low. Next, the method 100 may include processing the plurality of health data, contact tracking data, location data, movement data, or any combination thereof with a trained computer algorithm to assign at least some individual subjects of the plurality of subjects to a pool among a plurality of pools (e.g., operation 106). In some embodiments, the number of pools in the plurality of pools is less than the number of subjects in the plurality of subjects. In some embodiments, the trained computer algorithm comprises a trained machine learning classifier. In some embodiments, the trained machine learning classifier includes an algorithm selected from the group consisting of: support vector machines, neural networks, random forests, linear regression, logistic regression, Bayesian classifiers, boosting classifiers, gradient boosting algorithms, adaptive boosting (AdaBoost) algorithms, and extreme gradient boosting (XGBoost) algorithms.
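The assignment of operation 106 may be sketched as follows, purely as an illustration. The sketch assumes that a trained classifier (not shown) has already produced a per-subject risk score between 0 and 1; the function name `assign_pools`, the threshold, the pool size, and the subject identifiers are hypothetical. Subjects whose score is below the threshold are grouped into pools of limited size, and the remaining subjects are each placed in a pool of one so that they are tested individually.

```python
from typing import Dict, List


def assign_pools(risk_by_subject: Dict[str, float],
                 threshold: float,
                 pool_size: int) -> List[List[str]]:
    """Group low-risk subjects into pools; keep high-risk subjects alone."""
    # Sort low-risk subjects by score (then by identifier) so that subjects
    # with similar predicted risk end up in the same pool.
    low = sorted((s for s, r in risk_by_subject.items() if r < threshold),
                 key=lambda s: (risk_by_subject[s], s))
    high = sorted(s for s, r in risk_by_subject.items() if r >= threshold)
    pools = [low[i:i + pool_size] for i in range(0, len(low), pool_size)]
    pools.extend([s] for s in high)  # single-member pools
    return pools


# Hypothetical classifier outputs for seven subjects.
risks = {"s1": 0.02, "s2": 0.40, "s3": 0.01, "s4": 0.05,
         "s5": 0.03, "s6": 0.04, "s7": 0.60}
pools = assign_pools(risks, threshold=0.10, pool_size=3)
print(pools)
```

With these hypothetical scores, the seven subjects are assigned to four pools, so the number of pools is less than the number of subjects.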

FIG. 1B illustrates a general workflow for performing sample mixing of a plurality of clinical samples for diagnostic testing, including data entry, ingesting and integrating data, performing methods for sample mixing, recommending mixing protocols, and performing diagnostic tests on a sample pool. In some embodiments, performing a diagnostic test on the sample pool includes performing liquid handling of a plurality of body samples and/or the sample pool using an integrated robot and/or a robotics application programming interface (API). In some embodiments, performing a diagnostic test on a sample pool includes detecting a plurality of diseases or disorders (e.g., common cold, influenza, COVID-19 infection, and/or COVID-19 immunization) using the same sample pool. In some embodiments, performing a diagnostic test on the sample pool includes labeling (e.g., using a sample tag or sample barcode) a plurality of body samples and/or the sample pool to increase the multiplexing of diagnostic tests (e.g., by DNA sequencing, RNA sequencing, or reverse transcription polymerase chain reaction (RT-PCR)). In some embodiments, performing a diagnostic test on the sample pool includes gene sequencing of a plurality of body samples and/or sample pools (e.g., where the mutation rate is low).

Fig. 1C shows a workflow for sample processing and testing. The workflow may include collecting a plurality of body samples from a plurality of subjects. Logistical synergies (or drawbacks) may be associated with the processing of body samples. The software may require a seamless flow of information (e.g., if a body sample at the collection point lacks required data, the subject's sample may be identified through a connected EHR system using a barcode). Next, the workflow may include extracting nucleic acids (e.g., DNA or RNA) from the body samples. Next, the workflow may include mixing samples, where the pools are selected using the methods and systems of the present disclosure. Machine learning is used in selecting such pools, to inform decisions about which samples to mix together and/or how many samples to mix together. Next, the workflow may include performing a diagnostic test or test panel on the mixed sample (e.g., an RT-PCR diagnostic panel).

Fig. 2B illustrates an example of how the methods and systems of the present disclosure advantageously use sample mixing. For example, about 2 out of 10 test results currently come back positive; machine learning can learn the patterns of the 8 out of 10 patients who test negative. Using the methods and systems of the present disclosure, predictive analysis and machine learning may be applied to identify patients with a lower expected prevalence. Multiple predicted-negative samples may be mixed together to create a sample pool, and a single diagnostic test may be performed on the sample pool. Thus, the same number of subjects can be tested using a smaller number of diagnostic tests; the saved diagnostic tests can be used to test a wider population of subjects. In general, when a disease or disorder has a sufficiently high prevalence, the expected probability of encountering at least one positive sample in a sample pool may rise to a value high enough that the savings obtained by sample mixing are negated. Thus, the intelligent mixing strategy of the disclosed methods and systems enables efficient use of pooled testing.

In some embodiments, nucleic acids may be extracted from a body sample and analyzed for diagnostic testing. For example, any suitable sequencing method may be used to generate sequencing reads from the nucleic acids. The sequencing method may be a first-generation sequencing method, such as Maxam-Gilbert or Sanger sequencing, or a high-throughput sequencing (e.g., next-generation sequencing or NGS) method. The high-throughput sequencing method may sequence at least 10,000, 100,000, 1 million, 100 million, 1 billion, or more polynucleotide molecules simultaneously (or substantially simultaneously). Sequencing methods may include, but are not limited to: pyrosequencing, sequencing by synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing by ligation, sequencing by hybridization, digital gene expression (Helicos), massively parallel sequencing (e.g., Helicos), clonal single molecule array (Solexa/Illumina), and sequencing using PacBio, SOLiD, Ion Torrent, or Nanopore platforms.

In some embodiments, sequencing comprises whole genome sequencing (WGS). Sequencing can be performed at a depth sufficient to assess tumor progression or tumor non-progression in a subject with a desired performance, e.g., accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), or area under the curve (AUC) of a receiver operating characteristic (ROC) curve.

In some embodiments, the sequencing reads can be aligned with or mapped to a reference genome. The reference genome may include at least a portion of a genome (e.g., a human genome). The reference genome may include the entire genome (e.g., the entire human genome). The reference genome may include an entire genome to which certain base conversions have been applied (e.g., the entire human genome with cytosines converted to thymines), as may be used for methylation data alignment. The reference genome may comprise a database comprising a plurality of genomic regions corresponding to coding and/or non-coding genomic regions of the genome. The database may include a plurality of genomic regions corresponding to disease-associated coding and/or non-coding genomic regions of the genome, such as single nucleotide variations (SNVs), copy number variations (CNVs), insertions or deletions (indels), and fusion genes. The alignment may be performed using the Burrows-Wheeler Aligner (BWA), sambamba, samtools, or any other suitable alignment algorithm.

In some embodiments, quantitative measurements of sequencing reads may be generated for each of a plurality of genomic regions. Such quantitative measurements may include, for example, counts of sequencing reads aligned to a given genomic region. A sequencing read having a portion or all of the read aligned to a given genomic region may be counted toward the quantitative measurement for that genomic region.
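The read counting described above may be sketched as follows, for illustration only. The sketch assumes that each aligned read and each genomic region is represented as a (chromosome, start, end) tuple with a half-open interval; the region names, coordinates, and function name are hypothetical. A read is counted toward a region whenever any portion of the read overlaps that region, so a read spanning two regions counts toward both.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def count_reads_per_region(reads: List[Interval],
                           regions: Dict[str, Interval]) -> Dict[str, int]:
    """Count reads that overlap each named genomic region."""
    counts = {name: 0 for name in regions}
    for chrom, start, end in reads:
        for name, (r_chrom, r_start, r_end) in regions.items():
            # Partial or full overlap on the same chromosome counts once.
            if chrom == r_chrom and start < r_end and end > r_start:
                counts[name] += 1
    return counts


# Hypothetical regions and aligned reads.
regions = {"region_A": ("chr1", 100, 200), "region_B": ("chr1", 300, 400)}
reads = [("chr1", 90, 140), ("chr1", 150, 180), ("chr1", 190, 310),
         ("chr2", 100, 200), ("chr1", 400, 450)]
counts = count_reads_per_region(reads, regions)
print(counts)
```

The nested loop is adequate for a small number of regions; for the larger region counts contemplated herein, an interval index would typically be used instead.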

In some embodiments, the genomic region may comprise a disease marker. Patterns of specific and non-specific genomic regions may be indicative of a disease progression state or a disease non-progression state. Changes in these patterns of genomic regions over time may be indicative of changes in the disease progression state or the disease non-progression state.

In some embodiments, the nucleic acids may be assayed by performing binding measurements of the nucleic acids at each of the plurality of genomic regions. In some embodiments, performing the binding measurements comprises assaying the nucleic acids using probes that are selective for at least a portion of the plurality of genomic regions in the plurality of nucleic acids. In some embodiments, the probes are nucleic acid molecules having sequence complementarity to nucleic acid sequences of the plurality of genomic regions. In some embodiments, the nucleic acid molecules are primers or enrichment sequences. In some embodiments, the assaying comprises use of array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing.

In some embodiments, the nucleic acid is enriched for at least a portion of the plurality of genomic regions. In some embodiments, enriching comprises amplifying at least some of the nucleic acids. For example, a nucleic acid can be amplified by selective amplification (e.g., by using a set of primers or probes that comprise a nucleic acid molecule having sequence complementarity to the nucleic acid sequences of multiple genomic regions). Alternatively or in combination, the nucleic acid may be amplified by universal amplification (e.g., by using universal primers). In some embodiments, enriching comprises selectively isolating at least a portion of the plurality of nucleic acids.

In some embodiments, the sequencing reads may be normalized or corrected. For example, the sequencing reads may be de-duplicated, normalized, and/or corrected to account for known biases in sequencing and library preparation. In some embodiments, a subset of the quantitative measurements (e.g., statistical measures) may be filtered out, e.g., based on whether such quantitative measurements (e.g., at different time points) differ significantly from the variation observed in unaffected subjects (e.g., a background profile of the nucleic acids).

The plurality of genomic regions may comprise at least about 10 distinct genomic regions, at least about 50 distinct genomic regions, at least about 100 distinct genomic regions, at least about 500 distinct genomic regions, at least about 1 thousand distinct genomic regions, at least 5 thousand distinct genomic regions, at least 10 thousand distinct genomic regions, at least 50 thousand distinct genomic regions, at least 100 thousand distinct genomic regions, at least about 500 thousand distinct genomic regions, at least about 1 million distinct genomic regions, at least about 2 million distinct genomic regions, at least about 3 million distinct genomic regions, at least about 4 million distinct genomic regions, at least about 5 million distinct genomic regions, at least about 10 million distinct genomic regions, at least about 15 million distinct genomic regions, at least about 20 million distinct genomic regions, at least about 25 million distinct genomic regions, at least about 30 million distinct genomic regions, or more than about 30 million distinct genomic regions.

Computer system

The present disclosure provides a computer system programmed to implement the methods of the present disclosure. Fig. 3 shows a computer system 301 that is programmed or otherwise configured to, for example: direct obtaining a body sample from a subject; obtain health data, contact tracking data, location data, or movement data associated with the subject; process the health data, contact tracking data, location data, or movement data using a trained computer algorithm to assign individual subjects to a pool; output an electronic recommendation to create a mixed sample by combining body samples or portions of body samples obtained from subjects in the pool; direct creating a mixed sample by combining body samples or portions of body samples obtained from subjects in the pool; direct isolating nucleic acids from the body sample; direct amplifying the nucleic acids; direct performing a diagnostic test on the mixed sample to obtain a diagnostic result associated with the mixed sample; and detect an absence of a disease or disorder in the individual subjects of the pool when an absence of the disease or disorder is detected based on analyzing the mixed sample corresponding to the pool. The computer system 301 may regulate various aspects of the analysis, calculation, and generation of the present disclosure, including each of the foregoing operations as well as directing enrichment of the nucleic acids. The computer system 301 may be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device may be a mobile electronic device.

Computer system 301 includes a central processing unit (CPU, also referred to herein as a "processor" and a "computer processor") 305, which may be a single-core or multi-core processor, or a plurality of processors for parallel processing. Computer system 301 also includes memory or memory location 310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 315 (e.g., hard disk), communication interface 320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 325, such as cache, other memory, data storage, and/or electronic display adapters. The memory 310, storage unit 315, interface 320, and peripheral devices 325 communicate with the CPU 305 through a communication bus (solid lines), such as a motherboard. The storage unit 315 may be a data storage unit (or data repository) for storing data. Computer system 301 may be operably coupled to a computer network ("network") 330 by way of the communication interface 320. The network 330 may be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. In some cases, network 330 is a telecommunications and/or data network. The network 330 may include one or more computer servers, which may support distributed computing, such as cloud computing.
For example, one or more computer servers may implement cloud computing on network 330 (the "cloud") to perform various aspects of the analysis, computation, and generation of the present disclosure, such as, for example, directing the obtaining of a body sample from a subject, obtaining health data, contact tracking data, location data, or movement data associated with a subject, processing the health data, contact tracking data, location data, or movement data using trained computer algorithms to assign individual subjects to a pool, outputting electronic recommendations to create a mixed sample by combining body samples or portions of body samples obtained from subjects in the pool, directing the creation of a mixed sample by combining body samples or portions of body samples obtained from subjects in the pool, directing the isolation of nucleic acids from the body sample, directing the amplification of nucleic acids, directing the diagnostic test of the mixed sample to obtain diagnostic results associated with the mixed sample, and detecting the absence of a disease or disorder in individual subjects of the pool when the absence of the disease or disorder is detected based on analyzing the mixed sample corresponding to the pool. Such cloud computing may be provided by cloud computing platforms such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and IBM Cloud. In some cases, with the aid of computer system 301, network 330 may implement a peer-to-peer network, which may enable devices coupled to computer system 301 to function as clients or servers.

The CPU 305 may execute a sequence of machine-readable instructions, which may be embodied in a program or software. The instructions may be stored in a memory location, such as memory 310. The instructions may be directed to the CPU 305, which may subsequently program or otherwise configure the CPU 305 to implement the methods of the present disclosure. Examples of operations performed by the CPU 305 may include fetch, decode, execute, and writeback.

The CPU 305 may be part of a circuit such as an integrated circuit. One or more other components of system 301 may be included in the circuit. In some cases, the circuit is an Application Specific Integrated Circuit (ASIC).

The storage unit 315 may store files such as drivers, libraries, and saved programs. The storage unit 315 may store user data, such as user preferences and user programs. In some cases, computer system 301 may include one or more additional data storage units external to computer system 301, such as on a remote server in communication with computer system 301 via an intranet or the Internet.

Computer system 301 may communicate with one or more remote computer systems over network 330. For example, computer system 301 may communicate with a remote computer system of a user (e.g., doctor, nurse, caregiver, patient, or subject). Examples of remote computer systems include personal computers (e.g., portable PCs), slate or tablet PCs (e.g., iPad, Galaxy Tab), telephones, smart phones (e.g., iPhone, Android-enabled device), or personal digital assistants. A user may access computer system 301 via network 330.

The methods as described herein may be implemented by way of machine (e.g., computer processor) executable code stored in an electronic storage location of computer system 301, such as, for example, in memory 310 or electronic storage unit 315. The machine-executable or machine-readable code may be provided in the form of software. During use, the code may be executed by processor 305. In some cases, the code may be retrieved from the storage unit 315 and stored in the memory 310 for ready access by the processor 305. In some cases, the electronic storage unit 315 may be omitted, and the machine-executable instructions are stored in memory 310.

The code may be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or may be compiled during runtime. The code may be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as computer system 301, may be embodied in programming. Various aspects of the technology may be thought of as "products" or "articles of manufacture," typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. The machine-executable code may be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. "Storage" type media may include any or all of the tangible memory of computers, processors, or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives, and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunications networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as those used between local devices, through wired and optical landline networks, and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, may also be considered media bearing the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.

Thus, a machine-readable medium, such as computer-executable code, may take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer or the like, such as may be used to implement the databases and the like shown in the accompanying drawings. Volatile storage media include dynamic memory, such as the main memory of a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier wave transmission media can take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 301 may include an electronic display 335 or be in communication with the electronic display 335, the electronic display 335 including a User Interface (UI) 340 for providing, for example, health data, contact tracking data, location data or movement data, recommendations for creating a blended sample, and diagnostic results for the blended sample. Examples of UIs include, but are not limited to, graphical User Interfaces (GUIs) and web-based user interfaces.

The methods and systems of the present disclosure may be implemented by way of one or more algorithms. An algorithm may be implemented by way of software upon execution by the central processing unit 305. The algorithm may, for example: direct the obtaining of a body sample from a subject; obtain health data, contact tracking data, location data, or movement data associated with the subject; process the health data, contact tracking data, location data, or movement data with a trained computer algorithm to assign individual subjects to a pool; output an electronic recommendation to create a mixed sample by combining body samples or portions of body samples obtained from the subjects in the pool; direct the creation of a mixed sample by combining body samples or portions of body samples obtained from the subjects in the pool; direct the isolation of nucleic acids from the body samples; direct the amplification of the nucleic acids; direct the performance of a diagnostic test on the mixed sample to obtain a diagnostic result associated with the mixed sample; and determine that a disease or disorder is absent in each individual subject of the pool when no disease or disorder is detected based on analyzing the mixed sample corresponding to the pool.
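
As a non-limiting illustration, the pool assignment and result resolution steps of such an algorithm may be sketched in Python as follows. The names `Subject`, `assign_pools`, and `resolve_pools`, the `risk` field (standing in for the output of a trained computer algorithm), the pool size, and the risk threshold are all hypothetical and do not appear in the disclosure; the two test callables stand in for laboratory assays.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Subject:
    subject_id: str
    risk: float  # hypothetical predicted probability of infection


def assign_pools(subjects: List[Subject], pool_size: int,
                 risk_threshold: float) -> List[List[Subject]]:
    """Group lower-risk subjects into pools; keep higher-risk subjects alone."""
    low = sorted((s for s in subjects if s.risk < risk_threshold),
                 key=lambda s: s.risk)
    high = [s for s in subjects if s.risk >= risk_threshold]
    # Consecutive slices of the risk-sorted list keep similar risks together.
    pools = [low[i:i + pool_size] for i in range(0, len(low), pool_size)]
    # A pool of one is simply an individual test.
    pools += [[s] for s in high]
    return pools


def resolve_pools(pools: List[List[Subject]],
                  pool_test: Callable[[List[Subject]], bool],
                  individual_test: Callable[[Subject], bool]
                  ) -> Tuple[Dict[str, bool], int]:
    """Return per-subject results and the number of tests consumed."""
    results: Dict[str, bool] = {}
    tests_used = 0
    for pool in pools:
        tests_used += 1  # one test on the mixed sample
        if not pool_test(pool):
            # Negative mixed sample: every member is reported negative.
            for s in pool:
                results[s.subject_id] = False
        elif len(pool) == 1:
            results[pool[0].subject_id] = True
        else:
            # Positive mixed sample: each member is tested individually.
            for s in pool:
                tests_used += 1
                results[s.subject_id] = individual_test(s)
    return results, tests_used
```

In this sketch, eight subjects of whom one is high-risk are resolved with three tests when the pool size is 4, because both low-risk pools test negative and the high-risk subject is tested alone.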

Examples

Example 1: Efficient sample mixing to improve diagnostic testing capacity for COVID-19

Using the methods and systems of the present disclosure, efficient sample mixing was performed to improve diagnostic testing capacity for coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Sample mixing was performed by using one of three different types of information to classify a plurality of individual samples, each individual sample being classified into a respective sample pool among a plurality of sample pools. The three types of information were Boston and Cambridge cohort information, clinical symptom information, and contact tracking information. Fig. 4A shows the number of people who can be tested with 1000 diagnostic tests using Boston and Cambridge cohort information (left 3 columns), clinical symptom information (middle 3 columns), and contact tracking information (right 3 columns). Within each group of 3 columns, the columns indicate the number of people who can be tested with 1000 diagnostic tests using no sample mixing (left), simple sample mixing (middle), and intelligent sample mixing based on the methods and systems of the present disclosure (right).

As shown in Fig. 4A, simple sample mixing outperforms no sample mixing for all three types of information used; furthermore, intelligent sample mixing based on the methods and systems of the present disclosure outperforms simple sample mixing for all three types of information used (markedly so for clinical symptom information and contact tracking information). Thus, using intelligent sample mixing based on the methods and systems of the present disclosure, the diagnostic testing capacity for COVID-19 can be increased by a factor of about 3, with cost savings of at least 40% at a specificity of 96%.

Fig. 4B illustrates that the rules for diagnostic testing may be redefined using intelligent sample mixing based on the methods and systems of the present disclosure. Based on the current state (left), the symptoms themselves are epidemics. Some subjects who meet a particular set of criteria receive testing, while others are denied testing (e.g., because diagnostic tests are rationed owing to insufficient testing capacity). Using intelligent sample mixing based on the methods and systems of the present disclosure (right), the ratio of available diagnostic tests to individuals receiving the COVID-19 test is no longer 1:1. The number of individuals tested per available diagnostic test may be effectively increased to, for example, about 1.1, about 1.2, about 1.3, about 1.4, about 1.5, about 1.6, about 1.7, about 1.8, about 1.9, about 2.0, about 2.1, about 2.2, about 2.3, about 2.4, about 2.5, about 2.6, about 2.7, about 2.8, about 2.9, about 3.0, or greater than about 3.0.

Example 2: Efficient sample mixing using precisionFDA data

Using the methods and systems of the present disclosure, precisionFDA data was used for efficient sample mixing to improve diagnostic testing capacity for COVID-19. The precisionFDA data was analyzed as follows. First, a dataset was obtained. Next, the observations were filtered to retain those related to COVID-19.

Next, detailed patient information (e.g., clinical health data, contact tracking data, location data, movement data, and demographic data including age) is obtained for these observations. Next, the diagnostic test outcome (expressed in binary format, where a value of 1 indicates a positive test result and a value of 0 indicates a negative test result) is obtained. Next, gender data (expressed in binary format, where a value of 0 indicates male and a value of 1 indicates female) is obtained. Next, race and ethnicity data (represented by one-hot encoding using 6 binary values, one for each race/ethnicity category) is obtained. Next, the data is filtered to retain the columns corresponding to relevant data features, including marital status, race, ethnicity, gender, medical cost, age, county positive cases, city positive cases, county prevalence, and city prevalence. As an example of an input data structure, the data for each subject or patient may include one or more of the following: diagnostic test result, age, gender, postal code (or another de-identified geolocation), reason for the diagnostic test, vital signs (if available and accessible), and symptoms (if available). For example, symptoms may include fever, cough, sore throat, fatigue, body temperature, respiratory rate, loss of taste, loss of smell, vomiting, shortness of breath, chills, hypoxia, and the like. Next, epidemiological data is obtained from the Starschema COVID-19 epidemiological dataset.
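
As a non-limiting illustration, the binary and one-hot encodings described above may be sketched in Python as follows. The six category labels, the record field names, and the function names are hypothetical placeholders; the disclosure states only that six binary values are used and does not name the categories or the column layout.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical placeholders for the six race/ethnicity categories.
CATEGORIES = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e", "cat_f"]


def one_hot(value: str, categories: List[str]) -> List[int]:
    """Return one binary value per category, with exactly one set to 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_record(record: Dict[str, Union[str, int]]
                  ) -> Tuple[List[int], int]:
    """Turn one raw record into (feature vector, binary label)."""
    # Gender: 0 indicates male and 1 indicates female.
    gender_bit = {"male": 0, "female": 1}[str(record["gender"])]
    # Test outcome: 1 indicates positive and 0 indicates negative.
    label = 1 if record["test_result"] == "positive" else 0
    features = [int(record["age"]), gender_bit]
    features += one_hot(str(record["race_ethnicity"]), CATEGORIES)
    return features, label
```

A record with age 42, gender "female", category "cat_c", and a positive result is encoded as the feature vector [42, 1, 0, 0, 1, 0, 0, 0] with label 1.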

Next, the data is divided into a training data set and a test data set. Next, supervised machine learning classifiers (e.g., random forest classifiers, extreme gradient boosting (XGBoost), and gradient boosting) are created and trained using the training data set. Next, a predicted output result of the test dataset is obtained. Next, a classification report and a confusion matrix are generated to evaluate the performance of the classifier. Next, the total number of positive cases and negative cases in the test dataset is obtained. Next, a set of feature importance values (e.g., weights) is calculated.
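
As a non-limiting illustration, the data split and the evaluation of classifier output may be sketched in Python as follows. This sketch uses only the standard library and does not train a classifier; the predicted labels passed to `confusion_matrix` are assumed to come from whichever trained model is used. The function names, the 25% test fraction, and the random seed are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle the rows reproducibly and split into (training, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1/0)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def classification_report(cm: Dict[str, int]) -> Dict[str, float]:
    """Derive standard performance measures from the confusion matrix."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    total = cm["tp"] + cm["fn"] + cm["fp"] + cm["tn"]
    return {
        "sensitivity": ratio(cm["tp"], cm["tp"] + cm["fn"]),
        "specificity": ratio(cm["tn"], cm["tn"] + cm["fp"]),
        "ppv": ratio(cm["tp"], cm["tp"] + cm["fp"]),
        "npv": ratio(cm["tn"], cm["tn"] + cm["fn"]),
        "accuracy": ratio(cm["tp"] + cm["tn"], total),
    }
```

The counts of positive and negative cases in the test dataset are then `tp + fn` and `tn + fp`, respectively.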

Table 1 shows the relative performance metrics obtained using three different supervised machine learning classifiers (random forest classifier, extreme gradient boosting (XGBoost), and gradient boosting). The random forest classifier has the following top 5 weighted features: fever, cough, body temperature, respiratory rate, and loss of taste. The XGBoost classifier has the following top 5 weighted features: fever, cough, hypoxia, body temperature, and loss of taste. The gradient boosting classifier has the following top 5 weighted features: fever, cough, body temperature, respiratory rate, and loss of taste.

Table 1: Performance comparison between machine learning classifier models

Fig. 5A shows the percentage of tests saved using simple sample mixing (gray) and using intelligent sample mixing based on the methods and systems of the present disclosure (purple), together with the upper bound achievable using intelligent sample mixing (dashed line). As shown in Fig. 5A, the percentage of tests saved using simple sample mixing decreases rapidly as disease prevalence increases, and simple sample mixing is not feasible above a prevalence of about 25%; furthermore, intelligent sample mixing based on the methods and systems of the present disclosure saves a greater percentage of tests than simple sample mixing at all disease prevalences.

Fig. 5B shows the percentage increase in diagnostic testing capacity achieved using simple sample mixing (gray) and using intelligent sample mixing based on the methods and systems of the present disclosure (purple), together with the upper bound achievable using intelligent sample mixing (dashed line). As shown in Fig. 5B, the percentage increase in diagnostic testing capacity achieved using simple sample mixing decreases rapidly as disease prevalence increases, and simple sample mixing is not feasible above a prevalence of about 25%; furthermore, intelligent sample mixing based on the methods and systems of the present disclosure achieves a greater percentage increase in diagnostic testing capacity than simple sample mixing at all disease prevalences. Thus, using intelligent sample mixing based on the methods and systems of the present disclosure, a consistent improvement in testing capacity is achieved across all prevalence settings, and a pool prevalence of less than 30% is strategically and actively maintained.

Example 3: Number of diagnostic tests required for intelligent sample pool testing

Using the methods and systems of the present disclosure, efficient sample mixing is performed to improve diagnostic testing capacity for COVID-19. Fig. 6 shows the relative number of diagnostic tests required as a function of sample pool size (ranging from 1 sample per pool to 20 samples per pool) at prevalence (PR) values of 1% (light blue), 5% (orange), 10% (gray), 20% (yellow), and 30% (dark blue). The relative number of tests is calculated by simulating resource utilization. The relative number of tests is minimized in two ways. First, an optimal sample pool size or sample mixing strategy is selected: by referring to the resulting graph, the sample pool size that yields the lowest number of diagnostic tests is determined. Second, the prevalence (PR) is reduced. A lower prevalence within a sample pool results in a lower number of required diagnostic tests, thereby enhancing the first strategy of selecting the optimal sample pool size or sample mixing strategy. Using the methods and systems of the present disclosure, the prevalence within sample pools is reduced by intelligent segmentation using machine learning methods.
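
As a non-limiting illustration, the relationship between pool size, prevalence, and the relative number of tests may be sketched in Python as follows. The disclosure does not state how its simulation was configured; this sketch assumes classic two-stage pooling with a perfectly accurate test, in which every member of a positive pool of k samples is retested individually, giving an expected 1/k + 1 - (1 - p)^k tests per sample at pool prevalence p. The function names and the stratum figures in the usage note are hypothetical.

```python
from typing import List, Tuple


def expected_tests_per_sample(pool_size: int, prevalence: float) -> float:
    """Expected tests per sample under two-stage pooling (perfect test)."""
    if pool_size == 1:
        return 1.0  # individual testing
    # One pooled test shared by the pool, plus one retest per sample
    # whenever the pool contains at least one positive sample.
    return 1.0 / pool_size + 1.0 - (1.0 - prevalence) ** pool_size


def best_pool_size(prevalence: float, max_pool_size: int = 20) -> int:
    """Pool size between 1 and max_pool_size that minimizes expected tests."""
    return min(range(1, max_pool_size + 1),
               key=lambda k: expected_tests_per_sample(k, prevalence))


def stratified_tests_per_sample(strata: List[Tuple[float, float]]) -> float:
    """Expected tests per sample when each stratum is pooled separately.

    Each stratum is (fraction of the population, prevalence within it).
    """
    return sum(
        fraction * expected_tests_per_sample(best_pool_size(p), p)
        for fraction, p in strata
    )


if __name__ == "__main__":
    for pr in (0.01, 0.05, 0.10, 0.20, 0.30):
        k = best_pool_size(pr)
        print(pr, k, round(expected_tests_per_sample(k, pr), 4))
```

Under these assumptions, the optimal pool sizes for prevalences of 1%, 5%, 10%, 20%, and 30% are 11, 5, 4, 3, and 3, respectively. Splitting a hypothetical population with 10% overall prevalence into a low-risk stratum (80% of subjects at 2.5% prevalence) and a high-risk stratum (20% of subjects at 40% prevalence) requires fewer expected tests per sample than pooling the unsegmented population at its optimal pool size, which illustrates why reducing pool prevalence through segmentation helps.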

Example 4: Improving diagnostic testing capacity with high clinical sensitivity

A population of 40,000 clinical test samples (e.g., corresponding to the daily testing capacity of Quest Diagnostics) was simulated, assuming a disease prevalence of 10%, a clinical sensitivity of 90%, and a clinical specificity of 90%. Using the methods and systems of the present disclosure, an optimal pool size of 4 samples per pool was selected, and a 40% reduction in the use of diagnostic test kits was achieved, together with a clinical sensitivity of 80% and a clinical specificity of 96% (as shown in Fig. 7). Thus, these results indicate that the testing capacity of Quest Diagnostics would increase by 16,000 diagnostic tests without any change in its initial kit availability.
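
As a non-limiting illustration, figures of this order can be reproduced with a simple analytical model, sketched in Python below. The disclosure does not describe its simulation, so the model is an assumption: two-stage pooling in which every member of a positive pool is retested individually, test errors are independent, and pooling does not dilute the signal. The function name is hypothetical.

```python
from typing import Tuple


def two_stage_characteristics(prevalence: float, sensitivity: float,
                              specificity: float, pool_size: int
                              ) -> Tuple[float, float, float]:
    """Return (tests per sample, overall sensitivity, overall specificity)."""
    p, se, sp, k = prevalence, sensitivity, specificity, pool_size

    # Probability that a pooled test reads positive: a true positive when the
    # pool contains a case, or a false positive when it does not.
    pool_has_case = 1.0 - (1.0 - p) ** k
    pool_positive = se * pool_has_case + (1.0 - sp) * (1.0 - pool_has_case)
    tests_per_sample = 1.0 / k + pool_positive

    # An infected subject is reported positive only if both the pooled test
    # and the individual retest detect the infection.
    overall_sensitivity = se * se

    # An uninfected subject is reported positive only if the pool reads
    # positive (driven by the other k - 1 members or by a false positive)
    # and the individual retest is also a false positive.
    others_have_case = 1.0 - (1.0 - p) ** (k - 1)
    pool_positive_given_uninfected = (
        se * others_have_case + (1.0 - sp) * (1.0 - others_have_case)
    )
    overall_specificity = 1.0 - pool_positive_given_uninfected * (1.0 - sp)

    return tests_per_sample, overall_sensitivity, overall_specificity


if __name__ == "__main__":
    tests, se, sp = two_stage_characteristics(0.10, 0.90, 0.90, 4)
    print(round(tests, 5), round(se, 5), round(sp, 5))
```

With a prevalence of 10%, a per-test sensitivity and specificity of 90%, and a pool size of 4, this model yields about 0.625 tests per sample (a saving of roughly 37.5%), an overall sensitivity of 81%, and an overall specificity of about 96.8%, which is in the same range as the reported 40%, 80%, and 96%. The stated capacity gain follows directly from the reported reduction: 40% of 40,000 samples corresponds to 16,000 diagnostic tests.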

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations, or relative proportions set forth herein, which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations, or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method, comprising:

(a) obtaining a plurality of health data, contact tracking data, location data, movement data, or any combination thereof, associated with a plurality of subjects; and

(b) processing the plurality of health data, contact tracking data, location data, movement data, or any combination thereof, with a trained computer algorithm to assign at least some individual subjects of the plurality of subjects to pools among a plurality of pools, wherein the number of pools in the plurality of pools is less than the number of subjects in the plurality of subjects.

2. The method of claim 1, further comprising outputting an electronic recommendation to create, for each of at least two given pools of the plurality of pools, a mixed sample by combining body samples or portions of body samples obtained from subjects in the at least two given pools.

3. The method of claim 1, further comprising, for each of at least two given pools of the plurality of pools, creating a mixed sample by combining body samples or portions of body samples obtained from subjects in the at least two given pools.

4. The method of claim 2 or 3, further comprising obtaining the body sample or portion of a body sample from the plurality of subjects.

5. The method of any of claims 2-4, wherein the body sample is solely selected from the group consisting of: nasopharyngeal swab, oropharyngeal swab, blood, serum, plasma, vitreous, sputum, urine, stool, tears, sweat, saliva, semen, mucosal secretions, mucus, spinal fluid, cerebrospinal fluid (CSF), pleural fluid, peritoneal fluid, amniotic fluid, lymph, ocular swab, cheek swab, vaginal swab, cervical swab, rectal swab, cells and tissues.

6. The method of claim 5, further comprising isolating nucleic acids from the body sample, and for a given pool of the plurality of pools, creating the mixed sample by combining at least some nucleic acids isolated from the obtained body sample from a subject in the given pool.

7. The method of claim 6, further comprising enriching the nucleic acids for a plurality of genomic regions.

8. The method of claim 6 or 7, further comprising amplifying at least some of the nucleic acids.

9. The method of claim 8, wherein the amplifying comprises selective amplifying.

10. The method of claim 8, wherein the amplifying comprises universal amplifying.

11. The method of claim 7, wherein enriching the nucleic acids for the plurality of genomic regions comprises contacting the nucleic acids with a plurality of probes, each of the plurality of probes having sequence complementarity to at least a portion of one of the plurality of genomic regions.

12. The method of claim 11, wherein the plurality of genomic regions comprises genomic regions associated with a disease or disorder.

13. The method of claim 12, wherein the disease or disorder comprises coronavirus disease 2019 (COVID-19), human immunodeficiency virus (HIV), or malaria.

14. The method of claim 13, wherein the disease or disorder comprises COVID-19.

15. The method of any of claims 3-14, further comprising performing a plurality of diagnostic tests on a plurality of mixed samples to obtain a plurality of diagnostic results associated with the plurality of mixed samples.

16. The method of claim 15, wherein the plurality of diagnostic tests are configured to detect the presence or absence of a disease or disorder based on analyzing at least the plurality of mixed samples.

17. The method of claim 16, wherein the disease or disorder comprises coronavirus disease 2019 (COVID-19).

18. The method of claim 16 or 17, further comprising, for a given pool of the plurality of pools, detecting the absence of the disease or disorder in each individual subject in the given pool when the absence of the disease or disorder is detected based on analyzing a mixed sample corresponding to the given pool.

19. The method of claim 16 or 17, further comprising, for a given pool of the plurality of pools, when the presence of the disease or disorder is detected based on analyzing a mixed sample corresponding to the given pool, testing each individual subject in the given pool for the disease or disorder.

20. The method of claim 16 or 17, further comprising, for a given pool of the plurality of pools, when the presence of the disease or disorder is detected based on analyzing a mixed sample corresponding to the given pool, testing each of a plurality of sub-pools of the given pool for the disease or disorder.

21. The method of any one of claims 16-20, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a clinical sensitivity of at least about 50%.

22. The method of claim 21, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a clinical sensitivity of at least about 70%.

23. The method of claim 22, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a clinical sensitivity of at least about 90%.

24. The method of any one of claims 16-20, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a clinical specificity of at least about 50%.

25. The method of claim 24, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a clinical specificity of at least about 70%.

26. The method of claim 25, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a clinical specificity of at least about 90%.

27. The method of any one of claims 16-20, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 50%.

28. The method of claim 27, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 70%.

29. The method of claim 28, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a Positive Predictive Value (PPV) of at least about 90%.

30. The method of any one of claims 16-20, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 50%.

31. The method of claim 30, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 70%.

32. The method of claim 31, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects with a Negative Predictive Value (NPV) of at least about 90%.

33. The method of any one of claims 16-20, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects using an area under the curve (AUC) of at least about 0.60.

34. The method of claim 33, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects using an area under the curve (AUC) of at least about 0.70.

35. The method of claim 34, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects using an area under the curve (AUC) of at least about 0.80.

36. The method of claim 35, further comprising detecting the presence or absence of the disease or disorder in the plurality of subjects using an area under the curve (AUC) of at least about 0.90.

37. The method of claim 1, wherein the plurality of health data comprises: diagnosis of a disease or disorder, prognosis of a disease or disorder, risk of having a disease or disorder, history of treatment of a disease or disorder, history of past treatment of a disease or disorder, history of prescribed medications, history of prescribed medical devices, age, height, weight, sex, smoking status, one or more symptoms, and one or more vital signs.

38. The method of claim 37, wherein the one or more vital signs comprise one or more of: heart rate, heart rate variability, blood pressure, respiratory rate, blood oxygen concentration (SpO₂), carbon dioxide concentration in respiratory gases, hormone levels, sweat analysis, blood glucose, body temperature, impedance, conductivity, capacitance, resistivity, electromyography, galvanic skin response, nerve signals, electroencephalography, electrocardiography, immunological markers, and other physiological measurements.

39. The method of claim 1, wherein the trained computer algorithm comprises a trained machine learning classifier.

40. The method of claim 39, wherein the trained machine learning classifier comprises an algorithm selected from the group consisting of: support vector machines, neural networks, random forests, linear regression, logistic regression, Bayesian classifiers, boosted classifiers, gradient boosting algorithms, adaptive boosting (AdaBoost) algorithms, and extreme gradient boosting (XGBoost) algorithms.

41. The method of claim 1, further comprising: processing the health data, contact tracking data, location data, or movement data of the individual subjects using the trained computer algorithm to determine an expected prevalence of a disease or disorder, and assigning individual subjects of the plurality of subjects to pools of the plurality of pools based at least in part on the determined expected prevalence of the disease or disorder.

42. The method of claim 41, further comprising assigning individual subjects of the plurality of subjects to pools of the plurality of pools when the determined expected prevalence of the disease or disorder is less than a predetermined prevalence threshold.

43. The method of claim 42, wherein the predetermined prevalence threshold is about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, or about 50%.

44. The method of claim 41, further comprising determining a maximum pool size based on the determined expected prevalence of the disease or disorder, and assigning individual ones of the plurality of subjects to pools among the plurality of pools based on the maximum pool size.

45. The method of any one of claims 1-44, wherein the number of pools in the plurality of pools is reduced by at least 50% relative to the number of subjects in the plurality of subjects.

46. The method of claim 45, wherein the number of pools in the plurality of pools is reduced by at least 100% relative to the number of subjects in the plurality of subjects.

47. The method of claim 46, wherein the number of pools in the plurality of pools is reduced by at least 200% relative to the number of subjects in the plurality of subjects.

48. The method of claim 47, wherein the number of pools in the plurality of pools is reduced by at least 300% relative to the number of subjects in the plurality of subjects.

49. The method of any one of claims 1-48, further comprising administering a therapeutically effective dose of a treatment to treat the disease or disorder detected in at least a subset of the plurality of subjects, based on the detection of the presence or absence of the disease or disorder in the plurality of subjects.

50. A computer system, comprising:

a database configured to store a plurality of health data, contact tracking data, location data, movement data, or any combination thereof, associated with a plurality of subjects; and

one or more computer processors operably coupled to the database, wherein the one or more computer processors are individually or collectively programmed to:

process the plurality of health data, contact tracking data, location data, movement data, or any combination thereof, with a trained computer algorithm to assign at least some individual subjects of the plurality of subjects to pools among a plurality of pools, wherein the number of pools in the plurality of pools is less than the number of subjects in the plurality of subjects.

51. A non-transitory computer-readable medium containing machine-executable instructions that, when executed by one or more computer processors, perform a method comprising: