WO2004044236A1 - Status evaluation - Google Patents

Info

Publication number
WO2004044236A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
data
status
array
Prior art date
Application number
PCT/AU2003/001517
Other languages
English (en)
Inventor
Richard Bruce Brandon
Mervyn Rees Thomas
Original Assignee
Genomics Research Partners Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genomics Research Partners Pty Ltd
Priority to AU2003275800A (AU2003275800B2)
Priority to CA2505151A (CA2505151C)
Priority to NZ539578A (NZ539578A)
Priority to EP03810913A (EP1581658A4)
Publication of WO2004044236A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/40 Encryption of genetic data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present invention relates to a method and apparatus for determining the status of a subject, and in particular for determining the ability of a subject such as a human, horse or camel to compete in a sporting and/or racing event by evaluating, for example, molecules obtained from blood of the subject.
  • a condition of a performance animal may typically be determined by conventional means such as a blood profile test (determining conventional haematological and serum biochemical parameters) and clinical appraisal.
  • these tests are of limited value because the correlation between results of a blood profile test or clinical appraisal and a condition or state of a performance animal is minimal.
  • a blood profile test may be suitable for providing some information in relation to an animal that is clinically diseased or ill, but is rarely suitable for determining fitness to perform of an animal, particularly if the animal is healthy according to use of current clinical appraisal methods, and particularly if the animal cannot communicate information about its condition.
  • although blood profile tests are relatively inexpensive and easy to perform, they do not provide assessment of a wide range of conditions, correlations between test results and conditions of performance animals are poor, they are limited to assessment of a few diseases, and they are sometimes only useful in assessment of advanced stages of disease where clinical intervention is too late to prevent significant loss of performance.
  • a final report of the results of a blood test to an end user often requires involvement of multiple parties each providing separate input to the report.
  • a veterinarian may collect a blood sample; the sample is transported or sent to a laboratory for analysis; personnel in the laboratory analyse the blood sample using machinery; and automated results from the analysis, with or without a veterinary pathologist's interpretation, are returned to the veterinarian, who then interprets the results and provides a separate report to the trainer.
  • the process is laborious, time consuming, subject to error and interpretation bias, and may or may not contain information relevant to the end user.
  • Bioinformatics may be used with genetic based diagnosis of an animal's health.
  • WO 01/25473 describes a method of characterising a biological condition or agent using calibrated gene expression profiles.
  • a test is performed to obtain a specific profile, which is then analysed.
  • the collected profile is compared to a predetermined profile to determine if the condition has been correctly identified.
  • US 6,287,254 describes a system that allows users to perform DNA genetic profiling to determine the susceptibility of a subject to a condition.
  • the subject is profiled to determine the presence of predetermined genes, which in turn indicate the susceptibility of a subject to a respective condition. Again, this requires specific tests for specific conditions, and only allows the susceptibility of a subject to be determined.
  • the convention in the art is to assay the molecules present in a particular tissue of a subject to evaluate conditions that are specific to that tissue. This convention in the art fails to contemplate the advantage of using blood to evaluate profiles, such as biological markers, for conditions that exist in all the tissues of the body, such that blood molecules act as surrogate reporters for conditions affecting any part of the body.
  • the present invention provides a method of determining the status of a subject, the method including: a) Obtaining subject data, the subject data including respective values for each of a number of parameters, the parameter values being indicative of the current biological status of the subject; b) Comparing the subject data to predetermined data, the predetermined data including for each of a number of conditions: i) A range of values for at least some of the parameters; and, ii) An indication of the condition; and, c) Determining the status of the subject in accordance with the results of the comparison, the status indicating at least one of the presence, absence or degree of one or more of the conditions.
  • parameter values may include complex relevant summaries of the parameters (for example regularised linear discriminant function coefficients or support vectors from a support vector machine model).
  • the indication of the condition can include at least one of: a) An indication of the stage of a condition; b) An indication of the degree of a condition; and c) An indication of the degree of health of a subject.
  • the number of parameters is typically greater than about 100, 200, 300, 400 or 500, and preferably between about 1000 and about 6000.
  • the term "about" refers to values (e.g., amounts, concentrations, time etc.) that vary by as much as 30%, 20%, 10%, 5%, or even by as much as 4%, 3%, 2%, 1% relative to a specified or reference value.
  • the method typically includes generating a report representing the status of the subject.
  • the method can also include determining the ability of the subject to perform in a sporting and/or racing event in accordance with the presence, absence or degree of any conditions.
  • individual parameters are representative of the level, abundance or functional activity of an agent in the subject or in a biological sample obtained from the subject.
  • the agent is a biological molecule, which includes any compound that is found intracellularly or extracellularly in an organism, including biological fluids, or in cells as a result of anabolic or catabolic processes within a cell, or as a result of cell uptake from the extracellular environment, by whatever means.
  • biological molecule is used herein in its broadest sense and includes a molecule having activity in a biological sense.
  • the biological molecule may be selected from one or more of: a) A nucleic acid molecule; b) A proteinaceous molecule; c) An amino acid; d) A carbohydrate; e) A lipid; f) A steroid; g) An inorganic molecule; h) An ion; i) A drug; j) A chemical; k) A metabolite;
  • parameters are representative of at least a subset of a biomolecular system defining a class of biomolecular component types.
  • gene transcripts are one example of a biomolecular component type, generally associated with a biomolecular system referred to as the "transcriptome".
  • Proteins are another example of a biomolecular component type, generally associated with a biomolecular system referred to as the "proteome".
  • a further example is metabolites, which are generally associated with a biomolecular system referred to as the "metabolome".
  • At least some of the parameters profile a subset of at least one biomolecular system selected from a transcriptome and a proteome of one or more specific cell types.
  • other parameters can be measured, however, such as the near IR spectrum or mass spectroscopy spectrum of the subject's blood or of the isolated components of the subject's blood (e.g. serum, white blood cells, or white blood cell membranes), general measurements such as temperature, or other biological indicators.
  • the method usually includes: a) Receiving confirmation of the determined status; and, b) Updating the predetermined data in accordance with the confirmed status and the subject data.
  • the predetermined data can include phenotypic information of the individuals, and the subject data can include phenotypic information regarding the subject, the phenotypic information including details of one or more phenotypic traits.
  • the method can include comparing the subject data to predetermined data for individuals having one or more phenotypic traits in common with the subject.
  • the predetermined data is preferably in the form of diagnostic signatures, the method including determining a diagnostic signature for a respective condition by data mining subject data relating to a number of individuals having known conditions, or degrees of conditions, each diagnostic signature including a range of values for at least some of the parameters.
  • the subject data can be determined by at least one of: a) Clinical trials; and, b) Diagnosis of conditions within subjects.
  • the diagnosis may be performed in accordance with the method of the first broad form of the invention and being subsequently confirmed by medically trained personnel.
  • the predetermined data can be diagnostic signatures, the method including determining a diagnostic signature for a respective condition by: a) Obtaining data relating to a number of individuals, the data including: i) An indication of the status of the individual; ii) Respective values for each of the number of parameters; b) Selecting one or more groups of individuals in accordance with the status of the individuals and the condition; and, c) Determining a range of parameter values for each group in accordance with the parameter values of the individuals, the range of parameter values representing a diagnostic signature for the respective group.
  • the method typically includes: a) Comparing the data for each of the individuals to predetermined criteria; and, b) Selectively excluding one or more individuals from a respective group in accordance with the results of the comparison.
  • the method can include: a) Receiving confirmation of the determined status; b) Comparing the data for each of the individuals to predetermined criteria; and, c) Updating the predetermined data in accordance with the confirmed status and the subject data in response to a successful comparison.
  • the predetermined criteria generally represent quality control criteria.
  • the method therefore typically further includes: a) Comparing the data for each of the individuals to each other; and, b) Selectively excluding one or more individuals from a respective group in accordance with the results of the comparison.
  • the method can include, for each selected group: a) Determining parameters that allow the group to be distinguished from each other group; and, b) Determining a range of parameter values for the selected parameters in accordance with the parameter values of the individuals in the respective group.
  • the method includes for each condition: a) Determining parameters that allow the degree of the condition to be determined; and, b) Determining a range of parameter values for the selected parameters taking account of the relationship between these parameter values and the degree of the condition.
  • the method may include for each diagnostic signature: a) Obtaining data for an individual having the respective condition; b) Comparing the parameter values for the individual to the respective diagnostic signature; and, c) Revising the diagnostic signature in accordance with an unsuccessful comparison.
  • the method typically further includes generating a report representing the status of the subject.
  • the method can be performed using a system including at least one end station coupled to a base station via a communications network, the method including causing the base station to: a) Receive the subject data from the end station via the communications network; b) Determine the status of the subject; c) Transfer an indication of the subject status to the end station via the communications network.
  • the subjects and individuals can include: a) Horses; b) Camels; c) Greyhounds; d) Human Athletes; and, e) Other Performance animals.
  • the present invention provides apparatus for determining the status of a subject, the apparatus including a processing system adapted to: a) Obtain subject data, the subject data including respective values for each of a number of parameters, the parameter values being indicative of the current biological status of the subject; b) Compare the subject data to predetermined data, the predetermined data including for each of a number of conditions: i) A range of values for at least some of the parameters; and, ii) An indication of the condition; and, c) Determine the status of the subject in accordance with the results of the comparison, the status indicating at least one of the presence, absence or degree of one or more of the conditions.
  • the apparatus can therefore be adapted to perform the method of the first broad form of the invention.
  • the present invention provides a computer program product for determining the status of a subject, the computer program product including computer executable code which when executed on a suitable processing system causes the processing system to perform the method of the first broad form of the invention.
  • the present invention provides a method of determining diagnostic signatures for use in the status determination of a subject, the method including: a) Obtaining data relating to a number of individuals, the data including: i) An indication of the status of the individual, including an indication of at least one definitively diagnosed condition; ii) Respective values for each of the number of parameters; b) Selecting one or more groups of individuals in accordance with the status of the individuals and the condition; and, c) Determining a range of parameter values for each group in accordance with the parameter values of the individuals, the range of parameter values representing a diagnostic signature for a respective group.
  • the method preferably includes, for each selected group: a) Determining parameters that allow the group to be distinguished from each other group; and, b) Determining a range of parameter values for the selected parameters in accordance with the parameter values of the individuals in the respective group.
  • the method typically includes for each diagnostic signature: a) Obtaining data for an individual having the respective condition; b) Comparing the parameter values for the individual to the respective diagnostic signature; and, c) Revising the diagnostic signature in accordance with an unsuccessful comparison.
  • the data for each of the individuals can be determined by at least one of: a) Clinical trials; and, b) Diagnosis of conditions within subjects.
  • the diagnosis can be confirmed by a medical practitioner or veterinarian.
  • the method may include: a) Receiving confirmation of the determined status; b) Comparing the data for each of the individuals to predetermined criteria; and, c) Updating the predetermined data in accordance with the confirmed status and the subject data in response to a successful comparison.
  • the method can include: a) Comparing the data for each of the individuals to predetermined criteria; and, b) Selectively excluding one or more individuals from a respective group in accordance with the results of the comparison.
  • the predetermined criteria may represent quality control criteria.
  • the method can include: a) Comparing the data for each of the individuals to each other; and, b) Selectively excluding one or more individuals from a respective group in accordance with the results of the comparison.
  • the conditions can include at least one of: a) A disease; and, b) An assessment that the individual is healthy.
  • the present invention provides apparatus for determining diagnostic signatures for use in the status determination of a subject, the apparatus being adapted to perform the method of the fourth broad form of the invention.
  • the present invention provides a computer program product for determining diagnostic signatures for use in the status determination of a subject, the computer program product including computer executable code which when executed on a suitable processing system causes the processing system to perform the method of the fourth broad form of the invention.
  • the present invention provides a method of allowing a user to determine the status of a subject, the method including: a) Receiving subject data from the user via a communications network, the subject data including respective values for each of a number of parameters, the parameter values being indicative of the current biological status of the subject; b) Comparing the subject data to predetermined data, the predetermined data including for each of a number of conditions: i) Values for at least some of the parameters; and, ii) An indication of the condition; and, c) Determining the status of the subject in accordance with the results of the comparison, the status indicating the presence and/or absence of the one or more conditions; and, d) Transferring an indication of the status of the subject to the user via the communications network.
  • the method generally further includes: a) Having the user determine the subject data using a remote end station; and, b) Transferring the subject data from the end station to the base station via the communications network.
  • the base station can include first and second processing systems, in which case the method can include: a) Transferring the subject data to the first processing system; b) Transferring the subject data to the second processing system; and, c) Causing the second processing system to perform the comparison.
  • the method may also include: a) Transferring the results of the comparison to the first processing system; and, b) Causing the first processing system to determine the status of the subject.
  • the method preferably includes at least one of: a) Transferring the subject data between the communications network and the first processing system through a first firewall; and, b) Transferring the subject data between the first and the second processing systems through a second firewall.
  • the second processing system may be coupled to a database adapted to store the predetermined data, the method including: a) Querying the database to obtain at least selected predetermined data from the database; and, b) Comparing the selected predetermined data to the subject data.
  • the second processing system can be coupled to a subject database, the method including storing the subject data in the subject database.
  • the status may include details of any conditions of the individuals, in which case the method can include determining any conditions displayed by the user.
  • the method may also include determining the ability of the subject to perform in a sporting and/or racing event in accordance with any determined conditions.
  • the method can include having the user determine the subject data using a secure array, the secure array of elements capable of determining the quantity of a biological molecule and having a number of features each located at respective position(s) on the array, and a respective code.
  • the method typically includes causing the base station to: a) Determine the code from the subject data; b) Determine a layout indicating the position of each feature on the array; c) Determine the parameter values in accordance with the determined layout, and the subject data.
  • the secure array may consist of a set of randomly located features, each feature being tagged to identify the molecular marker with which it is associated; for example, the features may be microbeads tagged with an oligonucleotide bead type identifier and a probe oligonucleotide, self-assembled onto an etched fibre optic bundle.
  • the method may include having the user determine the subject data using a secure array of elements capable of determining the quantity of a biological molecule, the secure array having a number of features each tagged with an identifier determining the type of biological molecule to which they bind, and a respective code, the method including causing the base station to: a) Determine the code from the subject data; b) Determine a layout indicating the position of each feature on the array; c) Determine the parameter values in accordance with the determined layout, and the subject data.
  • the method may also include: a) Receiving confirmation of the determined status from the user; and, b) Updating the predetermined data in accordance with the confirmed status and the subject data.
  • the features can include at least one of: a) An oligonucleotide; b) A nucleotide; c) A peptide; d) An amino acid; e) An antibody; f) A carbohydrate; g) A lipid; h) A cell; and, i) An organism.
  • the method can also include causing the base station to: a) Determine payment information, the payment information representing the provision of payment by the user; and, b) Perform the comparison in response to the determination of the payment information.
  • the present invention provides a base station for determining the status of a subject, the base station including: a) A store method for storing predetermined data, the predetermined data including for each of a number of conditions: i) Values for at least some of the parameters; and, ii) An indication of the condition; and, b) A processing system, the processing system being adapted to: i) Receive subject data from the user via a communications network, the subject data including respective values for each of a number of parameters, the parameter values being indicative of the current biological status of the subject; ii) Compare the subject data to the predetermined data; iii) Determine the status of the subject in accordance with the results of the comparison; and, c) Output an indication of the status of the subject to the user via the communications network.
  • the processing system can be adapted to receive subject data from a remote end station adapted to determine the subject data.
  • the processing system may include: a) A first processing system adapted to: i) Receive the subject data; and ii) Determine the status of the subject in accordance with the results of the comparison; and, b) A second processing system adapted to: i) Receive the subject data from the processing system; and, ii) Perform the comparison; and, iii) Transfer the results to the first processing system.
  • the base station typically includes: a) A first firewall for coupling the first processing system to the communications network; and, b) A second firewall for coupling the first and the second processing systems.
  • the processing system can be coupled to a subject database, the processing system being
  • the method of performing the comparison can include causing the second processing system to: a) Obtain the predetermined data in the form of a set of signatures; and, b) Use the signatures to classify the subject data into a respective one of the groups.
  • the method may further include determining one or more conditions displayed by the subject in accordance with the determined group.
  • the subject data may be determined using a secure array, the secure array having a number of features each located at a respective position on the array, and a respective code, the processing system being adapted to: a) Determine the code from the subject data; b) Determine a layout indicating the position of each feature on the array; c) Determine the parameter values in accordance with the determined layout, and the subject data.
  • the processing system can be adapted to: a) Receive confirmation of the determined ability; and, b) Update the predetermined data in accordance with the determined ability and the subject data.
  • the base station of the eighth broad form of the invention may therefore be adapted to perform the method of the seventh broad form of the invention.
  • the present invention provides a computer program product for implementing a base station for determining the status of a subject, the computer program product including computer executable code which when executed on a suitable processing system causes the processing system to perform the method of the seventh broad form of the invention.
  • The present invention provides an end station adapted to determine the status of a subject, the end station including a processor adapted to: a) Determine subject data from the user, the subject data including respective values for each of a number of parameters, the parameter values being indicative of the current biological status of the subject; b) Transfer the subject data to a base station via a communications network, the base station being adapted to: i) Compare the subject data to predetermined data for one or more individuals, the predetermined data including: (1) One or more parameter values for the respective individual; and, (2) An indication of the status of each individual; and, ii) Determine the status of the subject in accordance with the results of the comparison; and, c) Receive an indication of the status of the subject via the communications network.
  • The end station is typically adapted to cooperate with the base station of the eighth broad form of the invention to perform the method of the seventh broad form of the invention.
  • The present invention provides a computer program product for determining the status of a subject, the computer program product including computer executable code which when executed on a suitable processing system causes the processing system to operate as an end station according to the seventh broad form of the invention.
  • The present invention provides a method of determining the ability of a subject to perform in a sporting and/or racing event, the method including: a) Obtaining subject data, the subject data including one or more parameter values, at least one of the parameters being indicative of the current biological status of the subject; b) Comparing the subject data to predetermined data, the predetermined data including for each of a number of individuals: i) One or more parameter values for the respective individual; and, ii) An indication of the status of each individual; c) Determining the status of the subject in accordance with the results of the comparison; and, d) Providing an indication of the ability in accordance with the results of the comparison.
  • The method of determining the status of the subject may be the method of the first or seventh broad forms of the invention.
  • the status of each individual typically indicates any conditions displayed by the user, in which case the method typically includes: a) Determining any conditions displayed by the user in accordance with the results ofthe comparison; and, b) Determining the ability in accordance with the determined conditions.
  • The present invention provides apparatus for determining the ability of a subject to perform in a sporting and/or racing event, the apparatus including a processing system adapted to: a) Obtain subject data, the subject data including one or more parameter values, at least one of the parameters being indicative of the current biological status of the subject; b) Compare the subject data to predetermined data, the predetermined data including for each of a number of individuals: i) One or more parameter values for the respective individual; and, ii) An indication of the status of each individual; c) Determine the status of the subject in accordance with the results of the comparison; and, d) Provide an indication of the ability in accordance with the results of the comparison.
  • the processing system is generally adapted to perform the method of the ninth broad form of the invention.
  • The present invention provides a computer program product for determining the ability of a subject to perform in a sporting and/or racing event, the computer program product including computer executable code which when executed on a suitable processing system causes the processing system to perform the method of the ninth broad form of the invention.
  • The present invention provides a method of providing secure arrays, each array including a number of predetermined features, the method including: a) Determining a number of respective feature layouts, each layout representing the positioning of each feature on a respective array; b) Determining a number of codes, each code corresponding to a respective layout; c) Generating a number of arrays in accordance with at least one of: i) a respective layout, and including the corresponding code thereon, the code being used in processing the array; and, ii) as a self assembled random array of tagged features, each feature coded with information describing the molecular identity of the probe which it contains, and including the corresponding code thereon, the code being used in processing the array.
  • The method can be performed to provide the arrays on behalf of an entity, the method including providing an indication of the layouts and corresponding codes to the entity, to thereby allow the entity to process the arrays.
  • The method of determining the layouts typically includes: a) Determining a preferred layout; and, b) Moving the position of one or more of the features from the position in the preferred layout to an alternative position.
  • the method can include: a) Determining the type of each feature; and, b) Exchanging the position of one or more features having different feature types.
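The secure array scheme described above can be sketched in Python. This is a minimal illustration only: the function names, the feature names, the use of random position swaps and the hexadecimal code format are all assumptions made for the example and are not taken from the specification.

```python
import random


def make_secure_layouts(preferred_layout, n_arrays, n_swaps=2, seed=None):
    """Return {code: layout}, where each layout is the preferred layout
    with the positions of some features exchanged (hypothetical helper)."""
    rng = random.Random(seed)
    layouts = {}
    while len(layouts) < n_arrays:
        layout = list(preferred_layout)
        for _ in range(n_swaps):
            # Exchange the positions of two features.
            i, j = rng.sample(range(len(layout)), 2)
            layout[i], layout[j] = layout[j], layout[i]
        # Illustrative code only; a real system would need unpredictable codes.
        code = f"{rng.getrandbits(32):08x}"
        layouts[code] = layout
    return layouts


def decode_readings(code, readings, layouts):
    """Map raw readings (ordered by array position) to named features,
    using the layout that corresponds to the code read from the array."""
    layout = layouts[code]
    if len(readings) != len(layout):
        raise ValueError("number of readings does not match the layout")
    return dict(zip(layout, readings))
```

Without the mapping from code to layout, the raw readings cannot be attributed to features, which is the property the scheme relies on.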
  • The present invention provides a method comprising: a) for each of a plurality of animals having a known status, measuring a number of biological factors potentially indicative of said status; b) analysing said biological factors to obtain at least one model providing a statistical correlation between said biological factors and said status; c) storing at least one said model; and d) responsive to a request for status determination of a particular animal, the request including, for the particular animal, measures of at least some of the number of biological factors potentially indicative of said status, applying at least one stored model to the information in the request in order to attempt to determine the status of the particular animal.
  • The present invention provides a method comprising: a) for each of a plurality of animals having a known condition, measuring a number of biological factors potentially indicative of said condition; b) determining at least one model that provides a statistical correlation between said biological factors and said condition; c) storing said at least one model; and d) responsive to a request for status determination of a particular animal, the request including, for the particular animal, measures of at least some of the number of biological factors potentially indicative of said status, applying at least one stored model to the information in the request in order to attempt to determine the status of the particular animal.
  • The present invention provides a method comprising: a) providing a system including a database of (a) statistical models that correlate biological factors to known conditions, and (b) statistical models that correlate known conditions or biological factors to known statuses; b) responsive to a user request for a status determination for a particular animal, said request including measures of at least some biological factors, applying at least one statistical model from the database to at least some of the biological factors in the request in order to determine whether the animal has a known condition or a known status; and c) providing the user with the status determination.
  • the user is preferably at a remote location from the database and wherein the user is only provided with the status determination if the user is authorised to access the system.
  • a request typically includes a unique identity for the animal and wherein the system stores information relating to the animal based on its identity.
  • the method preferably further comprises determining the status of the animal based at least in part on previously stored information about the animal.
  • the method can further comprise providing the user with a list of additional information that might be useful in making a status determination.
  • The present invention provides a method comprising: a) providing a system including a database of (a) statistical models that correlate biological factors of horses to known conditions in horses, and (b) statistical models that correlate known conditions in horses or biological factors of horses to known statuses of horses; b) responsive to a user request for a status determination for a particular horse, said request including measures of at least some biological factors of the particular horse, applying at least one statistical model from the database to at least some of the biological factors in the request in order to determine whether the horse has a known condition or a known status; and c) providing the user with the status determination of the horse.
  • When the user is at a remote location from the database, the user is typically only provided with the status determination if the user is authorised to access the system.
  • the request can include a unique identity for the horse and wherein the system stores information relating to the horse based on its identity.
  • the method can further comprise determining the status of the horse based at least in part on previously stored information about the horse.
  • the method may further comprise providing the user with a list of additional information about the horse that was not provided with the request and that might be useful in making a status determination about the horse.
  • Figure 1 is a schematic diagram of an example of a processing system for implementing examples of the invention;
  • Figure 2 is a flow chart outlining the process implemented by the system of Figure 1;
  • Figure 3 is a schematic diagram of an example of a distributed architecture;
  • Figure 4 is a schematic diagram of an example of one of the end stations of Figure 3;
  • Figure 5 depicts a flow chart of the process implemented by the system of Figure 3;
  • Figure 6A is a flow chart of an example of the process for generating diagnostic signatures;
  • Figure 6B is an example of the data flow for the process for generating diagnostic signatures;
  • Figures 7A and 7B are a flow chart of an example of the process of comparing the subject data to the diagnostic signatures;
  • Figure 8 is a schematic diagram of a second example of a distributed architecture;
  • Figure 9 is a flow chart of an example of the process for generating secure arrays;
  • Figure 10 is a flow chart of an example of the process for generating subject data using the secure arrays;
  • Figure 11 is a flow chart of an example of the process of data mining;
  • Figure 12 is a flow diagram illustrating dataflow steps in a specific example as part of a computer system capable of delivery of remote diagnostic services;
  • Figure 13 is a flow diagram showing an example of the processing associated with diagnosing a condition of an animal in accordance with a specific example;
  • Figure 14 is a diagram illustrating an environment for working the specific example shown in
  • Figure 15 is a flow diagram illustrating an example of the processing associated with preparing an array in accordance with a specific example of the invention;
  • Figure 16 is a flow diagram showing steps for determining a nucleic acid expression level in a biological sample;
  • Figure 17 is a flow diagram illustrating steps for building a database in accordance with a specific example;
  • Figure 18 is a trace output from the Agilent Lab-on-a-Chip system, representing high quality RNA as determined by GeneChip® analysis of the RNA.
  • the first peak from the left is a marker of known quantity.
  • the second and third peaks represent the 18S and 28S RNA.
  • The 28S peak should be larger than the 18S peak, in approximately the proportions shown here.
  • The rest of the trace is relatively flat, representing high quality RNA.
  • Figure 19 is a trace output from the Agilent Lab-on-a-Chip system, representing low quality RNA as determined by GeneChip® analysis of the RNA.
  • the yield is low (the 18S and 28S peaks are small compared to the first control peak) and the sloping trace represents degraded
  • Figure 20 is a photographic representation of a screen capture from MAS 5 of a .DAT file for a single GeneChip®.
  • The actual chip is contained within the outer blue borders. Genetrax is spelled out in the top left-hand corner through the binding of the B2 oligo during the hybridisation process.
  • the bottom sixth of the chip is black because it contains no oligonucleotides.
  • Figure 21 is a photographic representation of a close-up of the top left-hand corner of the screen capture shown in Figure 20.
  • MAS 5 has laid down a grid on top of the oligonucleotide squares as part of the orientation process. It is important that the software recognises each square accurately, given that the outer pixels are discarded. The outer-most border, a grid in the top left-hand corner and the G of Genetrax can be seen.
  • These squares consist of oligonucleotides that bind to the spiked-in B2 oligo. Detail of some of the oligonucleotides for horse genes can be seen with some squares lighting up and some squares remaining dark.
  • Figure 22 shows a scatter plot of the four conditions (i.e., osteoarthritis (A), EHN (E), gastric ulcer syndrome (G) and normal (N)) with respect to the first two linear discriminant functions in the demonstration study.
  • FIG. 1 shows a processing system suitable for implementing the present invention.
  • Figure 1 shows a processing system 10 including a processor 20, a memory 21, an optional input/output (I/O) device 22 and an interface 23 coupled together via a bus 24.
  • the interface 23 is adapted to couple the processing system 10 to one or more databases shown generally at 11.
  • The processing system 10 is adapted to receive subject data, which is data representative of the current biological status of a subject.
  • Subject data is typically in the form of raw data and therefore requires interpretation to allow the status of the subject to be determined. This is achieved by having the processing system 10 compare the subject data to predetermined data stored in the database 11.
  • The predetermined data includes data representative of the biological status of a number of individuals, together with an indication of the actual status of the individuals when the predetermined data was collected.
  • the processing system may be any form of processing system suitably programmed to perform the analysis, as will be described in more detail below.
  • the processing system may therefore be a suitably programmed computer, laptop, palm computer, or the like.
  • specialised hardware or the like may be used. This allows the hardware system to be implemented as a portable device, such as a PDA which may be coupled to the database 11 via a suitable communications network, such as the Internet, as will be appreciated by persons skilled in the art.
  • The user determines subject data in the form of parameter values representing the current biological status of the subject.
  • The parameter values represent specific measurements of selected parameters that represent the current biological status of the subject. It will be appreciated that a number of different forms of parameters may be used, as will be described in more detail below.
  • the user provides the parameter values to the processing system 10, which then operates to compare the subject data to the predetermined data at step 120.
  • the predetermined data includes parameter values for a number of individuals having a range of different biological states.
  • Comparing the subject and predetermined data allows the processing system 10 to determine the status of the subject in accordance with the results of the comparison at step 130.
  • the processing system 10 will attempt to identify individuals having similar parameter values to the subject. The status of the subject will then be determined to be similar to that of the identified individuals.
  • The processing system 10 then provides an indication of the status to the user at step 140.
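The comparison in steps 120 to 140 can be illustrated with a short Python sketch. The specification only says that individuals with similar parameter values are identified; the use of a k-nearest-neighbour vote over a root-mean-square distance, and all names below, are assumptions chosen for the example.

```python
import math
from collections import Counter


def determine_status(subject, predetermined, k=3):
    """Return the most common status among the k individuals whose
    parameter values are closest to those of the subject.

    subject: dict mapping parameter name to value.
    predetermined: list of (parameter dict, status) pairs.
    """
    scored = []
    for values, status in predetermined:
        shared = subject.keys() & values.keys()
        if not shared:
            continue  # no parameters in common, so no comparison is possible
        # Root-mean-square difference over the shared parameters.
        dist = math.sqrt(
            sum((subject[p] - values[p]) ** 2 for p in shared) / len(shared)
        )
        scored.append((dist, status))
    if not scored:
        raise ValueError("no comparable individuals in the predetermined data")
    scored.sort(key=lambda item: item[0])
    votes = Counter(status for _, status in scored[:k])
    return votes.most_common(1)[0][0]
```

In practice the parameters would need to be on comparable scales before distances are meaningful; that normalisation is omitted here.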
  • This procedure can therefore be used to identify a wide range of conditions that may be displayed by the subject.
  • the system can be adapted to determine the presence or absence of one or more of a number of conditions in the subject.
  • Where the subject is an athletic performance subject, such as a human, race horse, camel, llama, greyhound, or the like, this allows an assessment to be made of the impact of the presence or absence of the conditions on the ability of the performance animal to compete in events, such as races.
  • Each of the number of conditions must have been previously identified in the individuals, and it is therefore necessary to have predetermined data for a number of individuals, with at least some of the individuals having one or more of the conditions, and at various stages of the conditions. Furthermore, it is also necessary to utilise a sufficiently large number of parameters to allow each of the respective conditions to be distinguished on a statistical basis, and a sufficiently large number of individuals in the sample from which predetermined data are obtained.
  • It is typical for the predetermined data to ultimately include values for a large number of parameters and individuals. As a result, the determination of the predetermined data is typically a time consuming and expensive procedure. This has an impact on the manner in which the system is implemented, primarily because it is not feasible for individual users wanting to implement the method to collect their own predetermined data. Accordingly, in one example, the techniques may be implemented using a distributed processing system, an example of which is shown in Figure 3.
  • The apparatus is formed from a base station 1 coupled to a number of end stations 3 via a communications network 2, and/or via a number of LANs (Local Area Networks) 4.
  • the base station 1 is generally formed from one or more of the processing systems 10 coupled to a data store, such as the database 11, as shown.
  • the processing system 10 operates substantially as described above to process data received via the communications networks 2, 4.
  • the processing system 10 can then supply an indication of the determined subject status back to the respective end station 3 via the communications network 2, 4, as will be understood by a person skilled in the art.
  • This allows the base station to be administered by an operator that provides services allowing users of the end stations 3 to determine the status of a subject. This in turn overcomes the need for each user to obtain their own predetermined data. Furthermore, by having the base station 1 perform the comparison of the subject and predetermined data, and determine the status, this allows the operator of the base station 1 to restrict access to the predetermined data, thereby preventing the data being accessed and used by unauthorised third parties. This, in turn, allows the operator to charge a fee for the provision of an indication of the status of the subject, as will be described in more detail below.
  • The data are protected, for example, by known encryption techniques, before being sent from the end stations 3 to the base station 1.
  • The results produced by the base station 1 are preferably encrypted before being sent back to the end stations 3. In this manner, the privacy and security of queries and results are maintained.
  • the system may be implemented using a number of different architectures.
  • the communications network 2 is preferably the Internet 2, with the LANs 4 representing private LANs, such as LANs within a company or the like.
  • the services provided by the base station 1 are generally accessible via the Internet 2. Accordingly, in order to provide a suitable implementation, the processing system 10 can be adapted to generate web pages, or the like, that can be viewed by users of the end stations 3. Accordingly, the processing system 10 may be any suitable form of processing system that executes appropriate application software stored in the memory 21 to allow the desired functionality to be achieved.
  • the base station 1 includes a processing system, such as a network server, web server or the like.
  • the end stations 3 must be capable of communicating with the base station 1 to allow browsing of web pages, or the transfer of data in other manners. Accordingly, as shown in Figure 4, in this example, the end stations 3 are formed from a processing system including a processor 30, a memory 31, an input/output (I/O) device 32 and an interface 33 coupled together via a bus 34.
  • The interface 33, which may be a network interface card or the like, is used to couple the end station to the Internet 2 or one of the respective LANs 4.
  • the end station 3 may be formed from any suitable processing system such as a suitably programmed PC, Internet Terminal, Lap-top, hand held PC or the like which is typically operating application software to enable web browsing or the like.
  • the end station 3 may be formed from specialised hardware, such as an electronic touch sensitive screen coupled to suitable processor and memory.
  • The end stations 3 may be connected to the Internet 2 or the LANs 4 via wired or wireless connections, as will be appreciated by a person skilled in the art. This allows the end stations 3 to be implemented as hand held wireless devices, as will be described in more detail below.
  • the process begins at step 200 with the user determining the parameter values for the subject.
  • The parameter values are then encoded as subject data by the end station 3 at step 210. This is typically achieved in accordance with a predetermined algorithm such that the subject data has a predetermined format that can be interpreted by the base station 1.
  • the subject data may be protected by encryption at this time.
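A minimal Python sketch of encoding parameter values into a predetermined format at step 210 follows. The specification does not define the format; the JSON layout, the field names and the version number here are purely hypothetical, and the optional encryption step is not shown.

```python
import json

FORMAT_VERSION = 1  # hypothetical version tag for the agreed format


def encode_subject_data(subject_id, array_type, parameter_values):
    """Encode parameter values as bytes in a fixed, self-describing format."""
    record = {
        "format_version": FORMAT_VERSION,
        "subject_id": subject_id,
        "array_type": array_type,
        "parameters": {
            name: float(value) for name, value in parameter_values.items()
        },
    }
    # sort_keys gives a stable byte representation for identical inputs.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_subject_data(payload):
    """Decode subject data at the receiving end, rejecting unknown formats."""
    record = json.loads(payload.decode("utf-8"))
    if record.get("format_version") != FORMAT_VERSION:
        raise ValueError("unsupported subject data format")
    return record
```

Carrying the array type inside the payload is one way the receiving end could later determine the subject data type before interpreting the values.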
  • the user accesses the base station 1 using the end station 3.
  • Preferably only authorised users may access the system in the base station 1.
  • The user of the end station 3 may be required to either register with the base station 1 or supply a previously determined user name and password. In particular, this is performed to allow the base station 1 to determine the identity of the user and therefore confirm that the user has authorisation to utilise the services provided by the base station 1 and/or to ensure that payment can be obtained for the provision of the services.
  • The user name and password will typically be provided when the user registers with the base station 1 on a first occasion. At this point the user has to make provisions for payments, such as the provision of account details, thereby allowing the operator of the base station 1 to charge the user for the services provided.
  • Identification of the user can be achieved in accordance with cookies stored at the end station 3, or an identifier associated with the end station 3, which may for example be the MAC (Media Access Control) address of the end station interface 33, or the like.
  • access to the services provided by the base station 1 is generally limited to authorised users, although this is not essential.
  • When the user accesses the base station 1, this is typically achieved by accessing respective web pages generated by the base station 1. This allows the user to select the respective services required, which in this example is an indication of the status of a subject.
  • The user will be transferred to a secure environment to allow the subject data to be transferred to the base station 1 for processing.
  • This is typically achieved, for example, by implementing an SSL (Secure Sockets Layer) connection between the base station 1 and the end station 3.
  • Any mechanism for secure communication may be used between the base station 1 and the end station 3. Confidentiality of the subject data and the determined status is important because the results are often used in determining the ability and/or eligibility of the subject to compete in sporting and/or racing events, and this information can be extremely valuable, especially to the gambling industry. It is therefore preferable to ensure the information is kept confidential at all times. It is generally also preferred to keep confidential the fact that a status test is being performed on a particular subject.
  • The subject data is transferred to the base station 1 at step 230.
  • the base station 1 will typically operate to review the subject data to ensure that it is genuine subject data, and that for example, the data does not disguise an attempt to gain illicit or unauthorised access to the base station 1 to obtain access to the predetermined data. This is typically achieved by having the base station 1 implement a firewall between the processing system 10 and the Internet 2 or LANs 4 to ensure that unwanted data is not received.
  • At step 240, the processing system 10 operates to determine the subject data type.
  • the exact subject data provided and, in particular, the parameters for which values are provided may vary depending on the respective implementation. This will be described in further detail below.
  • The subject data may be collected using arrays, in which case a number of different arrays may be provided.
  • The base station 1 will operate to determine the type of array being used, to allow the subject data to be interpreted.
  • the processing system 10 selects at least some of the predetermined data in accordance with the subject data type.
  • The processing system 10 will operate to select parameter values from the predetermined data for parameters corresponding to those contained in the subject data.
  • The processing system 10 compares the parameter values of the subject data to the parameter values of the selected predetermined data.
  • the processing system 10 operates to compare the parameter values to those obtained from a number of different individuals that between them have a range of different conditions. This allows the processing system 10 to determine one or more conditions displayed by the subject at step 270.
  • The processing system 10 optionally determines the ability of the subject to compete in a sporting and/or racing event in accordance with the determined conditions.
  • the processing system 10 then transfers an indication of at least the conditions to the end station 3 at step 290.
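The selection of predetermined data according to the subject data type, restricted to the parameters present in the subject data, might be sketched as follows in Python. The record structure and field names are hypothetical and are used only to make the selection step concrete.

```python
def select_predetermined(records, data_type, subject_parameters):
    """Select the predetermined records that match the subject data type,
    keeping only the parameters that also appear in the subject data.

    records: list of dicts with hypothetical keys "data_type", "values"
             (parameter name to value) and "conditions" (list of names).
    """
    selected = []
    for record in records:
        if record["data_type"] != data_type:
            continue  # collected with a different type of array
        values = {
            name: value
            for name, value in record["values"].items()
            if name in subject_parameters
        }
        if values:
            selected.append(
                {"values": values, "conditions": record["conditions"]}
            )
    return selected
```

The selected records can then be passed to whatever comparison is used to determine the conditions displayed by the subject.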
  • the system may be implemented in a variety of ways.
  • The subject data is formed from phenotypic information representative of the current biological status of the subject.
  • the phenotypic information results from the expression of the genotype of the subject and is therefore typically in the form of information such as expression data, or the like.
  • the phenotypic information profiles gene expression in one or more specific cell types.
  • the profiled gene expression represents at least a subset of the transcriptome.
  • By "transcriptome" is meant the entire complement of transcripts that are expressed by the specific cell type(s), including transcripts expressed in both normal and disease states.
  • the transcriptome thus has a qualitative element (the identity of individual gene transcripts) and a quantitative element (the proportion of each unique transcript in the total number of individual transcripts present in the cell at a particular moment).
  • the transcriptome comprises messenger RNAs transcribed from a multiplicity of transcription units that populate a genome.
  • the profiled gene expression represents at least a subset of the proteome.
  • proteome refers to the global pattern of protein expression in the specific cell type(s), including proteins expressed in both normal and disease states.
  • the cell types are selected from primary cells, which, generally, are cells that cannot proliferate indefinitely in culture.
  • Primary cells can be derived from adult tissue, or from embryo tissue that is differentiated in culture to an adult cell or to a precursor of an adult cell that displays specialised characteristics.
  • Illustrative cell types include specialised cell types such as but not limited to cardiomyocytes, endothelial cells, sensory neurones, motor neurones, CNS neurones (all types), astrocytes, glial cells, schwann cells, mast cells, eosinophils, smooth muscle cells, skeletal muscle cells, pericytes, lymphocytes, tumour cells, monocytes, macrophages, foamy macrophages, granulocytes, synovial cells/synovial fibroblasts, epithelial cells (varieties from all tissues/organs).
  • vascular endothelial cells, smooth muscle cells (aortic, bronchial, coronary artery, pulmonary artery, etc), skeletal muscle cells, fibroblasts (many types, such as synovial), keratinocytes, hepatocytes, dendritic cells, astrocytes, neurone cells (including mesencephalic, hippocampal, striatal, thalamic, hypothalamic, olfactory bulb, substantia nigra, locus coeruleus, cortex, dorsal root ganglia, superior cervical ganglia, sensory, motor, cerebellar cells), neutrophils, eosinophils, basophils, mast cells, monocytes, macrophage cells, erythrocytes, megakaryocytes, hematopoietic progenitor cells, hematopoietic pluripotent stem cells, any stem cells, any progenitor cells, epithelial
  • the expression data may relate to the level, abundance or functional activity of an RNA molecule or a polypeptide.
  • The RNA molecule includes, but is not restricted to, RNA transcripts such as a primary gene transcript or pre-messenger RNA (pre-mRNA), which may contain one or more introns, as well as a messenger RNA (mRNA) in which any introns of the pre-mRNA have been excised and the exons spliced together, heterogeneous nuclear RNA (hnRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), small cytoplasmic RNA (scRNA), ribosomal RNA (rRNA), translational control RNA (tcRNA), transfer RNA (tRNA), eRNA, messenger-RNA-interfering complementary RNA (micRNA) or interference RNA (iRNA) and mitochondrial RNA (mtRNA).
  • Suitable polypeptides that are contemplated by the present invention include enzymes, receptors, immunoglobulins, hormones, cytokines, chemokines, neuropeptides, adhesins, glycoproteins and the like.
  • the expression data may relate to the level or abundance of a carbohydrate including monosaccharides, oligosaccharides and polysaccharides.
  • Where the phenotypic information relates to expression data, these are typically obtained by any suitable qualitative or quantitative technique.
  • Suitable techniques include multiplexed analysis techniques using arrays and distinctly detectable beads, as is well known in the art.
  • The phenotypic information includes information representing at least a subset of the transcriptome (also referred to herein as a "subtranscriptome") of one or more cell types. Determination of gene expression, or gene expression profiling, may be accomplished by any one of many suitable procedures available in the art. Examples of such methods may employ differential display, high-throughput sequencing of cDNA libraries, gene expression profiling using solid phase platforms including microchip arrays of genes or northern blot analysis of gene transcription, and mass spectroscopy.
  • DDRT-PCR Differential Display Reverse Transcriptase Polymerase Chain Reaction
  • gene expression can be analysed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time (see e.g., Seilhamer et al, "Comparative Gene Transcript Analysis," U.S. Pat. No. 5,840,484).
  • this method utilises high-throughput cDNA sequencing to identify specific transcripts of interest.
  • The generated cDNA and deduced amino acid sequences are then extensively compared with at least one nucleic acid sequence database (e.g., GenBank). After it is determined whether the sequence is an exact match, a similar sequence or entirely dissimilar, the sequence is entered into a database.
  • The numbers of copies of cDNA corresponding to particular genes are tabulated, preferably with the aid of a computer program.
  • The numbers of copies are divided by the total number of sequences in the data set, to obtain a relative abundance of transcripts for each corresponding gene.
  • the list of represented genes can then be sorted by abundance in the cDNA population.
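  • the tabulation described above can be sketched as follows. This is a minimal illustration only: it assumes each sequenced cDNA has already been assigned to a gene by the database comparison, and the function name and the gene labels in the example are hypothetical.

```python
from collections import Counter

def relative_abundance(gene_hits):
    """Tabulate cDNA copies per gene, divide by the total number of
    sequences, and sort the genes by descending abundance."""
    counts = Counter(gene_hits)          # copies of cDNA per gene
    total = sum(counts.values())         # total sequences in the data set
    if total == 0:
        return []
    fractions = {gene: n / total for gene, n in counts.items()}
    # Sort by abundance (highest first); ties are broken by gene name.
    return sorted(fractions.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical gene assignments for six sequenced clones.
hits = ["ACTB", "GAPDH", "ACTB", "IL6", "ACTB", "GAPDH"]
ranked = relative_abundance(hits)
```

  • in this example three of the six clones map to the first gene, so it heads the sorted list with a relative abundance of 0.5.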
  • DNA chip technology allows comparisons to be conveniently conducted by the use of nucleic acid microarrays (see, e.g., Kozian and Kirschbaum, 1999 supra for review and references therein).
  • arrays are generated using cDNAs (including Expressed Sequence Tags, ESTs), PCR products, cloned DNA and synthetic oligonucleotides that are fixed to a substrate such as nylon filters, glass slides or silicon chips.
  • labelled cDNAs or PCR products are hybridised to the array and the hybridisation patterns compared.
  • the use of detectably (e.g., fluorescently) labelled probes allows mRNA from one or more cell populations to be analysed simultaneously on a single microarray and the results measured at different wavelengths.
  • a microarray-based differential expression screening technique is described in U.S. Pat. No. 5,800,992.
  • Illustrative methods for preparation, use and analysis of microarrays are described by Brennan et al. (U.S. Pat. No. 5,474,796), Schena et al. (1996 Proc. Natl. Acad. Sci. USA 93:10614-10619), Baldeschweiler et al. (PCT application WO95/251116), Shalon et al.
  • mRNA ⁇ 1 µg is isolated from the test cells to generate first-strand cDNA by using a T7-linked oligo(dT) primer.
  • in vitro transcription is performed with biotinylated UTP and CTP (Enzo Diagnostics); the result is a 40- to 80-fold linear amplification of RNA.
  • RNA is fragmented to 50- to 150-nt size before overnight hybridisation to Affymetrix (Santa Clara, Calif.) HU6000 arrays (e.g., such arrays may contain probe sets for 6,416 human genes (5,223 known genes and 1,193 ESTs)). After washing, arrays are stained with streptavidin-phycoerythrin (Molecular Probes) and scanned on a Hewlett Packard scanner. Intensity values are scaled such that overall intensity for each chip of the same type is equivalent.
  • Intensity for each feature of the array is captured using the GeneChip® Software (Affymetrix, Santa Clara, Calif.), and a single raw expression level for each gene is derived from the 20 probe pairs representing each gene by using a trimmed mean algorithm.
  • a threshold of 20 units is assigned to any gene with a calculated expression level below 20, because discrimination of expression below this level is not performed with confidence in this procedure.
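  • the scaling, trimmed-mean and threshold steps above might be sketched as follows. This is not the GeneChip® software's algorithm: the text does not state the trim fraction or how each probe-pair signal is computed, so the 10% trim at each end, the use of one number per probe pair and all function names are assumptions made for illustration.

```python
FLOOR = 20.0  # expression levels below 20 units are set to 20 (from the text)

def scale_chip(intensities, target_mean):
    """Rescale one chip so that its mean intensity equals target_mean,
    making overall intensity equivalent across chips of the same type."""
    current = sum(intensities) / len(intensities)
    return [x * target_mean / current for x in intensities]

def trimmed_mean(values, trim_fraction=0.1):
    """Mean after discarding trim_fraction of the values at each end.
    The 10% default is an assumption, not a value from the text."""
    if not values or not 0 <= trim_fraction < 0.5:
        raise ValueError("need at least one value and 0 <= trim_fraction < 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def expression_level(probe_pair_signals, trim_fraction=0.1):
    """Single raw expression level for one gene, floored at 20 units."""
    return max(trimmed_mean(probe_pair_signals, trim_fraction), FLOOR)

# Twenty hypothetical probe-pair signals with one low and one high outlier.
signals = [100.0] * 18 + [0.0, 10000.0]
level = expression_level(signals)
```

  • trimming removes the two outlying probe pairs before averaging, so a single saturated or failed probe does not dominate the expression level for the gene.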
  • gene expression profiles are analysed using suitable statistical analyses, for example, iterative global partitioning clustering algorithms and Bayesian evidence classification, to identify and characterise clusters of genes having similar expression profiles (see, e.g., Long et al, 2001, J. Biol. Chem., 276(23):19937-19944).
  • the steps involved in this statistical analysis are (1) determination of the fold induction (log ratio) of the genes, (2) normalisation of the gene profile to a magnitude equal to 1, (3) partition clustering of all genes measured to determine unique clustering patterns, (4) differentiation of gene clusters in each test population into the following sub-groups based on their expression as compared to the population-average profile: early up-regulated, late up-regulated, down-regulated and others, (5) performance of a comparative analysis to explore the common genes in the early up-regulated and down-regulated cluster sub-groups in the test populations of cells, and (6) correlation based on the Pearson correlation coefficient to determine differences and similarities among the sub-groups in the test populations of cells.
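  • steps (1), (2) and (6) above can be illustrated with a short sketch. It assumes that the log ratio is taken to base 2 and that "magnitude equal to 1" means unit Euclidean length; neither choice is stated in the text, and the function names are hypothetical.

```python
import math

def log_ratios(test, reference):
    """Step (1): fold induction of each time point as a log2 ratio."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def normalise(profile):
    """Step (2): scale a profile to unit Euclidean magnitude."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile] if norm else list(profile)

def pearson(a, b):
    """Step (6): Pearson correlation coefficient of two profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical expression values at three time points against a reference.
profile = normalise(log_ratios([2.0, 4.0, 8.0], [1.0, 1.0, 1.0]))
```

  • because the Pearson coefficient is unchanged by positive rescaling of a profile, normalisation mainly matters for the distance-based clustering in step (3) rather than for the correlation in step (6).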
  • the phenotypic information includes information representing at least a subset of the proteome (also referred to herein as a "subproteome") of one or more cell types.
  • proteome expression patterns, or profiles, are analysed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time.
  • a profile of a cell's proteome may thus be generated by separating and analysing the polypeptides of a particular tissue or cell type. For example, proteins extracted from tissue or cell samples can be separated into individual proteins by gel electrophoresis (Hochstrasser et al, 1988 Anal Biochem. 173:424-435; Huhmer et al., 1997 Anal. Chem.
  • the separation can be achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulphate slab gel electrophoresis in the second dimension (see, e.g., Anderson et al, 1996 Electrophoresis 17:443-453).
  • the proteins are visualised in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains.
  • Commercial software packages are available for automated spot detection.
  • gel images are electronically retrieved by high-resolution scanners and analysed (spot-finding) using pattern recognition techniques against 2-D gel database queries (Miura, 2001 Electrophoresis 22:801-813). Proteome maps are then compared against databases for identification of up- or down-regulation in a disease state.
  • the optical density of each protein spot is generally proportional to the level of the protein in the sample.
  • the optical densities of equivalently positioned protein spots from different samples, for example from biological samples obtained from different subjects, are compared to identify any changes in protein spot density between the subjects.
  • Sophisticated software packages can be employed to enhance contrast, subtract background, align images, remove artefacts, and perform gel comparison.
  • Spots of interest may be excised from gels and the proteins identified using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry including Matrix Assisted Laser Desorption Ionisation-Time Of Flight (MALDI-TOF) mass spectrometry and electrospray mass spectrometry (see, e.g., Pandey and Mann, 2000 Nature 405:837-846).
  • the identity of the protein in a spot may be determined by comparing its partial sequence, typically of at least 5 contiguous amino acid residues, to a protein sequence database (e.g., SwissProt, GenPept or other sequence databases). In some cases, further sequence data may be obtained for definitive protein identification.
  • proteomes can be analysed using activity-based probes ("ABPs") (see, e.g., U.S. Pat. App. Pub. 2002/0182651).
  • a protein extract is combined with ABPs to produce covalent conjugates of the active target proteins with the probes.
  • the probes comprise a "warhead" directed to a desired protein class.
  • the warhead is covalently linked to a ligand, which is typically detectable, e.g. by fluorescence (“fABP”), and which may be used for separation and/or detection.
  • the resulting protein conjugates are proteolytically digested to provide probe-labelled peptides.
  • ABPs are selected such that each active target protein forms a conjugate with a single ABP at a single discrete location in the target protein, each conjugate thereby giving rise to a single ABP-labelled peptide.
  • Enrichment, separation, or identification of one or more ABP-labelled peptides is achieved using liquid chromatography and/or electrophoresis. Additionally, mass spectrometry can be employed to identify one or more ABP-labelled peptides by molecular weight and/or amino acid sequence. If desired, the sequence information derived from the ABP-labelled peptide(s) is used to identify the protein from which the peptide originally derived.
  • Variations of this method can be used to compare the proteome of two or more cells or cell populations, e.g., using ABPs having different ligands, or, when analysis comprises mass spectrometry, having different isotopic compositions.
  • ABPs that differ isotopically are used to enhance the information obtained from MS procedures to quantitatively compare individual proteins or classes of proteins between two or more cells or populations of cells.
  • the mass spectrometer may be operated in a dual mode in which it alternates in successive scans between measuring the relative quantities of peptides obtained from prior fractionation and recording the sequence information of the peptides.
  • Peptides can be quantified by measuring in the MS mode the relative signal intensities for pairs of peptide ions of identical sequence that are tagged with the isotopically light or heavy forms of the reagent, respectively, and which therefore differ in mass by the mass differential encoded with the ABP.
  • Peptide sequence information can be automatically generated by selecting peptide ions of a particular mass-to-charge (m/z) ratio for collision-induced dissociation (CID) in the mass spectrometer operating in the MSⁿ mode.
  • the resulting CID spectra can then be automatically correlated with sequence databases to identify the protein from which the sequenced peptide originated. Combination of the results generated by MS and MSⁿ analyses of affinity tagged and differentially labelled peptide samples allows the determination of the relative quantities as well as the sequence identities of the components of protein mixtures.
  • Protein identification by MS can be accomplished by correlating the sequence contained in the CID mass spectrum with one or more sequence databases, e.g., using computer searching algorithms (Eng et al, 1994 J. Am. Soc. Mass Spectrom. 5:976-989; Mann et al, 1994 Anal. Chem. 66:4390-9439; Qin et al, 1997 ibid 69:3995-4001; Clauser, et al, 1995 Proc. Natl. Acad. Sci. USA 92:5072-5076).
  • Pairs of identical peptides tagged with the light and heavy affinity tagged reagents, respectively, are chemically identical and therefore serve as mutual internal standards for accurate quantification.
  • the MS measurement readily differentiates between peptides originating from different samples, representing different cell states or other parameters, because of the difference between isotopically distinct reagents attached to the peptides.
  • the ratios between the intensities of the differing weight components of these pairs or sets of peaks provide an accurate measure of the relative abundance of the peptides and the correlative proteins because the MS intensity response to a given peptide is independent of the isotopic composition of the reagents.
  • the use of isotopically labelled internal standards is standard practice in quantitative mass spectrometry (De Leenheer et al, 1992 Mass Spectrom. Rev. 11:249-307).
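  • the ratio-based quantification described above can be sketched as follows. The sketch assumes that light and heavy peak intensities have already been paired by peptide sequence; summarising a protein by the median of its peptide ratios is an illustrative choice not taken from the text, and the peptide labels and function names are hypothetical.

```python
from statistics import median

def peptide_ratios(pairs):
    """Heavy-to-light intensity ratio for each peptide.

    pairs maps a peptide sequence to (light_intensity, heavy_intensity);
    peptides with no light signal are skipped to avoid division by zero.
    """
    return {pep: heavy / light
            for pep, (light, heavy) in pairs.items() if light > 0}

def protein_ratio(pairs):
    """Relative abundance of a protein between the two samples, taken
    here as the median of its peptide ratios (an assumption)."""
    ratios = list(peptide_ratios(pairs).values())
    if not ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(ratios)

# Hypothetical intensities for three peptides from one protein.
pairs = {
    "PEPTIDEA": (1000.0, 2000.0),
    "PEPTIDEB": (500.0, 1100.0),
    "PEPTIDEC": (800.0, 1520.0),
}
ratio = protein_ratio(pairs)
```

  • a ratio near 2 in this example would indicate that the protein is roughly twice as abundant in the sample carrying the heavy label as in the sample carrying the light label.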
  • differences in concentration of proteins and other biomolecular component types can be detected using a post synthetic isotope labelling method (see, e.g., U.S. Pat. App. Pub. 2003/0129769).
  • a first chemical moiety is attached to a protein, peptide, or the cleavage products of a protein in a first sample and a second chemical moiety is attached to a protein, peptide, or the cleavage products of a protein in a second sample to yield first and second isotopically labelled proteins, peptides or protein cleavage products, respectively, that are chemically equivalent, yet isotopically distinct.
  • the chemical moiety can be a single atom (e.g., oxygen) or a group of atoms (e.g., an acetyl group).
  • the labelled proteins, peptides or peptide cleavage products are isotopically distinct because they contain different isotopic variants of the same chemical entity (e.g., a peptide in the first sample contains ¹H where the peptide in the second sample contains ²H; or a peptide in the first sample contains ¹²C where the peptide in the second sample contains ¹³C).
  • At least a portion of each sample is typically mixed together to yield a combined sample, which is subjected to mass spectrometric analysis. Control and experimental samples are mixed after labelling, fractions containing the desired components are selected from the mixture, and the concentration ratio is determined to identify analytes that have changed in concentration between the two samples.
  • This isotope labelling method permits identification of up- and down-regulated proteins using affinity selection methods, 2-D gel electrophoresis, 1-D, 2-D or multi-dimensional chromatography, or any combination thereof, and employs either autoradiography or mass spectrometry.
  • mass spectrometric analysis can be used to determine peak intensities and quantify isotope ratios in the combined sample to determine whether there has been a change in the concentration of a protein between two samples, and to facilitate identification of a protein from which a peptide fragment is derived.
  • the protein is identified by detection of a signature peptide that is unique to a single protein or protein class of a proteome or subproteome of interest (see, e.g., U.S. Pat. App. Pubs. 2003/0186326 and 2003/0129769).
  • Protein capture arrays typically comprise a plurality of protein-capture agents each of which defines a spatially distinct feature of the array.
  • the protein-capture agent can be any molecule or complex of molecules which has the ability to bind a protein and immobilise it to the site of the protein-capture agent on the array.
  • the protein-capture agent may be a protein whose natural function in a cell is to specifically bind another protein, such as an antibody or a receptor.
  • the protein-capture agent may instead be a partially or wholly synthetic or recombinant protein which specifically binds a protein.
  • the protein-capture agent may be a protein which has been selected in vitro from a mutagenised, randomised, or completely random and synthetic library by its binding affinity to a specific protein or peptide target.
  • the selection method used may optionally have been a display method such as ribosome display or phage display, as known in the art.
  • the protein-capture agent obtained via in vitro selection may be a DNA or RNA aptamer which specifically binds a protein target (see, e.g., Potyrailo et al, 1998 Anal. Chem. 70:3419-3425; Cohen et al, 1998, Proc. Natl. Acad. Sci.
  • aptamers are selected from libraries of oligonucleotides by the Selex™ process and their interaction with protein can be enhanced by covalent attachment, through incorporation of brominated deoxyuridine and UV-activated crosslinking (photoaptamers). Aptamers have the advantages of ease of production by automated oligonucleotide synthesis and the stability and robustness of DNA; universal fluorescent protein stains can be used to detect binding.
  • the in vitro selected protein-capture agent may be a polypeptide (e.g., an antigen) (see, e.g., Roberts and Szostak, 1997 Proc. Natl. Acad. Sci. USA, 94:12297-12302).
  • An alternative to an array of capture molecules is one made through 'molecular imprinting' technology, in which peptides (e.g., from the C-terminal regions of proteins) are used as templates to generate structurally complementary, sequence-specific cavities in a polymerisable matrix; the cavities can then specifically capture (denatured) proteins which have the appropriate primary amino acid sequence (e.g., available from ProteinPrint™ and Aspira Biosystems).
  • Exemplary protein capture arrays include antibody arrays, which can facilitate extensive parallel analysis of numerous proteins defining a proteome or subproteome.
  • Antibody arrays have been shown to have the required properties of specificity and acceptable background, and some are available commercially (e.g., BD Biosciences, Clontech, BioRad and Sigma). Various methods for the preparation of antibody arrays have been reported (see, e.g., Lopez et al, 2003 J. Chromatogr. B 787:19-27; Cahill, 2000 Trends in Biotechnology 7:47-51; U.S. Pat. App. Pub. 2002/0055186; U.S. Pat. App. Pub.
  • the antibodies of such arrays recognise at least a subset of proteins expressed by a cell or population of cells, illustrative examples of which include growth factor receptors, hormone receptors, neurotransmitter receptors, catecholamine receptors, amino acid derivative receptors, cytokine receptors, extracellular matrix receptors, antibodies, lectins, cytokines, serpins, proteases, kinases, phosphatases, ras-like GTPases, hydrolases, steroid hormone receptors, transcription factors, heat-shock transcription factors, DNA-binding proteins, zinc-finger proteins, leucine-zipper proteins, homeodomain proteins, intracellular signal transduction modulators and effectors, apoptosis-related factors, DNA synthesis factors, DNA repair factors, DNA recombination factors, cell-surface antigens, hepatitis C virus (HCV) proteases and HIV proteases.
  • Antibodies for protein arrays are made either by conventional immunisation (e.g., polyclonal sera and hybridomas), or as recombinant fragments, usually expressed in E. coli, after selection from phage display or ribosome display libraries (e.g., available from Cambridge Antibody Technology, BioInvent, Affitech and Biosite).
  • 'combibodies', comprising non-covalent associations of VH and VL domains, can be produced in a matrix format created from combinations of diabody-producing bacterial clones (e.g., available from Domantis).
  • Exemplary antibodies for use as protein-capture agents include monoclonal antibodies, polyclonal antibodies, Fv, Fab, Fab' and F(ab')2 immunoglobulin fragments, synthetic stabilised Fv fragments, e.g., single chain Fv fragments (scFv), disulphide stabilised Fv fragments (dsFv), single variable region domains (dAbs), minibodies, combibodies and multivalent antibodies such as diabodies and multi-scFv, single domains from camelids or engineered human equivalents.
  • the term 'scaffold' refers to ligand-binding domains of proteins, which are engineered into multiple variants capable of binding diverse target molecules with antibody-like properties of specificity and affinity.
  • the variants can be produced in a genetic library format and selected against individual targets by phage, bacterial or ribosome display.
  • Such ligand-binding scaffolds or frameworks include 'Affibodies' based on Staphylococcus aureus protein A (e.g., available from Affibody), 'Trinectins' based on fibronectins (e.g., available from Phylos) and 'Anticalins' based on the lipocalin structure (e.g., available from Pieris). These can be used on capture arrays in a similar fashion to antibodies and may have advantages of robustness and ease of production.
  • a support surface which is generally planar or contoured.
  • Common physical supports include glass slides, silicon, microwells, nitrocellulose or PVDF membranes, and magnetic and other microbeads.
  • while microdrops of protein delivered onto planar surfaces are widely used, related alternative architectures include CD centrifugation devices based on developments in microfluidics (e.g., available from Gyros) and specialised chip designs, such as engineered microchannels in a plate (e.g., The Living Chip™, available from Biotrove) and tiny 3D posts on a silicon surface (e.g., available from Zyomyx).
  • Particles in suspension can also be used as the basis of arrays, providing they are coded for identification; systems include colour coding for microbeads (e.g., available from Luminex, Bio-Rad and Nanomics Biosystems) and semiconductor nanocrystals (e.g., QDots™, available from Quantum Dots), and barcoding for beads (UltraPlex™, available from Smartbeads) and multimetal microrods (Nanobarcodes™ particles, available from Surromed). Beads can also be assembled into planar arrays on semiconductor chips (e.g., available from LEAPS technology and BioArray Solutions).
  • individual protein-capture agents are typically attached to an individual particle to provide the spatial definition or separation of the array.
  • the particles may then be assayed separately, but in parallel, in a compartmentalised way, for example in the wells of a microtitre plate or in separate test tubes.
  • a protein sample, which is optionally fragmented to form peptide fragments (see, e.g., U.S. Pat. App. Pub. 2002/0055186), is delivered to a protein-capture array under conditions suitable for protein or peptide binding, and the array is washed to remove unbound or non-specifically bound components of the sample from the array.
  • the presence or amount of protein or peptide bound to each feature of the array is detected using a suitable detection system.
  • the amount of protein bound to a feature of the array may be determined relative to the amount of a second protein bound to a second feature of the array. In certain embodiments, the amount of the second protein in the sample is already known or known to be invariant.
  • a protein sample of a first cell or population of cells is delivered to the array under conditions suitable for protein binding.
  • a protein sample of a second cell or population of cells is delivered to a second array which is identical to the first array. Both arrays are then washed to remove unbound or non-specifically bound components of the sample from the arrays.
  • the amounts of protein remaining bound to the features of the first array are compared to the amounts of protein remaining bound to the corresponding features of the second array.
  • the amount of protein bound to individual features of the first array is subtracted from the amount of protein bound to the corresponding features of the second array.
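  • the two comparisons described above, measurement relative to a reference feature and feature-by-feature subtraction between two identical arrays, can be sketched as follows. The dictionary representation of an array, the feature names and the function names are hypothetical and chosen only for illustration.

```python
def relative_to_reference(array, reference_feature):
    """Express each feature's bound amount relative to a reference
    feature whose protein is known, or known to be invariant."""
    ref = array[reference_feature]
    if ref == 0:
        raise ValueError("reference feature has no signal")
    return {feature: amount / ref for feature, amount in array.items()}

def feature_differences(first_array, second_array):
    """Subtract the amount bound to each feature of the first array from
    the amount bound to the corresponding feature of the second array."""
    shared = first_array.keys() & second_array.keys()
    return {f: second_array[f] - first_array[f] for f in sorted(shared)}

# Hypothetical bound amounts (arbitrary units) for two cell populations.
first = {"EGFR": 10.0, "TNF": 4.0, "ACTIN": 20.0}
second = {"EGFR": 15.0, "TNF": 1.0, "ACTIN": 20.0}
diffs = feature_differences(first, second)
```

  • a positive difference indicates more protein bound on the second array than on the first, and a negative difference indicates less.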
  • fluorescence labelling can be used for detecting protein bound to the array.
  • the same instrumentation as used for reading DNA microarrays is applicable to protein-capture arrays.
  • capture arrays (e.g., antibody arrays)
  • fluorescently labelled proteins from two different cell states, in which cell lysates are labelled with different fluorophores (e.g., Cy-3 and Cy-5) and mixed, such that the colour acts as a readout for changes in target abundance.
  • Fluorescent readout sensitivity can be amplified 10-100 fold by tyramide signal amplification (TSA) (e.g., available from PerkinElmer Lifesciences).
  • Planar waveguide technology e.g., available from Zeptosens
  • High sensitivity can also be achieved with suspension beads and particles, using phycoerythrin as label (e.g., available from Luminex) or the properties of semiconductor nanocrystals (e.g., available from Quantum Dot).
  • Fluorescence resonance energy transfer has been adapted to detect binding of unlabelled ligands, which may be useful on arrays (e.g., available from Affibody).
  • Data analysis for functional protein expression is then conducted in a manner analogous to that discussed for gene expression analysis above.
  • signal intensity measurements are first normalised to a magnitude of 1 across the time profile.
  • Data can also be normalised across protein species to a magnitude of 1 at each time point.
  • Partitioning k-means clustering may be applied to the normalised data.
  • Average profiles are calculated for the protein species within each cluster.
  • the similarity of the proteomic clusters to the genomic expression clusters is then determined through association analysis based on a similarity measure, for example Pearson's correlation coefficient or the Euclidean distance between the two profiles.
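  • the clustering and comparison steps above can be sketched in plain Python. This is a minimal k-means under stated assumptions: profiles are already normalised, the first k profiles seed the clusters (a simplification chosen for reproducibility), and every function name and data value is hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(profiles):
    """Average profile of the protein species within one cluster."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]

def kmeans(profiles, k, max_iterations=100):
    """Partition profiles into k clusters; returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]   # assumed seeding rule
    labels = None
    for _ in range(max_iterations):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:                  # assignments are stable
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:                           # keep old centroid if empty
                centroids[c] = mean_profile(members)
    return labels, centroids

def pearson(a, b):
    """Pearson correlation coefficient of two profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Four hypothetical protein profiles over three time points.
proteins = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels, centroids = kmeans(proteins, k=2)
gene_cluster = [1.0, 0.0, 0.0]   # hypothetical genomic cluster average
similarity = pearson(centroids[labels[0]], gene_cluster)
```

  • each centroid returned here is the average profile of its cluster, so the final line compares one proteomic cluster average with one genomic cluster average; Euclidean distance could be substituted for the Pearson coefficient by calling `euclidean` instead.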
  • Coordination of such data would encompass any and all types of suitable comparisons or analyses to determine the differences, similarities, and/or relationships between gene expression and protein modification, resulting in a more complete understanding of the activities occurring within a cell or population of cells, or between two or more cells or populations of cells.
  • the techniques used for profiling a biomolecular system will include internal or external standards to permit quantitative or semi-quantitative determination of the corresponding molecular component types defining the biomolecular system or subset thereof in a subject, to thereby enable a valid comparison of subject data with predetermined data.
  • standards can be determined by the skilled practitioner using standard protocols.
  • the subject data includes absolute values for the abundance or functional activity of individual profiled molecular component types.
  • the subject data may optionally contain genotypic information including genetic information carried in the chromosomes and extrachromosomally.
  • Such data may be obtained from genetic mapping, genetic screening, pedigree, family history and heritable physical and psychological characteristics.
  • the phenotypic information includes the level or abundance of biomolecules such as but not limited to carbohydrates, lipids, steroids, co-factors, mimetics, prosthetic groups (such as haem), inorganic molecules, ions (such as Ca²⁺), inositides, hormones, growth factors, cytokines, chemokines, inflammatory agents, toxins, metabolites, pharmaceutical agents, plasma-borne nutrients (including glucose, amino acids, co-factors, mineral salts, proteins and lipids), amino acids, nucleic acids, foreign or pathological extracellular components, intracellular and extracellular pathogens (including bacteria, viruses, fungi and mycoplasma).
  • precursors, monomeric, oligomeric and polymeric forms, and breakdown products of the above are also included.
  • the subject data collected may be relevant to a respective condition that is already diagnosed in the subject.
  • the present invention can be utilised to detect previously undiagnosed conditions. In particular, this can be achieved by collecting sufficient parameter values and then comparing these to the predetermined data which is being collected for individuals having a range of conditions. This then allows conditions to be identified before symptoms are necessarily visible.
  • the present invention allows a vet or other medical practitioner to perform an analysis of the subject and in particular their current biological condition and determine whether the subject is suffering from any conditions.
  • the system is also useful for diagnosing conditions in situations where the athlete is trying to keep the condition secret, for example, in the case of drug testing to detect banned substances used by the athlete.
  • predetermined data must be obtained from one or more individuals suffering from a respective condition to allow the condition to be identified, and the sample size will therefore have to be sufficiently large to ensure this occurs. For example, if the chance of an individual from a general population having a specific condition is 1 in 100, it will be necessary to sample at least 100 individuals for at least one individual having the condition to be expected in the sample. In fact, it would in this case be typical to sample at least 1000 individuals, to ensure that sufficient individuals having the condition are identified, to allow accurate condition determination.
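  • the sample-size reasoning above can be made concrete with a short calculation. It assumes individuals are sampled independently from a population in which the condition occurs with a fixed probability; the function names and the 95% confidence target are illustrative assumptions rather than values from the text.

```python
import math

def prob_at_least_one(prevalence, sample_size):
    """Probability that a sample contains at least one affected individual,
    assuming independent sampling: 1 - (1 - p) ** n."""
    return 1.0 - (1.0 - prevalence) ** sample_size

def expected_cases(prevalence, sample_size):
    """Expected number of affected individuals in the sample."""
    return prevalence * sample_size

def min_sample_size(prevalence, confidence=0.95):
    """Smallest n for which at least one affected individual is sampled
    with the requested probability (the 0.95 default is an assumption)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))

p = 0.01                               # condition affects 1 in 100
p_100 = prob_at_least_one(p, 100)      # sample of 100 individuals
n_95 = min_sample_size(p)              # sample size for 95% confidence
cases_1000 = expected_cases(p, 1000)   # sample of 1000 individuals
```

  • under these assumptions a sample of 100 contains an affected individual with a probability of only about 63%, whereas a sample of 1000 is expected to contain about ten, which is consistent with the preference for the larger sample stated above.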
  • the number of parameters required will depend on the number of conditions to be distinguished. In particular, it will depend on factors such as the presence and detection of unknown conditions.
  • the number of parameters employed is at least about 20, preferably at least about 50, more preferably at least about 100, even more preferably at least about 150, even more preferably at least about 200, even more preferably at least about 300, even more preferably at least about 500, even more preferably at least about 1000, even more preferably at least about 1500, even more preferably at least about 2000, even more preferably at least about 4000, even more preferably at least about 6000, even more preferably at least about 8000, and still even more preferably at least about 10000.
  • the effect of a condition on an individual may also vary in accordance with additional phenotypic information relating to a particular characteristic or set of characteristics of the subject, as determined by interaction of the subject's genotype with the environment in which it exists.
  • 'characteristic data' may be selected from age, sex, height, length, weight, ethnicity, race, breed of animal, feeding patterns, exercise patterns, medication supplied, nutritional or growth supplements supplied, nutritional analysis, hair colour, skin colour, eye colour, body composition, fat composition, water retention, obesity, transcriptomic profile, proteomic profile, metabolomic profile, pharmacometabolomic profile, gene allele profile, nucleotide polymorphism profile, karyotype profile, pharmacogenetic profiles, blood type, tissue type, endocrine function, immunological function including innate, cellular and humoral immune function, tolerance, allergy, transplant rejection, cancer, hyperplasia, gastrointestinal function, neurological function, kidney function, heart function, brain function, pancreatic function, bone
  • the phenotypic information may also include demographic information, which can be important for monitoring the spread of a condition globally, as well as to allow analysis to take account of conditions that are limited to predetermined areas. Thus, it is generally preferable to additionally collect characteristic data together with the expression data for the individuals. Moreover, it is contemplated that blood molecules and blood cells serve as a particularly good surrogate marker for conditions existing throughout the body. Because blood, as a biological necessity, must be within a close proximity to every cell in the body, blood molecules are well suited to be used to detect conditions that may be present in one or more cells or tissues of the body.
  • blood molecules and cells continuously and rapidly interact with, monitor, and act to alleviate numerous conditions in the body and as part of this process, for example, differentially transcribe and express various mRNA molecules and undergo other phenotypic changes.
  • blood cells are well suited for detecting conditions in the body as well as changes in conditions over time (for example, from year to year, month to month, day to day, hour to hour, minute to minute or second to second), as well as detecting subtle changes in conditions, for example, changes that indicate the onset of a condition that has not yet risen to a level that is detectable by conventional diagnostic methods or subtle changes resulting from particular medication or relevant to determining the most effective medication at any particular time for a specific subject.
  • the present invention is additionally well suited for veterinary purposes.
  • a tissue biopsy which is a conventional diagnostic method in humans
  • taking a tissue biopsy is particularly arduous because it requires the anaesthetising of the animal and the stabilising of the animal after the procedure so that it may suitably heal.
  • the ability of the present invention to detect conditions in a multitude of tissues without requiring a biopsy is thus advantageous.
  • the individual data can be compared to the diagnostic signatures, allowing a determination of any conditions of the individual.
  • individual data can be compared to the diagnostic signatures to diagnose conditions suffered by an individual. This individual data can then be added to the data collected during the clinical trials, allowing the data to be re-mined, thereby allowing the diagnostic signatures to be revised to take the additional data into account.
  • An example of the generation of diagnostic signatures in a discovery phase will now be described with reference to Figures 6A and 6B, which show a flow chart and data flow respectively.
  • the operators of the base station 1 operate to collect the predetermined data, shown at 50 in Figure 6B, including genotypic and phenotypic information, from a number of individuals.
  • the parameter values that form the predetermined data correspond to expression data and in particular, concentration quantities, abundances, or ratios of respective expression products obtained from an array or the like will typically be provided from a study of the respective individual, and is preferably provided in a standard format to allow the information to be correctly interpreted by the base station 1.
  • the data is collected during clinical trials, by monitoring selected individuals, or any other suitable process, as will be appreciated by persons skilled in the art.
  • individuals being horses or the like
  • it is typical to perform clinical trials to induce conditions within the horses to allow these to be monitored under control conditions.
  • by inducing conditions within sample individuals it is possible to monitor the effect of different stages of the condition on the gene expression data which is collected.
  • diagnostic signatures to be derived for different stages of conditions, as will be described below.
  • this also allows gene expression and other phenotypic information to be collected for sub-clinical diseases, and the like.
  • initial quality control 51 is performed on the collected data to ensure it is suitable for use in determining diagnostic signatures.
  • An initial high level review of gene expression data on an array includes an assessment of the overall brightness of the array, any inconsistencies in brightness, dust, scratches or other visible artefacts.
  • an array specifically designed for genes found in white blood cells, when used against white blood cell samples, will produce a typical result that can be assessed using the naked eye.
  • Any inconsistencies in the way the array looks may result in the data being excluded.
  • Initial assessment of the quality of clinical data can be performed by a person unskilled in the art of veterinary or medical science. For example, it could include checking consistency of results with previous samples, completeness and values falling within physiological possibilities.
  • the data are stored in respective phenotypic and genotypic databases 52, 53, at step 320.
  • a data model will typically be established to provide structure to the relationship of each individual to its respective genotypic and phenotypic information. It will be appreciated that the nature of the model is not important for the purposes of the general techniques of the invention, although selection of a suitable model can aid with the quality control review outlined above.
  • the model can include required fields, corresponding to essential information, and if these fields are not populated when the data is propagated into the respective database, then this indicates that the data is deficient.
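The required-field check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the data model of the invention: the field names in `REQUIRED_FIELDS` and the helper `missing_required` are hypothetical.

```python
# Hypothetical required fields; the source does not name the actual fields.
REQUIRED_FIELDS = ("individual_id", "diagnosis", "expression_values")


def missing_required(record):
    """Return the names of required fields that are absent or empty in a record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "", [])
    ]


complete = {
    "individual_id": "horse-001",
    "diagnosis": "gastritis",
    "expression_values": [0.8, 1.4, 2.1],
}
deficient = {"individual_id": "horse-002"}

print(missing_required(complete))   # an empty list means no deficiency was found
print(missing_required(deficient))  # lists the fields that mark the data as deficient
```

A record with a non-empty result would be held back from the database until the missing information is supplied.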
  • step 340 a separate more detailed quality control check is performed separately on the phenotypic and genotypic data at 54, 55, to ensure that the data is of a suitable integrity for performing subsequent analysis.
  • the phenotypic information is reviewed to ensure that required information is provided in the correct form, and in particular demonstrates clinical integrity.
  • the requirements for this information will be predetermined before the study is commenced, and it will therefore be necessary to check whether the resulting information is provided correctly and with a sufficient degree of integrity to allow it to be used in the derivation of a diagnostic signature.
  • one vital piece of information required at this stage is a definitive diagnosis of any conditions suffered by the individual.
  • the individual is a horse having induced gastritis, then an indication of this and the elapsed time period from inducement will be required.
  • the review of the phenotypic data will need to be performed by a skilled individual, and cannot be automated, although it is possible that heuristic based review procedures could be implemented to perform some or all of the quality control review once sufficient knowledge has been derived through review of sufficient samples by the skilled individual.
  • the skilled individual is usually a qualified veterinarian or medical practitioner who is able to assess the likelihood of the phenotypic information being correct.
  • the phenotypic and/or genotypic data is reviewed separately at 55, and again this usually requires manual review by a skilled technician.
  • the form of the quality control review will depend on the nature of the data and the manner in which this is collected. Thus, for example, if the phenotypic data is collected using an array, then the review will generally include examining the array chip to ensure that the assay has been performed correctly. This quality control is generally performed on a chip by chip basis to ensure that each chip demonstrates absolute data integrity, and hence the resulting data do not include any faults.
  • This process generally uses a combination of standard checks, such as ensuring control genes have been correctly expressed, and any other developed tests, which may be specific to the respective clinical trial.
  • a group of individuals are selected at step 360, and as shown at 57.
  • the individuals are selected on the basis of the purpose of the diagnostic signature.
  • the query is to be used to determine signatures for the condition of gastritis, then it will be typical to mine data of individuals having gastritis, and selected individuals not having gastritis. It is not possible however to use individuals for whom the presence or absence of gastritis is unknown. Similarly, it may be desirable to determine a signature for male horses with gastritis, in which case, female horses should be excluded from the query used to mine the database.
  • additional quality control 58 is performed to determine if the genotypic data for the individuals can be used in comparative analysis. For example, there may be differences in the relative gene expression profiles arising from the use of different arrays, or different tests in the determination of this information. Accordingly, this is usually accounted for by normalising the phenotypic data for the individuals within the group. Phenotypic data for groups to be compared can be statistically analysed to determine "outlier" data that may need to be excluded from the comparison. Such statistical analyses include Box and Whisker and Kernel Density plots. It will be appreciated that if the phenotypic data for any individual fails the quality control test, then the individual will be excluded from the subsequent determination of the diagnostic signature.
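The outlier screening mentioned above can be illustrated with the rule that underlies a Box and Whisker plot. The sketch below is an assumption-laden example rather than the procedure used by the inventors: the function name `box_whisker_outliers`, the conventional whisker length of 1.5 times the interquartile range, and the sample values are all hypothetical.

```python
import statistics


def box_whisker_outliers(values, whisker=1.5):
    """Return the indices of values lying outside the box-and-whisker fences.

    The fences sit `whisker` interquartile ranges below the first quartile
    and above the third quartile (1.5 is the conventional choice).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - whisker * spread, q3 + whisker * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical expression values for one gene across eight individuals in a group.
gene_values = [10, 11, 12, 11, 10, 12, 11, 50]
print(box_whisker_outliers(gene_values))  # index 7 (the value 50) is flagged
```

Individuals flagged in this way would be candidates for exclusion, subject to the review described in the text.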
  • any individuals that are unsuitable for use in the respective data mining query are excluded from the subsequent analysis. It will be appreciated that in order to be useful in subsequent data mining, the group will require a minimum number of individuals, which is typically eight or more, in order to allow the data mining to be statistically significant.
  • a data mining procedure is performed to allow one or more diagnostic signatures to be determined at 59.
  • the manner in which the data mining is performed will depend on the respective implementation, as well as other factors, such as the number of members in the group.
  • the system operates by forming parameter vectors for each individual in the group.
  • Each parameter vector is generally formed from a vector containing gene expression values for different genes at respective locations within the parameter vector. These values are referred to as parameter values.
  • the processing system 10 can then operate to consider the relative position of the parameter vectors in an N-dimensional space, where N corresponds to the number of parameters, allowing diagnostic signatures to be derived. A number of options for performing this process are described in more detail below.
  • the processing system 10 will operate at step 370 to produce diagnostic signatures that may be used to characterise the group of individuals identified above. It will be appreciated that there is a multiplicity of ways of defining such diagnostic signatures for example regularised discriminant analysis, Support Vector Machines, recursive partitioning, artificial neural networks, or the like, as will be described in more detail below with respect to data mining.
  • a further quality control step 60 is performed at step 380 to characterise the ability of the diagnostic signatures to predict group membership, by applying the signatures to suitable individuals, such as the individuals in the group, or other individuals known to have a definitive clinical diagnosis. This is performed to ensure that the signature allows correct characterisation and validation to be achieved. This may be performed for example by using k fold cross-validation, and the construction of permutation distributions.
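As an illustration of the k fold cross-validation step, the sketch below assumes the simplest possible signature, namely the mean parameter vector (centroid) of each group, with a subject assigned to the nearest centroid. The source lists other methods (regularised discriminant analysis, Support Vector Machines and so on) and does not prescribe this one; the function names and the toy data are hypothetical, and the permutation distributions are not covered.

```python
import math


def centroid(vectors):
    """Mean parameter vector of a group, used here as a stand-in signature."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict(signatures, vector):
    """Assign a parameter vector to the group with the nearest signature."""
    return min(signatures, key=lambda label: math.dist(signatures[label], vector))


def k_fold_accuracy(vectors, labels, k):
    """Hold out each of k folds in turn and score predictions on the held-out fold."""
    correct = 0
    pairs = list(zip(vectors, labels))
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        signatures = {
            label: centroid([v for v, l in train if l == label])
            for label in {l for _, l in train}
        }
        correct += sum(predict(signatures, v) == l for v, l in held_out)
    return correct / len(pairs)


# Toy two-parameter data: four healthy and four gastritis individuals.
vectors = [[0, 0], [5, 5], [0, 1], [5, 6], [1, 0], [6, 5], [1, 1], [6, 6]]
labels = ["healthy", "gastritis"] * 4
print(k_fold_accuracy(vectors, labels, k=4))
```

Because each individual is scored only by signatures derived without it, the resulting accuracy is a less optimistic estimate than scoring the training group directly.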
  • signatures may be defined for respective conditions irrespective of the phenotypic traits of the individuals.
  • all individuals suffering from a condition tend to have similar parameter values, then all the individuals having the condition will be contained in the same group irrespective of each individual's phenotypic traits.
  • a respective signature will be defined for each phenotypic group.
  • a signature may be defined for male horses having a respiratory condition, with a separate signature being defined for female horses having the same respiratory condition.
  • At least one signature will be defined corresponding to healthy individuals not having any conditions. It will be appreciated that this can be used in determining if an individual has an unidentified condition, as will be described in more detail below. This can also be used to identify sub-clinical diseases, a predisposition for developing a condition or conditions that are not previously apparent through existing diagnostic techniques.
  • the user determines gene expression data in the form of parameter values, and other phenotypic information relating to the subject.
  • the end station 3 is used to generate subject data in accordance with the determined parameter values and phenotypic information.
  • the user transfers the subject data to the processing system 10 as described above.
  • the processing system 10 extracts the parameter values and the phenotypic data from the subject data, and in this example, uses the parameter values to generate a parameter vector at step 440.
  • the processing system obtains one or more of the signatures from the database at step 450.
  • the signatures may be selected in accordance with the phenotypic information, such that the subject parameter vector is only compared to signatures having suitable phenotypic traits.
  • the subject is a male horse, then it may be pointless comparing the subject parameter vector to a signature representing a group of female horses having a respiratory disease.
  • a signature corresponds to a group of individuals having a range of phenotypic traits then this signature will be used to predict group membership using the subject parameter vector, at step 460. It will be appreciated that there is a multiplicity of ways of predicting group membership from the subject parameter vector, just as there is a multiplicity of ways of constructing group signatures, as will be appreciated by persons skilled in the art.
  • the processing system 10 operates to determine the uncertainties in group prediction using the subject parameter vector and signatures in the N dimensional vector space. These uncertainties are expressed as probabilities that the test subject has a condition previously characterised by membership of one of the groups in the predetermined data.
  • uncertainties may be based on some measure of distance between the subject parameter vector and a group signature, or by a Bayes rule applied to a set of discriminant functions.
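One way to turn distances into probabilities is sketched below. It assumes, purely for illustration, that each signature is a single point, that every group is equally likely beforehand, and that each group scatters around its signature as a spherical Gaussian with unit variance; under those assumptions Bayes' rule reduces to normalising `exp(-d²/2)` over the groups. None of these modelling choices, nor the name `group_probabilities`, comes from the source.

```python
import math


def group_probabilities(signatures, vector):
    """Posterior probability of each group given a subject parameter vector.

    Assumes equal priors and unit-variance spherical Gaussian groups, so the
    log-likelihood of a group is -0.5 * (squared distance to its signature).
    """
    scores = {
        label: -0.5 * math.dist(signature, vector) ** 2
        for label, signature in signatures.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


signatures = {"healthy": [0.0, 0.0], "gastritis": [4.0, 0.0]}
print(group_probabilities(signatures, [2.0, 0.0]))  # equidistant: equal probabilities
print(group_probabilities(signatures, [0.0, 0.0]))  # close to the healthy signature
```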
  • the signatures may be based on specific values such that they represent a single point in the N dimension vector space.
  • the signatures may correspond to ranges such that each signature defines a range of parameter values for which the subject would have the respective condition.
  • the parameter vector is approximately equidistant to two or more signatures, this may indicate that there is a chance that the individual either has a previously undetermined condition, or alternatively is suffering for example from a combination of the two conditions. It will be appreciated that signatures may be generated for common combinations of conditions, as well as single conditions.
  • the presence of the signature for healthy individuals allows a healthy subject to be determined. If the subject parameter vector is significantly separated from this signature, this will indicate that the subject is generally unhealthy, and this allows previously unidentified conditions to be determined, for example, if the subject parameter vector is not near any of the other signatures.
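The three outcomes discussed above (a clear match, near-equidistance to several signatures, and separation from every signature) can be sketched as a simple decision rule. The thresholds `max_distance` and `tie_margin`, the function name `interpret`, and the example signatures are hypothetical; the source does not specify how "approximately equidistant" or "significantly separated" are quantified.

```python
import math


def interpret(signatures, vector, max_distance, tie_margin):
    """Classify a subject parameter vector against point signatures.

    Returns ("unidentified", []) when no signature is within max_distance,
    ("ambiguous", labels) when several signatures are within tie_margin of the
    nearest one, and ("match", [label]) otherwise.
    """
    distances = sorted(
        (math.dist(signature, vector), label)
        for label, signature in signatures.items()
    )
    nearest_distance, nearest_label = distances[0]
    if nearest_distance > max_distance:
        return ("unidentified", [])
    tied = [label for d, label in distances if d - nearest_distance <= tie_margin]
    if len(tied) > 1:
        return ("ambiguous", sorted(tied))
    return ("match", [nearest_label])


signatures = {
    "healthy": [0.0, 0.0],
    "gastritis": [4.0, 0.0],
    "respiratory": [0.0, 4.0],
}
print(interpret(signatures, [0.5, 0.0], max_distance=3.0, tie_margin=0.5))
print(interpret(signatures, [2.0, 0.0], max_distance=3.0, tie_margin=0.5))
print(interpret(signatures, [10.0, 10.0], max_distance=3.0, tie_margin=0.5))
```

An "ambiguous" result corresponds to the case in the text where the subject may have a combination of conditions or a previously undetermined one.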
  • the magnitude of the parameter values will allow the severity of conditions to be determined.
  • a greater difference in magnitude between the parameter values for a healthy subject and those for a subject suffering from a condition will generally indicate a greater severity of the respective condition.
  • groups may be defined for different severity of condition.
  • a first group may be defined for the initial stages of a condition that is treatable, whilst a second group is defined for the same condition when it has progressed beyond the initial stages and is no longer treatable.
  • a direct comparison of the subject parameter values with the predetermined data for other individuals suffering from the same condition can also be used to allow the severity of the conditions to be determined.
  • the processing system 10 interprets the separation of the parameter vector from the signatures and uses this to determine any conditions displayed by the subject. An indication of this is then transferred to the end station 3 at step 490.
  • the received subject data represents an additional source of data which may be used in re-tuning the diagnostic signatures.
  • a large quantity of data is received from external sources, and this allows the size of the groups used in determining the signatures to be increased, allowing a statistically more significant signature to be determined.
  • the received subject data represented at 62 must be reviewed for quality control purposes at 63, as set out in step 500, before being published to the biowarehouse at 56, as set out in step 510.
  • the base station 1 can implement a dual processing system set up as shown for example in Figure 8.
  • the base station 1 includes a processing system 12 coupled to the LANs 4 and the Internet 2 via a first firewall 13, and a second database 14 coupled to the first processing system 12 via a second firewall 15.
  • processing systems 12, 14 will be substantially similar to the processing system 10 described above, and will not therefore be described in further detail.
  • communication with the end stations 3, including the receipt of the subject data, and provision of results, is achieved using the processing system 12.
  • in the case of receiving subject data, or any other requests, the received submission is analysed by the processing system 12, and any relevant information is extracted. The extracted information, which is determined by the processing system 12 to be a genuine submission, can then be transferred to the processing system 14.
  • the processing system 12 can receive the subject data, and operate to extract the parameter values therefrom.
  • the processing system 12 then generates the parameter vector, or the like, which is transferred to the processing system 14 for subsequent comparison with the predetermined data.
  • the processing system 14 can determine those conditions suffered by the subject and then transfer an indication of this back to the processing system 12 through the firewall 15. The processing system 12 can then transfer this indication to the end station 3.
  • a further alternative to the present invention is for the comparison to be performed on the basis of parameter ranges defined for different conditions.
  • each condition may have associated therewith a sequence of parameter value ranges determined based on ranges of parameter values for individuals diagnosed with the respective condition.
  • the parameter value ranges can then indicate for a respective condition the parameter values that can be expected, allowing the determined parameter values to be compared to the respective range for each condition to determine if the parameter values provided fall within a respective range.
  • a respective parameter range can be determined for each condition, with the parameter values determined for a subject being compared to each range, to determine those ranges within which the subject data falls.
  • An indication of the likelihood of the subject having a respective condition can then be determined statistically based on the number of individuals having the respective condition.
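The range-based comparison can be sketched as follows. The example assumes that a condition's range for each parameter is simply the minimum and maximum observed among diagnosed individuals; the source does not say how the ranges are derived, and the function names and toy values are hypothetical. The statistical likelihood estimate mentioned in the text is not covered here.

```python
def ranges_from_individuals(vectors):
    """Per-parameter (low, high) range observed across diagnosed individuals."""
    return [(min(column), max(column)) for column in zip(*vectors)]


def conditions_in_range(condition_ranges, values):
    """Return the conditions whose every parameter range contains the subject's value."""
    return [
        condition
        for condition, ranges in condition_ranges.items()
        if all(low <= value <= high for value, (low, high) in zip(values, ranges))
    ]


# Toy parameter values for individuals diagnosed with each condition.
condition_ranges = {
    "gastritis": ranges_from_individuals([[4.0, 1.0], [5.0, 2.0], [6.0, 1.5]]),
    "respiratory": ranges_from_individuals([[1.0, 7.0], [2.0, 9.0]]),
}

print(conditions_in_range(condition_ranges, [4.5, 1.2]))
print(conditions_in_range(condition_ranges, [9.0, 9.0]))
```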
  • the processing system 10 would operate to compare the determined parameter values against parameter values of horses suffering from the condition and horses not suffering from the condition. In this situation, the manner in which the collection of the parameter values is performed may vary.
  • the parameter values may include, for example, expression data collected using an array. If the arrays are to collect values corresponding to 5000 parameters it is typical for an array to be provided with 5000 features thereon with each feature corresponding to a respective parameter. Alternatively, 10,000 features may be provided with two features corresponding to each parameter. In any event, a person skilled in the art will appreciate that a number of variations on this are possible.
  • a typical sequence of events may be for a user to submit a general test having a large number of parameters similar to that described above which allows respective conditions to be first identified. Once a condition has been identified, the user can then purchase specifically designed array plates adapted to monitor the specific condition. Measurements of the parameter values relevant to the condition can then be made far more accurately allowing the progress of the condition to be monitored in detail. This can allow users to be provided with information concerning whether conditions are improving or not.
  • the subject data for a respective subject is compared to predetermined data for a number of different individuals.
  • longitudinal analysis can also be performed.
  • the subject data is compared to subject data previously collected for the same subject.
  • this can allow the progression of the disease over a time period to be monitored and displayed to the user.
  • the most recently obtained subject data is compared to earlier subject data for the same subject (and optionally predetermined data), to determine disease progression.
  • levels of respective parameter values can be used to indicate the severity of the disease. This can be achieved by comparing the subject data to predetermined data in the manner described above, or alternatively using other techniques. As the parameter values vary over time, this can be used to provide an indication of whether the condition is improving or worsening. This in turn can be used to monitor the effectiveness of any treatment given to the subject.
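A minimal sketch of the longitudinal comparison follows. It assumes that the distance between a subject's parameter vector and a signature for healthy individuals can serve as a rough severity measure, and that comparing the two most recent samples is enough to label a trend; both are illustrative simplifications, and the name `progression` and the sample values are hypothetical.

```python
import math


def progression(healthy_signature, samples):
    """Distance from the healthy signature for each sample (oldest first), plus a trend.

    The trend compares the most recent sample with the one before it.
    """
    distances = [math.dist(healthy_signature, sample) for sample in samples]
    if len(distances) < 2:
        return distances, "insufficient data"
    change = distances[-1] - distances[-2]
    if change < 0:
        trend = "improving"
    elif change > 0:
        trend = "worsening"
    else:
        trend = "unchanged"
    return distances, trend


# Toy weekly samples for one subject, oldest first.
weekly_samples = [[3.0, 4.0], [0.0, 3.0], [0.0, 1.0]]
distances, trend = progression([0.0, 0.0], weekly_samples)
print(distances, trend)
```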
  • the obvious solution to this problem is to reduce training for a predetermined time period, or to rest the horse.
  • trainers will generally not want to reduce the training too much as the horse will become unfit.
  • worse problems can arise if the trainer resumes training too early.
  • the trainer can submit subject data on a periodic basis such as every week allowing the fitness of the horse to be determined on a weekly basis. An indication of this can then be transferred back to the user allowing the trainer to determine when training of the horse should resume, or how hard training should be.
  • Gene arrays, also called GeneChip® arrays
  • GeneChip® arrays are perhaps the most common array technology in the art but the present invention also contemplates the use of protein-capture arrays and arrays capable of detecting other biological material such as carbohydrates, lipids, steroids, amino acids or a combination of the foregoing, as discussed above.
  • a horse array and a blood sample are needed.
  • the array has DNA dotted onto its surface (DNA of the genes in horse blood cells).
  • the DNA on the array consists of one strand of the double-stranded DNA molecule - the other strand is provided by the blood sample and is labelled with a dye.
  • An array reader can determine the amount of mRNA in a sample (gene turned “on” or “off”) by determining the amount of dye-labelled DNA that hybridises to an array.
  • the reader produces a value compared to a reference for every single gene on the array.
  • the 5,000 to 10,000 values can then be compared to the inventors' database (also referred to herein as the “Genetraks database”). Genes turned “on” or “off”, individually or in patterns, can then be identified and correlated to the specific conditions of a racehorse.
  • MnSOD manganese superoxide dismutase
  • LFNg, IL-4 may also be turned “off”
  • the genes for Grola, IL-8, TNF and MIF may be turned “on”.
  • This pattern of "gene expression" can be correlated to a specific condition, such as respiratory inflammation caused by a virus. Patterns of gene expression change as a horse succumbs to or recovers from a viral infection. As the technology and database develops, predictions on the stage of infection or influence of treatments can be made.
  • secure arrays utilise a randomisation of the layout of the array to avoid the problems of reverse engineering or the like.
  • the operators of the base station 1 will determine a number of features to be included on the array, and provide an indication of these features to the array supplier at step 610.
  • the array supplier will operate to generate a preferred array layout using a processing system. This is performed in accordance with normal operating procedures.
  • the array suppliers will generally utilise applications software to determine a preferred array layout which optimises the array build process.
  • the layout is generally organised so that creation of the array is simplified.
  • the array supplier will operate to generate a number of randomised array layouts.
  • the randomised array layouts have one or more of the features positioned in an alternative location when compared to the preferred array layout.
  • the array supplier will generally operate to move or swap the locations of one or more of the features on the array. In order to swap features, it must be ensured that the features are of different types.
  • the array supplier will also operate to generate a corresponding number of codes.
  • the code can be defined by one or more detectable and/or quantifiable attributes such as alphanumeric characters, the shape, or surface deformation(s) of the array, bar codes or an electromagnetic radiation-related attribute including atomic or molecular fluorescence emission, luminescence, phosphorescence, infra-red radiation, electromagnetic scattering including light and X-ray scattering, light transmittance, light absorbance and electrical impedance.
  • serial numbers are used and in particular, a respective serial number is provided for each randomised array layout that is generated.
  • the array supplier will operate to generate arrays in accordance with the randomised array layouts and the serial numbers.
  • each generated array will have features positioned thereon in accordance with a respective one of the randomised layouts, together with an indication of the corresponding serial number.
  • the array supplier preferably produces the arrays in batches with up to 1,000 arrays in each batch, with each batch being created in accordance with a different randomised layout.
  • the randomised arrays are transferred to the users for subsequent use in generating the subject data, whilst at step 570 the serial numbers, together with corresponding layouts are transferred to the base station 1.
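The pairing of randomised layouts with serial numbers can be sketched as below. This example shuffles every feature position, which is a stronger variant than the "move or swap one or more features" described in the text, and it ignores the constraint on feature types. The serial number format, the function name `randomised_layouts` and the use of a seeded generator are assumptions made for illustration; the feature names are cytokines mentioned elsewhere in the description.

```python
import random


def randomised_layouts(preferred_layout, count, seed=None):
    """Generate `count` shuffled copies of a layout, keyed by a serial number.

    The layout is a list in which position i holds the name of the feature
    placed at array location i.
    """
    rng = random.Random(seed)
    layouts = {}
    for number in range(1, count + 1):
        layout = list(preferred_layout)
        rng.shuffle(layout)
        layouts[f"SN{number:06d}"] = layout  # hypothetical serial number format
    return layouts


preferred = ["MnSOD", "IL-4", "IL-8", "TNF", "MIF"]
layouts = randomised_layouts(preferred, count=3, seed=1)
for serial_number, layout in layouts.items():
    print(serial_number, layout)
```

Only a party holding the mapping from serial number to layout can tell which feature a given array location measures, which is the property the secure arrays rely on.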
  • step 700 the user will obtain a biological sample from the subject and then perform an assay process using the array at step 710.
  • biological samples include tissue cultured cells, e.g., primary cultures, explants, and transformed cells; cellular extracts, e.g., from cultured cells or tissue, whole cell extracts, cytoplasmic extracts, nuclear extracts; blood, etc.
  • Biological samples also include sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histological purposes.
  • the biological sample is selected from tissue samples (e.g., organ biopsy), cellular samples (e.g., cardiac cells, muscle cells, epithelial cells, endothelial cells, kidney cells, prostate cells, blood cells, lung cells, brain cells, adipose cells, tumour cells, pancreatic cells, ocular cells, mammary cells etc) and fluid samples (e.g., urine, sweat, saliva, mucus secretion, respiratory fluid, synovial fluid, pleural fluid, pericardial fluid, faeces, nasal fluid, ocular fluid, intracellular fluid, intercellular fluid or a circulatory fluid such as whole blood, serum, plasma, lymph, cerebrospinal fluid, or combinations of any of these, or fractions thereof) obtained from the subject.
  • the biological sample comprises blood or fraction thereof (e.g., blood cells such as mature, immature and developing leukocytes, lymphocytes, polymorphonuclear leukocytes, neutrophils, monocytes, reticulocytes, basophils, coelomocytes, haemocytes, eosinophils, megakaryocytes, macrophages, dendritic cells, natural killer cells, especially white blood cells including peripheral blood mononuclear cells).
  • a sample such as, for example, a nucleic acid extract or protein extract is isolated from, or derived from, a particular source.
  • the extract may be isolated directly from a tissue or a biological fluid acquired from a subject.
  • the user uses the end station 3 to encode the values obtained from the array as subject data, together with a serial number indication.
  • the subject data is then transferred to the base station 1, in the manner described above, at step 730.
  • the processing system 10 operates to determine the serial number from the subject data at step 740.
  • the serial number is then used to access the respective array layout stored in the database 11 at step 750.
  • the array layout will then be used by the processing system to interpret the subject data, and in particular, to determine the respective feature to which each value corresponds. This allows the processing system 10 to determine the parameter values for the respective subject data.
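The decoding steps above can be sketched as a lookup followed by a pairing of values with features. The dictionary standing in for the database 11, the shape of the subject data, the serial number format and the function name `decode_subject_data` are all hypothetical.

```python
def decode_subject_data(stored_layouts, subject_data):
    """Map each raw array value to its feature using the layout for the serial number."""
    serial_number = subject_data["serial_number"]
    if serial_number not in stored_layouts:
        raise KeyError(f"no layout stored for serial number {serial_number}")
    layout = stored_layouts[serial_number]
    values = subject_data["values"]
    if len(values) != len(layout):
        raise ValueError("number of values does not match the array layout")
    return dict(zip(layout, values))


# Stand-in for the layouts held at the base station (hypothetical content).
stored_layouts = {"SN000001": ["TNF", "MIF", "IL-8"]}
subject_data = {"serial_number": "SN000001", "values": [1.9, 0.4, 2.3]}

print(decode_subject_data(stored_layouts, subject_data))
```

Without the stored layout the raw values are an unlabelled list, so an intercepted submission does not reveal which features were measured.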
  • serial number may also be used to check the user is an authorised user.
  • each user is provided with arrays having a respective serial number (a range of serial numbers)
  • having the array supplier provide an indication of the user and the serial number(s) to the operator allows the operator to verify the identity of the user. This provides an audit trail for the arrays.
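The authorisation check can be sketched as a membership test against the serial numbers issued to each user. The sketch assumes integer serial numbers issued in contiguous ranges and hypothetical user names; the source does not specify either.

```python
def is_authorised(issued_serials, user, serial_number):
    """Check that a serial number falls within the numbers issued to this user."""
    return serial_number in issued_serials.get(user, ())


# Hypothetical record supplied by the array supplier: user -> issued serial numbers.
issued_serials = {
    "trainer_a": range(1000, 2000),
    "pathology_lab_b": range(2000, 3000),
}

print(is_authorised(issued_serials, "trainer_a", 1500))
print(is_authorised(issued_serials, "trainer_a", 2500))
```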
  • a further way in which the present invention may be utilised is to provide feedback on the accuracy of provided results.
  • the base station 1 is used to provide an indication of one or more suspected conditions in a subject, the user can be requested to provide an indication whether the diagnosis provided by the base station 1 is correct. This may form a requirement, such that a user will only be provided with services by the base station if they agree to this term.
  • the correctness of the assessment by the base station 1 can usually be determined by either treating the subject and determining if the treatment is successful, or by monitoring the development of the condition over a predetermined time period. Once it has been determined that the diagnosis is correct or incorrect, an indication of this can be transferred to the base station 1. At this point, the respective subject data collected for the respective subject can be saved as predetermined data in the database 11, with the confirmation of the condition being used as the indication of the condition in the predetermined data.
  • the processing system 10 is typically coupled to a sample database that is used to store the subject data obtained from each subject. Once confirmation of the conditions is received the subject data and the condition indication is transferred to the predetermined data stored in the database 11.
  • any individual may use the system. Initially at least however, it is necessary for the user to be able to generate the subject data. In the case in which arrays are used, for example, this requires the user to first collect biological material, such as blood, and then analyse the material using the array. This is generally difficult and requires skilled operators using existing technology. Accordingly, the user may have to be a skilled technician. However, it is envisaged that collection techniques will become simpler, allowing the process to be implemented by any user.
  • the users could include:
  • Event organisers
  • Pathology labs (that would typically perform the work on behalf of an individual, such as a horse owner).
  • the indication of any conditions suffered by the subject together with information concerning the ability of the subject to compete in events, or the like, is provided in the form of a report.
  • the content of the report may need to be tailored depending on the type of user.
  • a trainer will not be interested in knowing about parameter values for their horse, but will rather want to know what conditions the horse has, and the severity.
  • if the user is a skilled medical practitioner, then there may be some benefit in having more detailed information provided thereon.
  • the processing system 10 can be adapted to generate tailored reports in accordance with report templates stored in the database 11, or the memory 21.
  • the processing system 10 will determine the type of user, and then access a respective report template.
  • the report template will specify the type of information to be provided to the user, allowing the processing system 10 to populate the report in accordance with the results of the above described analysis.
  • the processing system 10 can access a user report template, which will include a number of fields.
  • the processing system will determine from the field the information required, and populate the fields accordingly. This may require some additional processing to place the information in the required form.
  • the information will also be directed to a level the user can understand, and will therefore typically avoid the use of technical terms (such as medical terms) for non-technical users.
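The template-driven report population described above can be sketched as follows. This is a minimal illustration only: the template contents, the user types and the names `REPORT_TEMPLATES` and `build_report` are assumptions made for the example and do not appear in the specification.

```python
# Hypothetical report templates keyed by user type; each lists the fields
# that the processing system would populate for that kind of user.
REPORT_TEMPLATES = {
    "trainer": ["condition", "severity"],
    "practitioner": ["condition", "severity", "parameter_values"],
}


def build_report(user_type, analysis):
    """Populate the template for user_type from the analysis results."""
    fields = REPORT_TEMPLATES[user_type]
    # Copy only the fields the template asks for; missing values are flagged.
    return {name: analysis.get(name, "not available") for name in fields}


# Illustrative analysis result (invented values).
analysis = {
    "condition": "respiratory infection",
    "severity": "moderate",
    "parameter_values": {"white_cell_count": 12.4},
}
trainer_report = build_report("trainer", analysis)
```

Keeping the field list in the template rather than in the code means a new user type could be supported by adding a template, which appears to be the intent of storing templates in the database 11 or the memory 21.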
  • the processing system may be adapted to determine the condition and severity. This is then used to access a look-up table, which indicates how serious the condition is to the subject.
  • the LUT may indicate that the condition is serious and medical attention should be obtained.
  • the report may therefore indicate merely that the subject has a condition and medical attention should be obtained.
  • the advice may depend on phenotypic data. Thus, a young horse may be more or less likely to require medical treatment for a given condition than an older horse.
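A look-up table of this kind might be sketched as below. The table entries, the key structure and the name `advice_for` are hypothetical; in the described system the entries would be derived from expert knowledge rather than written by hand as here.

```python
# Hypothetical look-up table: (condition, severity) -> advice for the report.
ADVICE_LUT = {
    ("respiratory infection", "mild"): "Monitor the subject.",
    ("respiratory infection", "severe"): "Medical attention should be obtained.",
}


def advice_for(condition, severity, lut=ADVICE_LUT):
    """Return the stored advice, or refer to an expert if no rule exists yet."""
    return lut.get((condition, severity), "Refer to an expert for review.")
```

Phenotypic data such as age could be added to the key in the same way, for example `(condition, severity, age_band)`, although the specification does not say how the LUT is keyed.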
  • the processing system may be adapted to indicate not only the condition and severity, but also provide an indication of various important parameter values (such as red or white blood cell counts), to allow the medical practitioner to determine what action to take.
  • the information displayed may depend not only on the user, but also the respective condition. Furthermore, the information could be displayed graphically or as numerical or textual information.
  • the rules for the determination of the level of severity of the condition or the like must be established to allow the LUT to be produced. This is generally achieved through a heuristic rules based approach, which is achieved by having the report generation initially performed by an expert, such as a veterinarian, or the like. As the reports are completed, the knowledge gained during this procedure is captured and stored in the LUT, thereby allowing the subsequent reporting to be performed in an automated manner.
  • the processing system 10 can be adapted to provide other advice. This can include for example, recommendations for changes in feeding habits, or the like. In general medical advice would not be given due to the issue of liability. However, it will be appreciated that the operator of the base station 1 could provide a medically trained individual to provide medical advice if required.
  • the reports may also be generated utilising other systems.
  • An example of an alternative system is the Pacific Knowledge Systems "Labwizard" LIS Interpretive Report Toolkit, which utilises RippleDown technology to provide knowledge capture and subsequent automated report generation.
  • a range of different architectures may be implemented in addition to those described above. Whilst these will not be described in detail, it will be appreciated that any form of architecture suitable for implementing the invention may be used. However, one beneficial technique is the use of distributed architectures.
  • a number of base stations 1 may be provided at respective geographical locations. This can increase the efficiency of the system by reducing data bandwidth costs and requirements, as well as ensuring that if one base station becomes congested or a fault occurs, other base stations 1 could take over. This also allows load sharing or the like, to ensure access to the system is available at all times.
  • each database 11 contains the same information and signatures such that the use of different ones of the base stations 1 would be transparent to the user.
  • the end stations 3 can be hand-held devices, such as PDAs, mobile phones, or the like, which are capable of transferring the subject data to the base station via a network such as the Internet 4, and receiving the reports.
  • if the end station 3 is used in conjunction with, or includes, a device for determining the genotypic data from a blood, or other appropriate sample, this allows users of the system to take a sample from a subject in situ, determine the subject data and transfer this directly to the base station. It will be appreciated that as the processes at the base station can be substantially automated, this could be used to allow at least a preliminary diagnosis to be returned to the user via the end station 3 in a matter of minutes.
  • the subject data may be selected from any expression product of the genome or characteristic or set of characteristics of the subject whose levels or abundance may vary within the subject or between two or more different subjects depending on their status.
  • the data include, but are not restricted to, biological, physiological and pathological data of the subject.
  • biological data include transcriptomic profiles, proteomic profiles, metabolomic profiles, pharmacometabolomic profiles, gene allele profiles, nucleotide polymorphism profiles, karyotype profiles, pharmacogenetic profiles, enzyme function, receptor function, and the like.
  • Physiological data may be selected from age, sex, height, length, weight, ethnicity, race, breed of animal, feeding patterns, exercise patterns, medication supplied, nutritional or growth supplements supplied, hair, skin and eye colour, fat composition, obesity, blood type, tissue type, endocrine function, immunological function, gastrointestinal function, neurological function, kidney function, heart function, brain function, pancreatic function, bone function, joint function, prosthesis, tissue reconstruction, surgery, pain, mental function, psychiatric disorder, mood disorder and the like.
  • pathological data include infectious disease including viral infection, bacterial infection, mycobacterium infection, parasitic infection, prion function, cancer, transplant rejection, inflammatory diseases such as arthritis and fibrosis, toxicological profiles, substance abuse including drug dependency and the like.
  • the system uses a self learning classification system, in which diagnosis is made using a historical database of test results (the predetermined data), which is updated as each test sample (subject data) is recorded.
  • the historical database is typically maintained on a server.
  • classification is based on the parameters estimated for discrimination or regression, using the genes remaining after the algorithm has discarded un-informative genes.
  • Clinical application of the system can be used to diagnose a subject such as an animal with an unknown clinical or performance state. That is, the animal may or may not have some disease, or may or may not be race-ready.
  • a metabolic profile is measured for the animal subject.
  • the metabolic profile is comprised of expression signatures measured on an oligonucleotide chip.
  • the metabolic profile is compared with a set of pre-computed diagnostic signatures (templates), and together these are used to predict the health status of the subject.
  • prediction will include probabilistic estimates of uncertainty, and be accompanied by a list of possible differential diagnoses.
  • Diagnostic signatures are computed by data mining a historical database, which contains metabolic profile data on subject animals (predetermined data), and associated clinical information on subject health and performance status. These historical data use the same metabolic profile measurement technique as is used in clinical application. In a preferred example, these metabolic profiles are comprised of expression signatures measured on an oligonucleotide chip.
  • Data mining may be performed using a number of techniques including:
  • the signature structure for status determination depends on the details of the data mining algorithm used to derive the signature.
  • the signature is derived using regularised discriminant analysis.
  • the signature is used to allocate a new sample to one of a set of predetermined groups.
  • the signature takes the form of a coefficient for each gene, and for each group. For example, with 3000 genes and 3 groups the signature would involve 9,000 numbers - one coefficient for each gene and each group.
  • the signature is used to calculate a score for each group, and the sample is allocated to the group for which it has the highest score.
  • if the signature has been developed using Bayesian stochastic variable elimination, it will have a similar structure - but will have coefficients for a small subset of the genes (implicitly other genes have zero coefficients). Different genes may have non-zero coefficients in different groups.
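The coefficient-per-gene, score-per-group structure described above can be illustrated with a short sketch. It assumes the score is a plain weighted sum of expression values with no intercept term, which the text does not state; the name `allocate`, the gene identifiers and the coefficients are invented for the example.

```python
def allocate(sample, signature):
    """Score a sample against each group and return the best group.

    sample: dict gene id -> expression value.
    signature: dict group -> dict gene id -> coefficient. Genes absent from a
    group's coefficients are treated as having a zero coefficient, as for a
    sparse signature.
    """
    scores = {
        group: sum(coef * sample.get(gene, 0.0) for gene, coef in coefs.items())
        for group, coefs in signature.items()
    }
    best = max(scores, key=scores.get)  # group with the highest score
    return best, scores


# Illustrative sparse signature: different genes carry weight in each group.
signature = {
    "healthy": {"g1": 0.5, "g2": -0.25},
    "disease_A": {"g1": -0.5, "g3": 1.0},
}
sample = {"g1": 2.0, "g2": 1.0, "g3": 3.0}
group, scores = allocate(sample, signature)
```

With 3000 genes and 3 groups, a dense signature would simply hold 3000 coefficients in each of the three inner dictionaries.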
  • the signature has been developed using recursive partitioning.
  • the signature is represented as a decision tree, in which each node is defined by a gene, a threshold and a relation.
  • a node might be represented by Gene: 3171 threshold 3.612 Relation "Greater Than"
  • Each node points either to a child node, or to a predicted status class or status value.
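A decision-tree signature of this form could be represented and evaluated as in the sketch below. The two-branch layout (`if_true` and `if_false`), the second node and the class labels are assumptions for illustration; only the gene, threshold and relation of the first node come from the example in the text.

```python
import operator

# Relations a node may use; only "Greater Than" appears in the text.
RELATIONS = {"Greater Than": operator.gt, "Less Than": operator.lt}


def predict(node, sample):
    """Walk the tree until a leaf (a status class or value) is reached."""
    while isinstance(node, dict):
        test = RELATIONS[node["relation"]]
        outcome = test(sample[node["gene"]], node["threshold"])
        node = node["if_true"] if outcome else node["if_false"]
    return node


tree = {
    "gene": 3171, "threshold": 3.612, "relation": "Greater Than",
    "if_true": "disease A",          # leaf: predicted status class
    "if_false": {                    # child node
        "gene": 42, "threshold": 1.0, "relation": "Less Than",
        "if_true": "healthy", "if_false": "disease B",
    },
}
```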
  • Diagnostic signatures are typically applied to a much more heterogeneous source of samples than the sample base from which they were developed. This inevitably raises issues of robustness - a diagnostic applied to samples with different demographic characteristics from the training set may break down. This issue is controlled in two ways. Firstly, before any diagnostic signature is used in application, it must first be validated with a new source of samples. These samples must be more heterogeneous than the training set, and will typically be stratified by known sources of variation (sex, age, drug treatment etc). Secondly, all diagnostic signatures must include robustness statistics, which measure the likely applicability of the signature to the given sample.
  • the precise form of the robustness statistic depends on the nature of the data mining procedure used, and the form of the diagnostic signature. For any diagnostic signature involving status classes it will usually consist of information about the distribution of multivariate distances to the nearest class. The status determined for a sample which is extreme on the distribution of distances to all classes will be considered suspect.
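One possible robustness check of this kind is sketched below. It substitutes a plain Euclidean distance to class centroids for the multivariate distance mentioned in the text, and flags a sample whose nearest-class distance exceeds a chosen quantile of the distances seen in the training set. The function names, the centroid representation and the 0.95 quantile are all assumptions.

```python
import math


def nearest_class_distance(profile, centroids):
    """Euclidean distance from a profile to the closest class centroid."""
    return min(math.dist(profile, centre) for centre in centroids.values())


def is_suspect(profile, centroids, training_distances, quantile=0.95):
    """Flag a sample whose nearest-class distance is extreme for the training set."""
    ranked = sorted(training_distances)
    # Empirical cutoff: the training distance at the requested quantile.
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return nearest_class_distance(profile, centroids) > cutoff
```

A status determined for a sample where `is_suspect` returns `True` would be reported with a warning rather than suppressed, in keeping with the statement that such a result is considered suspect.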
  • Diagnostic signatures are combined with test subject metabolic profiles to produce a diagnosis.
  • prediction is based on a Bayes classification rule, and estimates of uncertainty are based on posterior probabilities of class membership.
  • classification is based on the support vectors, and uncertainties are estimated from distance of the test profile to the decision boundary.
  • classification is based on the estimated decision tree, or averaged over multiple decision trees.
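For the Bayes classification rule, the posterior probabilities of class membership follow from the class likelihoods and prior probabilities. The sketch below assumes the log-likelihood of the test profile under each class has already been computed by whichever model is in use; the function names are illustrative.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Bayes rule: P(class | profile) is proportional to likelihood * prior."""
    log_joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in priors}
    shift = max(log_joint.values())  # subtract the max for numerical stability
    weights = {c: math.exp(v - shift) for c, v in log_joint.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def classify(log_likelihoods, priors):
    """Return the most probable class and all classes ranked by posterior."""
    post = posterior_probabilities(log_likelihoods, priors)
    ranked = sorted(post.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked
```

The ranked list returned by `classify` corresponds to a list of possible differential diagnoses, each with its estimate of uncertainty.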
  • Re-testing may occur at a time when an earlier unknown clinical condition has become known. For the example given above, it may be the case that at a time of re-testing for race-readiness it is known that during the initial test the animal did have disease A. Provision is made to allow updates to and modification of the clinical data obtained for each test subject, as diagnosis is confirmed or modified.
  • data mining is repeated at regular intervals as the historic database grows. Test records added to the historic database will frequently contain only partial clinical or performance data. For any given clinical or performance factor, data will be filtered to remove subjects for which the particular characteristic is unrecorded. The data mining algorithm will then be used to construct new diagnostic signatures for the given clinical or performance characteristic. The procedure of filtering and mining is repeated for each characteristic of interest. In this way, the sample sizes used to obtain diagnostic signatures are constantly increasing, and predictive performance improves. The system becomes self-learning.
  • the historical database must be initialised, and preliminary data mining conducted, before clinical application of the diagnostic system.
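The filter-then-mine loop can be sketched as below. The record layout (a `profile` plus a possibly incomplete `clinical` dictionary) and the `mine` callback are assumptions; `mine` stands in for whichever data mining algorithm is actually used.

```python
def remine(records, characteristics, mine):
    """Rebuild one diagnostic signature per characteristic.

    records: list of dicts with a "profile" and a "clinical" dict, which may
    be incomplete. mine: any routine taking (profiles, labels) and returning
    a signature.
    """
    signatures = {}
    for name in characteristics:
        # Keep only subjects for which this characteristic was recorded.
        usable = [r for r in records if r["clinical"].get(name) is not None]
        if usable:
            profiles = [r["profile"] for r in usable]
            labels = [r["clinical"][name] for r in usable]
            signatures[name] = mine(profiles, labels)
    return signatures
```

Running `remine` on a schedule as new records arrive gives each characteristic a signature trained on every record that carries it, which is the sense in which the sample sizes keep increasing.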
  • the database will be initialised using a training set comprising data from animals with known metabolic conditions. Appropriate experimental design is vital to the construction of the initial training data set.
  • Empirical predictors derived using data mining are susceptible to artefactual relationships, involving nuisance factors - such as regional differences in diet and husbandry. For this reason, the training data set must be obtained from a multicentre trial, and stratified appropriately.
  • Figure 11 shows the flow of information and processing in the self-learning diagnostic system.
  • Figure 11 shows the elements of Figure 6B in a development domain 70, highlighting that these portions of processing only need to be performed during initial set-up and re-tuning of the diagnostic signatures.
  • An end user domain is shown at 71, highlighting that the user must obtain the genotypic and phenotypic data at 62, with reports being returned to the user at 64.
  • the processing to determine a diagnosis by comparison of the diagnostic signatures stored in the signature database 61, to the received genotypic data is performed by the base station as shown.
Specific Examples
  • FIG 12 is a flow diagram illustrating one specific example of an information technology architecture and data flow as part of a remote delivery service process.
  • External users are shown as Class One 505, Class Two 510, and Class Three 515 that are interested in obtaining information regarding their respective gene expression results when using the proprietary gene expression analysis service.
  • These users may include, for example, pathology laboratories, drug laboratories, pharmaceutical companies, collaborators, medical and/or veterinary practitioners or similar, owners of performance animals, athletes and/or athletic trainers.
  • Each of these users 505, 510, 515 will be interested in different aspects of the gene expression results and will therefore interact in a different fashion, but all will interact remotely via a user interface module 520.
  • Interface 520 may, for example, be a browser-based interface as found on most computers and delivered via web pages on the world-wide-web (the Internet).
  • the initial interaction to the user interface module 520 will be via a controlled firewall and web server.
  • the firewall will be the first line of defence against unwanted and unauthorised intrusion. Port blocking techniques and protocol restrictions will be imposed at the firewall.
  • the firewall and web server environment will be fully maintained with the latest security patches to ensure currency of protection against hackers and intrusion.
  • Each user will establish a secure connection 525 (user authentication and establish secure web connection) to ensure confidential identification in both directions for the user and service delivery provider.
  • the security is managed by a customer access management system 565 that controls access of users 505, 510, 515.
  • Class One and Two Users 505, 510 are shown sending information as a query 530 and 531, that includes a question regarding health or condition status of an animal (interpretation request), sample details, gene expression results, clinical information, pathology laboratory results, gene identities, gene sequences, collaborative requests, etc.
  • Class Three Users 515 are shown sending information 535 as a query including interrogation requests regarding a health status of individual animals/athletes or groups of individual animals/athletes.
  • Queries 530 and 531 may contain formatted gene expression and clinical information as a request. One such embodiment would employ the use of digitally signed XML documents to ensure authenticity and content of the request. Other authentication, authorisation and encryption and key management standards will be applied as they become available.
  • queries are temporarily stored in a transaction staging module 540 and queries 532 and 533 will be drawn into respective pathology service module 550 and collaborative services modules 555 only on request from the service module.
  • This process may employ a second firewall and may be configured to further restrict network traffic. This firewall will only permit internal requests from modules 550, 555 and 560 to pass through the firewall. All other network traffic will be blocked, as will unnecessary ports and protocols.
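The pull-only behaviour of the transaction staging module 540 can be illustrated with a small in-memory sketch. The class name, the two queue names and the method names are hypothetical, and a real deployment would sit behind the firewalls described above rather than in a single process.

```python
from collections import deque


class TransactionStaging:
    """Holds incoming queries until a service module asks for them."""

    def __init__(self):
        self._queues = {"pathology": deque(), "collaborative": deque()}

    def submit(self, service, query):
        # External side: a query is only stored, never pushed inward.
        self._queues[service].append(query)

    def pull(self, service):
        # Internal side: the service module requests its next query, if any.
        queue = self._queues[service]
        return queue.popleft() if queue else None
```

Because data only moves inward when a service module calls `pull`, no connection ever needs to be initiated from the external side towards the internal modules.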
  • Respective pathology services module 550 and collaborative services module 555 include special software capable of servicing requirements of the different types of users 505, 510. Pathology services module 550 and collaborative services module 555 are shown in communication with each other.
  • Core central databases 590 store genetic information (genetic database) 591, sample and gene expression information (sample database) 593, and correlative data (correlative database & heuristics) 595.
  • the genetic information stored in genetic database 591 is used to create gene expression devices
  • Design details 592 are also stored in the sample database which contains gene location information on the device and are used to interpret results from such a device.
  • the genetic database 591 is also used to provide gene identification and gene sequence information to collaborative services module 555 and collaborative services 575 (e.g., interpretations, gene lists and gene sequences) to Class Two users 510.
  • Information in the sample database 593 can be clustered together based on similarity using computer algorithms such as K-means, principal component analysis (PCA) and self-organising maps, commonly available in packages provided by companies such as Spotfire, Silicon Genetics, and at higher levels of interpretation, Omniviz. These clusters amount to identified correlations 594 between gene expression and sample information and are stored in various formats, in the correlative database 595.
  • An heuristic or neural network or rule-based computer software system pre-programmed with rules or training sets takes queries 534 (e.g., expression details and sample details), stores these details in the sample database 593 and then compares the query pattern to those already stored in the correlative database 595 and produces standardized reports and correlation details 570 (according to the rules of the heuristic program).
  • Correlation details are converted to useful information such as gene expression correlation results, for example a fully formatted report to include interpretations 571 and interpretations 575 (and optionally gene lists and gene sequences) and are securely delivered back to the requestor via the internet to Class One and Two users 505, 510.
  • Financials database 597 keeps track of details including for example accounting, purchasing and payroll details.
  • Sales and marketing database 596 keeps track of items such as sales and marketing details, client details, customer relations management and stock management.
  • Internal data warehouse 560 receives information from databases 590, 596 and 597. This internal data warehouse 560 will only be accessed by authorized internal users conducting legitimate business activities.
  • a secure (internal) data warehouse 545 services the needs of Class Three users 515.
  • Specific (and confidential) information 580 is extracted from internal data warehouse 560 that is then stored in secure customer data warehouse 545 where authorized users 515 can query 535 (for example as interrogation requests), specific and confidential information such as clinical history information, pathology results and interpretations.
  • This information is presented in a secure user-friendly and/or visual format 585 in relation to individuals or groups of athletes or performance animals, and/or time series of results.
  • Figure 13 is a flow diagram of one specific example showing steps for assessing a biological sample for diagnosing or assessing a condition of an animal.
  • a user collects a biological sample 1010, for example a blood sample from a horse.
  • biological parameters including biochemical and haematological parameters, clinical data (including blood profile tests) and appraisal information are collected and recorded in a standard format 1015, for example by filling in a standard form.
  • the biological sample 1010 is processed so that nucleic acids contained therein are detectable when hybridised with a complementary (or mismatch-complementary) nucleic acid located on an array 1020.
  • the nucleic acid may be detectable by a label incorporated therein, for example a target nucleic acid.
  • the array 1020 is a device such as a microarray which is read 1030 by standard methods and equipment common to the art to identify and measure relative abundance or absolute abundance of those nucleic acids from the biological sample which have bound to probe nucleic acids immobilised as part of array 1020 (inclusion of a reference sample run in parallel allows for the calculation of the relative abundance of target nucleic acids, whereas a method developed by the company Affymetrix, Inc (the "Affymetrix system") as described at their website "affymetrix.com" relies on internal references).
  • Array 1020 may comprise a large number of probe nucleic acids, e.g., 1000's of nucleic acids.
  • a large number of probe nucleic acids may be particularly useful if an animal is not presenting with any visible signs of poor condition, e.g., overt disease.
  • labelled target nucleic acids of a sample are first applied to an array comprising a "full-screen" of target nucleic acids (e.g., 1,000's of nucleic acid probes that represent most or many of the nucleic acids expressed in a sample).
  • the labelled nucleic acid targets may be applied to a sub-set of the full-screen, e.g., a selected panel of nucleic acid targets that may be associated with a particular condition, for example, respiratory diseases, drug consumption, etc.
  • Data from the read microarray 1030 and clinical data and appraisal information 1015 is formatted 1040 and transmitted via a communications network 1050, for example the Internet, to a remote diagnostic server 1060. It will be appreciated that transmission of the formatted data to the remote diagnostic server 1060 requires less bandwidth than transmitting database information to the user and less skill and time on behalf of the user.
  • the transmitted data is analysed 1070, for example by comparison to a database of previously collected information in relation to clinical information and expression levels (relative abundance) of the nucleic acids applied to the microarray 1020. Also, experts, for example, bioinformaticists, biologists, doctors, pathologists, and the like may analyse the data to provide additional useful information. The analysis enables correlation to a condition 80.
  • the expression levels (relative or absolute abundance) of the nucleic acid probes applied to the microarray 1020 are correlated with previously collected data relating to known conditions stored in a database 1080 and compiled 1090.
  • the database may also store information in relation to an identity of known nucleic acids, nucleotide sequence on the array and/or location of nucleic acids on the array, its biological function and links to other databases.
  • Results in relation to health and performance condition are transmitted via a communications network 1050 and may also be provided to the user as a report 1095, for example a hardcopy printout or visually on a computer monitor.
  • the described system has advantages of requiring low bandwidth for transmitting sample data and final report between user and remote database/processor, data processing is centralised and more efficient, expert analysis of the sample data is centralised, the computer software may incorporate heuristic methods thereby minimising human interaction, the possibility of user and interpretation bias is avoided, and information stored in the commercially valuable database is under strict control and does not require direct access by an outside user.
  • the steps are described in more detail hereinafter.
  • Figure 14 shows an environment for working the method described in Figure 13.
  • a user 1100, which may be a veterinarian or practitioner, collects a sample 1120 from an animal, for example a blood sample from a horse or athlete.
  • information in relation to a condition of the animal is collected in a standard format 1102.
  • the sample is collected, nucleic acids isolated therefrom, prepared and applied to an array 1120 and the array is read by an array reader 1130.
  • Data from the array reader 1130 and clinical appraisal and condition information 1102 is entered into a computer and formatted by a processor 1140, which may be, for example, a laptop computer with a modem.
  • the formatted data is transmitted via a communications network 1150, for example the Internet.
  • a remote diagnostic server 1160 receives the transmitted data and the data is compared with a database(s) 1161 which stores data, for example, data in relation to nucleic acid location on an array, expression level (relative abundance or absolute abundance) of a nucleic acid hybridised with a corresponding nucleic acid on an array, and data correlating nucleic acid expression level and performance, health, or condition of an animal.
  • Figure 15 is a flow diagram illustrating steps for preparing an array.
  • a biological sample 1210 is collected from an animal.
  • Biological sample 1210 may comprise for example, a blood sample (preferably white blood cells isolated therefrom), urine sample or tissue sample (including fetal tissues and tissues in various stages of development).
  • a specific aim of collecting the biological sample is to isolate and sequence as many relevant genes from the sample for use on an array. Thousands of nucleic acids may be isolated that may form a large number of probes for a broad screening of an animal's genetic make-up or gene expression pattern.
  • Nucleic acids are isolated from the biological sample.
  • the sample may be used to prepare genomic DNA or tissue specific mRNA 1223.
  • RNA is isolated from the biological sample 1210 and a cDNA library 1220 is prepared from the isolated RNA.
  • Plasmids 1221 comprising cDNA inserts from library 1220 may be sequenced 1222 from either or both 5' and/or 3' end of the nucleic acid. Preferably, sequencing is from the 3' end. Sequences may comprise Expressed Sequence Tags (EST). If an isolated nucleic acid does not encode a full-length gene (e.g., an EST), a partial nucleic acid may be used as a probe to isolate a full-length nucleic acid.
  • EST sequence information may be compared directly with a sequence database 1230, for example GenBank, and a search for related or identical sequences performed.
  • Putative gene identification and function 1231 may be determined from a search, for example a BLAST search performed in step 1230.
  • a computer may be programmed to enable the normalisation and standardisation of the relative abundance data of mRNAs in a sample.
  • Gene-specific oligonucleotides 1232 may be synthesised using information from EST or full-nucleotide sequence 1222 data. Gene-specific oligonucleotides 1232 may be used as amplification primers to amplify (step 1224) a region of a corresponding nucleic acid.
  • the nucleic acid used as template to amplify a region of corresponding nucleic acid may be, for example, isolated plasmid DNA 1221 and/or genomic DNA, cDNA or mRNA (e.g., used with RT-PCR) 1223. The nucleic acid thus prepared can be used directly as the nucleic acids for attaching to an array 1240.
  • Amplification products 1225 may also be generated using non-gene-specific primers (e.g., oligo-dT, plasmid sequence flanking a nucleic acid of interest).
  • Oligonucleotides corresponding to a gene 1232 may also be used on array 1240. Alternatively, the oligonucleotide corresponding to a known sequence can be built successively, nucleotide by nucleotide, on a support using Affymetrix methodology such as that in US patent no. 5,831,070, incorporated herein by reference.
  • the step relating to constructing cDNA 1220 and isolating plasmids 1221 comprising the cDNA may be omitted.
  • isolated genomic DNA or tissue specific mRNA 1223 is used as a template to make amplification product 1225 by amplification using gene-specific primers 1232.
  • Amplification product 1225 may be attached to array 1240.
  • Nucleic acids attached to or built onto array 1240 preferably represent most, more preferably all, expressed genes in a given tissue from an animal of interest.
  • the array should contain genes expressed in the cells of blood under various conditions and at various stages of cell differentiation.
  • Figure 16 shows a flow diagram comprising steps for determining gene expression in biological samples comprising both reference target 1305 and sample target 1310.
  • Nucleic acids in particular RNA (total RNA or mRNA) are isolated from biological samples 1305 and 1310, which may be the same sample.
  • cDNA is prepared from the RNA and the cDNA is labelled resulting in labelled targets 1320 and 1325.
  • Alternatively, or in addition, cDNA may be used as a template to synthesise labelled antisense RNA for use as targets 1320 and 1325.
  • Reference target 1325 may be provided as a previously prepared labelled target of known concentration. Accordingly, reference target 1325 need not be synthesised in parallel with each sample target. Internal controls for reference target 1325 and sample target 1320 provide a means for normalising and scaling relative probe concentrations.
  • Sample target 1320 and reference target 1325 are hybridised with array 1330 in step 1340.
  • Array 1330 may, for example, have been prepared by steps shown in Figure 15.
  • the hybridised array is washed 1345 to remove non-specific hybridisation of targets 1320 and 1325. It will be appreciated that one skilled in the art could select different stringency conditions of wash 1345 as required.
  • Array 1330 is read in an array reader 1350 to determine relative abundance of RNA in the original sample, which correlates with expression of the corresponding gene in the biological sample.
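The relative abundance read from a two-target hybridisation is commonly expressed as a log ratio of sample signal to reference signal. The sketch below assumes a single internal control probe that is equally abundant in both targets and uses it to scale the sample channel; the function name, the probe identifiers and the choice of a base-2 logarithm are illustrative and are not taken from the specification.

```python
import math


def relative_abundance(sample, reference, control):
    """Return log2(sample/reference) per probe, scaled by an internal control.

    sample, reference: dict probe id -> signal intensity for each target.
    control: id of a probe assumed to be equally abundant in both targets.
    """
    # Scale factor that makes the control probe read the same in both channels.
    scale = reference[control] / sample[control]
    return {
        probe: math.log2(sample[probe] * scale / reference[probe])
        for probe in sample
        if probe != control
    }
```

A value of 0 means the probe is equally abundant in sample and reference after scaling; positive values indicate higher abundance in the sample target.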
  • Figure 17 is a flow diagram illustrating steps for building a database.
  • Biological samples 1410 are collected from animals having specific known condition(s).
  • a statistically relevant number of biological samples 1410 are collected from a variety of normal animals to establish a normal reference range of nucleic acid abundance levels. This should account for natural variation, including that associated with state of fitness, sex, age, season, breed and diurnal changes.
  • Nucleic acids are isolated and labelled 1415 from sample 1410, thereby forming respective target nucleic acids.
  • the labelled target nucleic acids 1415 are applied to array 1420, which may be prepared as described in Figure 15.
  • the array is read 1430 and data formatted 1440 into an electronic form, for example a digital signal, suitable for transmission via a communications network 1450.
  • Clinical information from clinical appraisal, in relation to conditions of animals of interest is measured, documented and compiled 1460.
  • The clinical information is preferably collected in a standard format and, for example, variable states such as the level of fitness or body score (fatness) may be assigned a value or number (for example between 1 and 10).
  • Specific clinical conditions may be graded (for example between 1 and 10) and assigned a unique and standard identifier.
  • An example of such a system is currently used in clinical medicine and veterinary science and termed SNOMED or SNOVET (Standardised Nomenclature of Medicine or Veterinary Science), where a clinical condition can be described using a numerical system. This system has not been used for describing the normal condition or the ability of a performance animal to perform to its best.
  • a numerical grading system could also be used to standardise the collection of such data, for example, time spent on a treadmill is a strong indicator of exercise tolerance, as is blood concentration of oxygen and ability to transport oxygen. Conditions may include disease, response to drugs, training, nutrition and environment.
  • the clinical information 1460 is formatted into electronic form 1440, for example a digital signal, suitable for transmission via a communications network 1450.
  • The process is repeated such that a collection of several array readouts for particular conditions is made.
  • A standard range (for example, a population median of 95%) of values for each of the represented genes and its relative abundance can be calculated. This reference range can then be used as a comparison to test sample results.
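  • As a minimal sketch of how such a reference range might be derived, the Python below treats the "95%" figure as the central 95% of readings from normal animals (2.5th to 97.5th percentile). That reading, the function names and the data layout are assumptions made for illustration and are not taken from the specification.

```python
from statistics import quantiles


def reference_range(values, coverage=0.95):
    """Return (low, high) bounding the central `coverage` fraction of values."""
    if len(values) < 2:
        raise ValueError("need at least two readings per gene")
    # 999 cut points at 0.1% steps; cuts[i] is the (i + 1) / 1000 quantile.
    cuts = quantiles(values, n=1000, method="inclusive")
    tail = (1.0 - coverage) / 2.0
    low_idx = round(tail * 1000) - 1
    high_idx = round((1.0 - tail) * 1000) - 1
    return cuts[low_idx], cuts[high_idx]


def gene_reference_ranges(readings_by_gene, coverage=0.95):
    """Map each gene to its reference range across normal animals."""
    return {
        gene: reference_range(readings, coverage)
        for gene, readings in readings_by_gene.items()
    }
```

  • The percentile-based interval is used here because it makes no assumption about the distribution of abundance values; a parametric interval would be an equally plausible choice.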
  • Nucleic acid expression information from a read array 1430 for a target sample is correlated with previously measured conditions 1460 to provide information on nucleic acid expression level (abundance or relative abundance) with any previously measured condition.
  • This information is compiled at server 1470 and good data is stored and bad data rejected 1480.
  • The compilation process includes collection of a large enough set of array readout information for a particular condition so that inferences can be drawn on gene expression profiles and conditions.
  • the compilation 1470 may also include use of sophisticated pattern recognition and organisational software and algorithms (examples common to the art include algorithms such as K means, Anova and Mann Whitney, Self Organising Maps, principal component analysis, hierarchical clustering - any one of which is available as part of proprietary software packages) such that expression patterns that differ to normal or expected condition can be identified.
  • the compilation 1470 will preferably include sophisticated methods of supervised classification such as regularised discriminant analysis, diagonal discriminant analysis, support vector machines, or recursive partitioning - any one of which is readily conducted using proprietary software packages.
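  • Of the supervised methods named above, diagonal discriminant analysis is simple enough to sketch directly. The Python below is an illustrative version with a pooled per-gene variance and equal class priors; the function names, the list-of-lists data layout and the variance floor are assumptions, and a proprietary package would normally be used in practice.

```python
from collections import defaultdict


def fit_diagonal_da(samples, labels):
    """Fit per-class mean profiles and pooled per-gene variances."""
    by_class = defaultdict(list)
    for profile, label in zip(samples, labels):
        by_class[label].append(profile)
    dof = len(samples) - len(by_class)
    if dof <= 0:
        raise ValueError("need more samples than classes")
    means = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_class.items()
    }
    n_genes = len(samples[0])
    pooled = [0.0] * n_genes
    for label, rows in by_class.items():
        for row in rows:
            for g in range(n_genes):
                pooled[g] += (row[g] - means[label][g]) ** 2
    # Small floor avoids division by zero for genes with no variation.
    variances = [max(total / dof, 1e-12) for total in pooled]
    return means, variances


def predict_diagonal_da(model, profile):
    """Assign the class whose mean profile is nearest in variance-scaled distance."""
    means, variances = model

    def distance(label):
        return sum(
            (x - m) ** 2 / v
            for x, m, v in zip(profile, means[label], variances)
        )

    return min(means, key=distance)
```

  • Ignoring covariance between genes is what makes the method workable when the number of genes far exceeds the number of animals sampled.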
  • Concurrently, comprehensive clinical information 1460 for animals may be collected and biological samples 1410 tested on arrays so that correlations can be made between any clinical observation and array data. In this manner a database is created comprising data on nucleic acid expression which may include data correlating any desired condition, for example normal and specific abnormal condition(s), with nucleic acid expression.
  • the stored data 1480 may be accessed using specific programs and algorithms 1490.
  • a biological sample comprising nucleic acids for example total RNA and mRNA
  • The biological sample may include cells of the immune system at various stages of development, differentiation and activity.
  • the biological sample in most instances would be whole blood collected from a vein of a performance animal.
  • the biological sample may include a fluid and/or tissue, for example sputum, urine, tissue biopsies, bronchial or nasal lavages, joint fluid, peritoneal fluid or thoracic fluid which, in part, comprises cells of the immune system that have infiltrated such tissues or fluids.
  • Cells present in blood which comprise mRNA may include mature, immature and developing neutrophils, lymphocytes, monocytes, reticulocytes, basophils, eosinophils, macrophages. All of these cell types also appear in tissues of non-blood origin at various times in various conditions.
  • The biological sample is collected and prepared using various methods. For example, an easy method of collecting cells of the blood is by venipuncture.
  • the biological sample may be collected from a performance animal, for example, a horse with suspected laminitis, a human athlete or camel with osteochondrosis, or a greyhound with subclinical cystitis.
  • Blood sample Ten ml of blood is drawn slowly (to prevent hemolysis) from the vein of an animal (jugular vein in a horse and camel, veins on the forearm/limb of humans and dogs) into a 1:16 volume of 4% sodium citrate to prevent clotting and the sample is mixed and then placed on ice.
  • The sample is centrifuged at 3000 RPM at 4°C for 15 minutes and white blood cells (WBC) (commonly called the "buffy coat") are removed from the interface between plasma and red blood cells (RBC) into a separate tube using a pipette.
  • the WBCs are then treated with at least 20 volumes of 0.8% ammonium chloride solution to lyse any contaminating RBC and re-centrifuged at 3000 RPM at 4°C for 5 minutes.
  • the pelletted WBCs are then washed in 0.9% sodium chloride, re-centrifuged, and kept on ice.
  • the cell pellet is then used directly in RNA extraction.
  • A fluid sample, for example sputum, urine, bronchial or nasal lavages, joint fluid, peritoneal fluid or thoracic fluid, is centrifuged at 3000 RPM at 4°C for 20 minutes to collect cells. Samples comprising large amounts of mucus are treated with a mucolytic agent such as dithiothreitol prior to centrifugation. A cell pellet is then washed in 0.9% sodium chloride, re-centrifuged and the cell pellet is used directly in RNA extraction.
  • a tissue biopsy is frozen in dry ice or liquid nitrogen and crushed to powder using a mortar and pestle. The frozen tissue is then used directly in RNA extraction.
  • Total RNA and/or mRNA is isolated from a biological sample. Use of isolated mRNA rather than total RNA may provide results with less background and improved signal.
  • RNA is commonly isolated by skilled persons in the art, and examples of some methods for isolating mRNA are described below.
  • RNA extraction kits, for example Qiagen RNA and Direct RNA extraction kits, and RNA extraction kits produced by Invitrogen (formerly Life Technologies) and Amersham Pharmacia Biotech, herein incorporated by reference, may be used by following the manufacturer's instructions. Key elements of these mRNA extraction protocols include use of an appropriate amount of sample, protection of the sample from RNAse contamination, elution of the sample from a column at 70°C and quantitation and quality checking in a 0.7% agarose gel and using an OD 260/280 ratio. About 0.2 gm (wet weight) of pelleted white blood cells or tissue is required for each mRNA extraction, which will yield about 1-2 µg of mRNA. Disposable gloves should be worn throughout the procedure, with frequent changes. Both the column and solution used for elution should be at 70°C.
  • PAXgene ® tubes (Qiagen) and incubate at room temperature for 4-8 hours.
  • RNA quantification and assessment of RNA size and quality include standard gel electrophoresis methods: running a small quantity of an RNA sample on an agarose gel with known standards, staining the gel with, for example, ethidium bromide to detect the sample and standards, comparing relative intensities and size of standard RNA and sample RNAs, and comparing the intensities of the ribosomal RNA bands.
  • RNA concentration in a solution may be determined by measuring absorbance at 260/280 nm in a spectrophotometer relative to known standards and calculated using known formulas.
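  • The "known formulas" are not spelled out in the text. A minimal sketch, assuming the widely used convention that one A260 unit corresponds to about 40 µg/ml of RNA in a 1 cm path length, is given below; the constant and the function names are assumptions for illustration.

```python
# Assumed conversion factor: 1 A260 unit ~ 40 ug/ml RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration of the undiluted sample from absorbance at 260 nm."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be >= 0 and dilution factor > 0")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a rough indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280
```

  • For example, a reading of 0.25 on a 1:50 dilution gives 0.25 x 40 x 50 = 500 µg/ml under this convention.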
  • RNA prepared as described above may be synthesised to cDNA and labelled, resulting in a labelled probe, using kits provided by suppliers such as Amersham Pharmacia Biotech, Invitrogen, Stratagene or NEN, herein incorporated by reference.
  • A typical reaction may comprise: template RNA, an oligo-dT primer and/or gene-specific primers, reverse transcriptase enzyme, deoxyribonucleotide triphosphates (dNTPs), a suitable buffer, and a label incorporated into at least one of the dNTPs.
  • Such a reaction when combined with a method of amplifying the resultant cDNA is referred to as RT-PCR (reverse transcriptase-polymerase chain reaction).
  • The reaction mixture comprises the following: 6.0 µl of 5X first-strand buffer, 3.0 µl of 0.1M DTT, 0.6 µl of unlabeled dNTPs, 3.0 µl of Cy3 or Cy5 dUTP (1 mM, Amersham), 2.0 µl of Superscript II (reverse transcriptase, 200 U/µl, Life Technologies), made to 15 µl with pure water.
  • Unlabelled dNTPs are sourced from a stock solution consisting of 25 mM dATP, 25 mM dCTP, 25 mM dGTP, 10 mM dTTP.
  • 5X first-strand buffer consists of 250 mM Tris-HCl (pH 8.3), 375 mM KCl and 15 mM MgCl2. The mixture is incubated at 42°C for 1 hr. Add an additional 1 µl of reverse transcriptase to each sample. Incubate for an additional 0.5-1 hrs. Degrade the RNA and stop the reaction by adding 15 µl of 0.1N NaOH, 2 mM EDTA and incubate at 65-70°C for 10 min. If starting with total RNA, degrade the RNA for 30 min instead of 10 min. Neutralize the reaction by adding 15 µl of 0.1N HCl. Add 380 µl of TE (10 mM Tris, 1 mM EDTA) to a Microcon YM-30 column (Millipore).
  • Cot1 DNA (methods for making Cot1 DNA are common in the art), 20 µg polyA RNA (10 µg/µl, Sigma, #P9403) and 20 µg tRNA (10 µg/µl, Life Technologies, #15401-011). Centrifuge 7-10 min. at 14,000 x g. The probe needs to be concentrated such that with the addition of other solutions required for hybridisation the volume is not excessive, or is suitable for use with a desired slide and cover slip size. Invert the microcon into a clean tube and centrifuge briefly at 14,000 RPM to recover the probe.
  • a nucleic acid may be labelled with one or more labelling moieties for detection of hybridised labelled nucleic acid (i.e., probe) and target nucleic acid complexes.
  • Labelling moieties may include compositions that can be detected by spectroscopic, photochemical, biochemical, immunochemical, optical or chemical means. Labelling moieties may include radioisotopes, such as 32P, 33P or 35S, chemiluminescent compounds, labelled binding proteins, heavy metal atoms, spectroscopic markers, such as fluorescent markers and dyes, magnetic labels, linked enzymes, and the like. Preferred fluorescent markers include Cy3 and Cy5, for example available from Amersham Pharmacia Biotech (as described above).
  • RNA transcript synthesis and labelling. The Affymetrix system uses RNA as substrate and generates biotin labelled cRNA through a series of reactions using a BioArray HighYield RNA transcript labelling kit (available from Enzo) and the following protocol:
  • cRNA Cleanup 3.1. Add 60 µl of DEPC treated water to each sample, bringing total volume to approximately 100 µl.
  • Samples may be stored in a -20°C freezer until the next day.
  • Quantification of cRNA, Fragmentation and Preparation of Hybridisation Mix 4.1. Determine concentration of cRNA sample by spectrophotometry. 4.1.1. Measure and record the volume of each sample.
  • A260 value should be greater than or equal to 0.09. Repeat steps 2.1.4-2.1.6 for any sample that doesn't fall within this range or wait to see if sample fails on the gel image before repeating.
  • nucleic acids representing expressed genes from cells found in blood of a performance animal, for example a horse, human, camel or dog.
  • the nucleic acids may be of any length, for example a polynucleotide or oligonucleotide as defined herein.
  • Each nucleic acid occupies a known location on an array.
  • A nucleic acid target sample probe is hybridised with the array of nucleic acids and an amount or relative abundance of target nucleic acid hybridised to each probe in the array is determined.
  • High-density arrays are useful for monitoring gene expression and presence of allelic markers which may be associated with disease. Fabrication and use of high density arrays in monitoring gene expression have been previously described, for example in WO 97/10365, WO 92/10588 and US Patent No. 5,677,195, all incorporated herein by reference.
  • High-density oligonucleotide arrays are synthesised using methods such as the Very Large Scale Immobilised Polymer Synthesis (VLSIPS) described in US Patent No. 5,445,934, incorporated herein by reference.
  • Arrays for humans are commercially available from companies such as Incyte, Research Genetics, and Affymetrix.
  • Canine expression arrays have been developed by Lion Bioscience, Pfizer and GeneLogic. These arrays typically comprise between 2,000 and 60,000 transcripts and are species specific (none are available for the horse or camel). Some of these genes are in multiple copies on the array and have not been fully annotated or given a true gene identity. Additionally, it is not known whether DNA on the array, when hybridised to a test sample, specifically binds to a single gene. This latter instance results from splice variants of RNA transcripts in tissues such that one gene may encode multiple transcripts.
  • Human and dog arrays can be used in methods described herein. However, these arrays are currently non-specific and include genes that are not expressed in blood cells of animals, and/or do not contain genes important in controlling the function of blood cells, and/or contain regions of genes that are not specific to blood cells. Clones containing specific genes are available and can be purchased for human (mouse and dog) for use on arrays (for example from the IMAGE consortium or Lion Bioscience). However, it is not possible to obtain specific clones for use on a blood-specific array without prior knowledge of what genes are expressed in blood cells. The IMAGE consortium also does not guarantee that the gene of interest is contained in the clone purchased.
  • Figure 14 shows steps for constructing an array in one embodiment.
  • Samples comprising cells expressing as many genes of interest in relation to condition(s) of a performance animal are collected.
  • a sample comprising a mixture of nucleated blood cells from performance animals with conditions such as osteochondrosis, laminitis, tendon soreness, bursitis, abscesses, inflammation, allergy, viral infection, parasite infection, asthma, etc.
  • mRNA isolation kits or the protocol described above. Concurrently, 5 µg of mRNA is isolated from umbilical cord blood, and/or early stage foetus. Cells and tissues contained within these sources would express genes that may not be expressed in the cells extracted from blood in the above example. Isolation of cytoplasmic mRNA from cells is preferred. This step involves rupturing the cells with a solution comprising detergent and/or chaotropic agent and salt such that cell nuclei and the nuclear membrane remain intact. The cell nuclei are pelleted by centrifugation and the supernatant is used for mRNA extraction.
  • Protocols for this procedure are available as part of mRNA isolation kits (e.g. available from Qiagen). These mRNAs may be used to construct cDNA libraries. Kits for the construction of cDNA libraries are available from companies including Stratagene and Invitrogen (e.g. Uni-ZAP XR cDNA synthesis library construction kit #200450).
  • the library preferably should be constructed such that the orientation of the cDNA in the vector is known, that the mRNA is primed using oligo dT, the vector is capable of receiving a nucleic acid insert up to 10 kb and that purification of DNA suitable for DNA sequencing is possible and easy.
  • Plasmids generated from such a library can be DNA sequenced using protocols that are well established in the art and are available, for example, from Applied Biosystems. Briefly, a mix of 0.5 µg of plasmid DNA, 3.2 pmol of a primer that hybridises to the vector DNA (e.g. M13 -21, or M13 reverse primer), thermostable DNA polymerase, dNTP and labelled dNTP is subjected to a routine PCR procedure to generate fragments of DNA that can be separated by gel electrophoresis using machinery such as that available from Applied Biosystems (e.g. a 3700 DNA sequencer).
  • Generated DNA sequence data (chromatogram) is assessed and quality scores and binning of similar sequences is done using a computer program package such as Phred/Phrap/Consed.
  • the raw DNA sequence data can then be loaded into a database where comments (annotation) on the sequence can be made, such as quality score, bin, length of poly A sequence (should there be one), BLAST search results, highest homology in GenBank, clone identity, other entries in GenBank.
  • Subjective factors influencing whether a nucleic acid should be used on an array include quality and confidence of the DNA sequence, a GenBank homology score with identified nucleic acids, evidence of a poly-A tail (indicative of a translated transcript), uniqueness of the 3' sequence data (compared to both GenBank and an in-house database of clone sequences).
  • Nucleic acid primers can be selected using a program such as Primer 3 available via the Internet (www-genome.wi.mit.edu/cgi-bin primer/primer3).
  • The selected primers may be used for amplifying a nucleic acid, for example by PCR, or directly applied to an array.
  • Uniqueness of a nucleic acid can be tested by performing additional BLAST searches on GenBank and an in-house database.
  • Primers are preferably designed such that melting temperatures are similar, and amplification products are of a similar nucleic acid length.
  • Primers for PCR are generally between 18 and 25 nucleotide bases long.
  • Primers for direct use on a microarray or device are preferably between 50 and 80 nucleotide bases long.
  • Both the amplification product and the single primer should hybridise to DNA that uniquely identifies a gene transcript.
  • Specific programs using various formulas are available for calculating the melting temperature of various lengths of DNA (eg Primer 3).
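  • To make the primer criteria above concrete, the sketch below checks a PCR primer pair for length (18 to 25 bases) and similar melting temperature. It uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides and is not claimed to be the formula used by Primer 3; the function names and the 4°C tolerance are assumptions.

```python
def wallace_tm(primer):
    """Rough melting temperature in deg C: 2 per A/T base plus 4 per G/C base."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def compatible_pair(forward, reverse, max_tm_gap=4, min_len=18, max_len=25):
    """True if both primers are within the length window and have similar Tm."""
    lengths_ok = all(min_len <= len(p) <= max_len for p in (forward, reverse))
    if not lengths_ok:
        return False
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_gap
```

  • Longer probes, such as the 50 to 80 base sequences used directly on an array, would need a salt-adjusted or nearest-neighbour calculation instead.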
  • Selected DNA sequences can be provided to Affymetrix for production of a proprietary and custom array. The sequences generated in-house are provided to Affymetrix in FASTA format along with details of which parts of the sequence are to be used for the generation of a probe set (11 probes, each 25 nucleotide bases long) for each gene represented on the array.
  • Nucleotide sequences may be compared with an existing database, for example GenBank, to determine a previously provided name, tissue expression, timing of expression, biochemical pathway, cluster membership, and possible function or cellular role of an expressed nucleic acid.
  • a nucleic acid fragment may be used as a probe to isolate a full-length nucleic acid which may encode a gene which is associated with a particular disease or condition.
  • identified nucleic acids may be used to isolate homologues thereof, inclusive of orthologues from other species.
  • An identified nucleic acid may also be cloned into a suitable expression vector to produce an expressed polypeptide in vitro, which may be used, for example as an antigen in generating antibodies and for use on protein arrays.
  • The antibodies may be used for developing specific diagnostic assays or therapies, for three-dimensional protein structure such as X-ray crystallographic studies, or for therapeutic development.
  • An array may comprise any number of different nucleic acids, but typically comprises greater than about 100, preferably greater than about 1,000, more preferably greater than about 5,000 different nucleic acids.
  • An array may comprise more than 1,000,000 different nucleic acids.
  • Each nucleic acid is preferably represented more than once for scanning internal comparison and control.
  • The nucleic acids are provided in small quantities and are gene-specific and/or species-specific, usually between 50 and 600 nucleotides long, arranged on a solid support.
  • The Affymetrix system uses 11 probes per gene, each of 25 nucleotides, that are built onto the array using a photolithographic method (US Patent Nos. 6,309,831; 6,168,948; 5,856,174; 5,599,695; 5,831,070; 6,153,743; 6,239,273; 6,271,957; 6,329,143; 6,310,189 and 6,346,413).
  • The nucleic acids may be dotted onto the solid support or bound to microspheres, or in solution.
  • A typical array may have a surface area of less than 1 cm², for example a microarray.
  • a nucleic acid can be attached to a solid support via chemical bonding. Furthermore, the nucleic acid does not have to be directly bound to the solid support, but rather can be bound to the solid support through a linker group.
  • the linker groups may be of sufficient length to provide exposure to the attached nucleic acid. Linker groups may include ethylene glycol oligomers, diamines, diacids and the like. Reactive groups on the solid support surface may react with one of the terminal portions of the linker to bind the linker to the solid support. Another terminal portion of the linker is then functionalised for binding the nucleic acid.
  • A solid support may be any suitable rigid or semi-rigid support, including charged nylon or nitrocellulose, chemically treated glass slides available from companies such as NEN, Corning, S&S, arrays available through Affymetrix, membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries.
  • the solid support can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which the nucleic acids are bound.
  • the solid support is optically transparent.
  • The array may be constructed using an "arraying machine" manufactured by companies such as Molecular Dynamics, Genetic Microsystems, Hitachi, Biorobotics, Amersham and Corning. Alternatively, the array may be manufactured according to specific instructions provided by the user to Affymetrix. Source materials for this machine include microtitre plates comprising nucleic acids representative of unique genes, or sequence information.
  • An array element may comprise, for example, plasmid DNA comprising nucleic acids specific for a gene sequence, an amplified product using gene-specific or non-specific primers and template DNA or RNA, or a synthesised specific oligonucleotide or polynucleotide.
  • Array elements may be purified, for example, using Sephacryl-400 (Amersham Pharmacia Biotech, Piscataway, N. J.), Qiagen PCR cleanup columns, or high performance liquid chromatography (for oligonucleotides).
  • Purified array elements may be applied to a coated glass substrate using a procedure described in U.S. Pat. No. 5,807,522, incorporated herein by reference.
  • DNA for use on Corning amino-silane coated slides (CMT-GAPS™) is re-suspended in 3xSSC to a concentration of 0.15-0.5 µg/µl and then used directly in an arraying machine in 96 or 384-well plates.
  • An example for preparing an array element is provided by the manganese superoxide dismutase gene.
  • a clone comprising a nucleic acid insert is prepared and isolated as described above. The clone is sequenced to identify the nucleotide sequence.
  • a BLAST search using the identified nucleotide sequence is performed to determine homology of the cloned nucleic acid with nucleic acids in a database, for example GenBank. Identification of nucleotide sequence homology with superoxide dismutase genes stored in the database provides a level of confidence that the clone comprises at least in part a gene for superoxide dismutase for the horse.
  • Unique primers can be designed to amplify a nucleic acid using PCR and the clone DNA, or genomic DNA from the same species as a template. Purified amplification product can be directly attached to an array and thereby act as a target for a complementary labelled nucleic acid probe in the test and reference samples. Alternatively, a unique sequence can be determined and an oligonucleotide manufactured and purified for direct use on an array, or the sequence information supplied directly to Affymetrix for the construction of a custom array.
  • The array may comprise negative and positive control samples (preferably as duplicates or triplicates) such as nucleic acids from species different from a sample being tested (negative controls) and various nucleic acids (representative of RNAs and both ends of RNA molecules) that are found in all tissues as a constant and known quantity (positive controls).
  • Positive controls are identified and used by the array reader to provide data on true signal (i.e., specific hybridisation between probe and target) and noise (i.e., non-specific hybridisation between probe and target) and average intensity from multiple reads of several different locations for each nucleic acid attached to the array.
  • A test sample and a reference sample may be simultaneously assayed on the array.
  • The reference sample may comprise mRNA from multiple sources, such that most, preferably all of the nucleic acids on the array are represented in the test sample, and can be used by the array reader as a non-zero standard and for comparison with an average of the read-outs from the test sample.
  • A relative intensity for each gene on the array can be calculated.
  • The relative abundance of expression of each gene in a sample can also be calculated using controls within the array, such as certain genes expressed in a tissue at a constant level under all conditions.
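  • One way such a relative abundance might be calculated is sketched below: each channel is scaled by the mean signal of its constant-level control genes, and the test-to-reference ratio is then expressed as a log2 value. The log2 scale, the dictionary layout and the function name are assumptions made for illustration; all signals are assumed to be positive.

```python
from math import log2


def relative_expression(test, reference, control_genes):
    """Return log2(test / reference) per gene after scaling by control genes."""
    def control_mean(signals):
        return sum(signals[g] for g in control_genes) / len(control_genes)

    test_scale = control_mean(test)
    reference_scale = control_mean(reference)
    ratios = {}
    for gene, signal in test.items():
        if gene in control_genes or gene not in reference:
            continue
        # Scale each channel by its own controls before taking the ratio.
        scaled_test = signal / test_scale
        scaled_reference = reference[gene] / reference_scale
        ratios[gene] = log2(scaled_test / scaled_reference)
    return ratios
```

  • A value of 2.0 then means a four-fold higher abundance in the test sample than in the reference sample, and 0.0 means no difference.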
  • An absolute level of expression is calculated based on the difference between the perfect match and mismatch hybridisation for each of the 11 probes for each gene. Using such a process a gene is scored as present or absent and an absolute measure of intensity is given along with a p value.
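  • The idea of scoring a gene from perfect match (PM) and mismatch (MM) probe pairs can be illustrated with the simplified sketch below. It averages the PM minus MM differences and calls a gene present when most pairs show PM above MM. This is an assumption-laden illustration only: it does not reproduce the vendor's statistical algorithm, it computes no p value, and the 0.7 threshold and function names are hypothetical.

```python
def average_difference(probe_pairs):
    """Mean of (PM - MM) over the probe pairs of one gene."""
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    return sum(pm - mm for pm, mm in probe_pairs) / len(probe_pairs)


def is_present(probe_pairs, min_fraction=0.7):
    """Call a gene present if enough probe pairs have PM greater than MM."""
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    positive = sum(1 for pm, mm in probe_pairs if pm > mm)
    return positive / len(probe_pairs) >= min_fraction
```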
  • The interpreted array may highlight only a few genes that are substantially different in expression between a test and reference sample.
  • The overall pattern of expression may provide a "fingerprint" to characterise the way in which the original cells have responded to a particular condition of a performance animal.
  • The gene for superoxide dismutase may be the only gene up-regulated in a particular condition, especially in conditions of inflammation, or a large number of genes may be up- and down-regulated in various conditions. It is this fingerprint, rather than specific knowledge of gene sequence or function, that can be used as a marker for various conditions. It would be expected that fingerprints be useful across species barriers to include performance animals such as humans, horse, dog and camel.
  • The arrangement of nucleic acids on the array may be periodically changed and these arrays are then assigned a particular batch code that corresponds to a specific array comprising a specific nucleic acid arrangement.
  • The ability to change the arrangement of nucleic acids on the array and knowledge of the exact arrangement may prevent other people from generating a database using the arrays described above.
  • Using a batch code also enables tracking of manufacturers of the arrays in regards to the number of arrays produced.
  • The batch code further enables validation of a user of the communication network or "internet" diagnostic method and system. Batch code can also identify a particular type of array used, should more disease-specific arrays be designed and manufactured.
  • Control samples may be respectively labelled in parallel with a test and reference sample. Quantitation controls within a sample may be used to assure that amplification and labelling procedures do not change a true distribution of nucleic acid probes in a sample.
  • a sample may include or be "spiked" with a known amount of a control nucleic acid which specifically hybridises with a control target nucleic acid. After hybridisation and processing, a hybridisation signal obtained should reflect accurately amounts of control nucleic acid added to the sample.
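  • Whether the hybridisation signal "reflects accurately" the amount of spiked control can be checked, for instance, by fitting a straight line to signal against known amount. The sketch below is one such check using an ordinary least-squares fit; the function name and the choice of a linear model are assumptions for illustration.

```python
def spike_in_fit(amounts, signals):
    """Return (slope, intercept, r_squared) of signal versus spiked amount."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need two or more paired amount/signal values")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    syy = sum((y - mean_y) ** 2 for y in signals)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    if sxx == 0 or syy == 0:
        raise ValueError("amounts and signals must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

  • An r_squared close to 1 with a small intercept would suggest that amplification and labelling have preserved the relative amounts of the control.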
  • A microarray may have internal controls, for example a nucleic acid encoding a common gene expressed by the performance animal with known expression levels and a nucleic acid encoding a gene from another species that is known not to hybridise to the test or reference sample.
  • blocking agents such as Cot DNA from the tested species may also be used.
  • The inventors constructed equine cDNA gene libraries from white blood cells (WBC) drawn from five horses, and a 60-day-old foetus. Briefly, about 10,000 bacterial clones containing equine genes from these libraries were picked at random and the cloned genes were analysed by high throughput directional sequencing to obtain approximately 600 bp of 3' sequence for each clone.
  • The BLAST algorithm matches a query sequence to detect relationships among sequences that share regions of similarity while giving a statistical score to eliminate the probability for background hits. Annotations for each sequence were derived from using the highest BLAST score values aligned to the query sequence. Additionally, all genes available in the inventors' equine-specific database (also referred to herein as the "Genetraks database") were compared to themselves using the BLAST algorithm, and any homologous sequences were removed.
  • a control sequence file contained DNA sequence that allowed for the design of:
  • Pruning files were also generated. As is known in the art, pruning is a sequence comparison method. The standard practice for probe selection is to prune against specific bacterial and species-specific controls, in addition to any custom sequences provided for the design. Pruning increases the quality of the unique probe sets selected for the design and reduces the risk of cross-hybridization with other sequences. There were two types of pruning sequence files created for probe selection — hard pruning and soft pruning:
  • a hard pruning sequence file contained sequences that were not to be included on the GeneChip®.
  • the hard pruning file contained repetitive elements and ribosomal RNA sequences that are abundantly expressed in equine WBC. Probes that cross-hybridise to hard pruning sequences are not included in a probe set.
  • a soft pruning sequence file contained sequences to be included on the GeneChip® but acting as controls, so that any primers on the chip would preferably not cross hybridise with these sequences. These sequences included the standard bacterial and species-specific Affymetrix controls (e.g., intronic sequence, ribosomal sequences, housekeeping genes).
  • Affymetrix then used this information to design six to 11 unique probe pairs per gene.
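  • The effect of hard pruning on probe selection can be illustrated with the toy sketch below, which enumerates candidate 25-base probes from a transcript and discards any that occur exactly, on either strand, in a hard pruning sequence. Real pruning relies on similarity searching rather than exact matching, so this is a deliberate simplification; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_probes(transcript, length=25):
    """List every window of `length` bases along the transcript."""
    seq = transcript.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def hard_prune(probes, prune_sequences):
    """Drop probes found verbatim in any pruning sequence or its reverse complement."""
    targets = [s.upper() for s in prune_sequences]
    targets += [reverse_complement(s) for s in prune_sequences]
    return [p for p in probes if not any(p in t for t in targets)]
```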
  • Nucleic acid probes may be prepared as described above from a biological sample from a performance animal that has been assessed concurrently by physical inspection and/or blood tests or other method. Nucleic acid targets from a statistically relevant number of normal animals are previously hybridised to arrays, and a reference range for each of the genes on the array is calculated and used as a normal reference range (for example a 95% population median). Results from a test sample from a test animal can be compared with the same genes as the normal reference to determine if the test sample falls within the normal reference range.
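  • The comparison of a test sample with the normal reference range might be sketched as follows. Each gene's reference range is assumed to be supplied as a (low, high) pair; the function name, the data layout and the "below"/"above" labels are assumptions for illustration.

```python
def flag_out_of_range(test_values, reference_ranges):
    """Return genes whose test value lies outside the normal reference range."""
    flags = {}
    for gene, value in test_values.items():
        if gene not in reference_ranges:
            continue  # no normal range available for this gene
        low, high = reference_ranges[gene]
        if value < low:
            flags[gene] = "below"
        elif value > high:
            flags[gene] = "above"
    return flags
```

  • Genes absent from the result fall within the normal range, so an empty result indicates a test sample indistinguishable from normal on the genes measured.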
  • nucleic acid targets may also be prepared from biological samples from apparently normal animals, animals with overt disease, various progressive stages of disease, hitherto undiagnosed or unclassified conditions or stages of such conditions, animals treated with known amounts of drugs (legal or otherwise), animals suspected of being treated with drugs (legal or otherwise), animals under specific exercise regimes for the sake of performance, animals subjected to (intentional or not) various nutritional states and/or environmental conditions.
  • Databases of information from the use of such samples and anays are created such that test samples can be compared. The database will then contain specific patterns of gene expression for particular conditions.
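The comparison of a test sample against a normal reference range can be sketched as follows. This is a minimal illustration only: it assumes the reference range is taken as the central 95% of expression values seen in normal animals, and the function names (`reference_range`, `flag_sample`) and gene labels are hypothetical.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in 0..100) of pre-sorted data."""
    if not sorted_values:
        raise ValueError("no values supplied")
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def reference_range(normal_values, coverage=95.0):
    """Bounds enclosing the central `coverage` percent of normal values (assumed definition)."""
    ordered = sorted(normal_values)
    tail = (100.0 - coverage) / 2.0
    return percentile(ordered, tail), percentile(ordered, 100.0 - tail)


def flag_sample(sample, ranges):
    """Label each gene of a test sample as below, within or above its normal range."""
    flags = {}
    for gene, value in sample.items():
        low, high = ranges[gene]
        if value < low:
            flags[gene] = "below"
        elif value > high:
            flags[gene] = "above"
        else:
            flags[gene] = "within"
    return flags
```

In use, one range would be computed per gene from the normal animals, and each test sample would then be flagged against those ranges.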
  • Prior to hybridisation, a nucleic acid probe may be fragmented. Fragmentation may improve hybridisation by minimising secondary structure and/or cross-hybridisation with another nucleic acid probe in a sample or a nucleic acid comprising non-complementary sequence. Fragmentation can be performed by mechanical or chemical means common in the art.
  • a labelled nucleic acid target may hybridise with a complementary nucleic acid probe located on an array.
  • Incubation conditions may be adjusted, for example incubation time, temperature and ionic strength of buffer, so that hybridisation occurs with precise complementary matches (high stringency conditions) or with various degrees of less complementarity (low or medium stringency conditions). High stringency conditions may be used to reduce background or non-specific binding.
  • Specific hybridisation solutions and hybridisation apparatus are available commercially from, for example, Stratagene, Clontech and Geneworks.
  • Affymetrix have detailed a standard procedure for the hybridisation of probes with an array (as described at their website, affymetrix.com, incorporated herein by reference); however, a typical method entails the following:
  • Adjust probe volume (prepared as above) to a value indicated in the "Probe & TE" column below according to the size of the cover slip to be used and then add the appropriate volume of 20xSSC and 10% SDS.
  • 20xSSC is 3.0 M NaCl, 300 mM NaCitrate (pH 7.0).
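As a worked example of the 20xSSC recipe, the sketch below converts the stated molarities into masses per volume and applies the usual C1·V1 = C2·V2 dilution rule. The molar masses are standard values; the use of trisodium citrate dihydrate as the citrate salt, the function names and the 2x target in the test are assumptions made for illustration.

```python
NACL_G_PER_MOL = 58.44       # sodium chloride
CITRATE_G_PER_MOL = 294.10   # trisodium citrate dihydrate (assumed salt form)


def ssc_20x_masses(volume_l):
    """Grams of NaCl and sodium citrate needed for `volume_l` litres of 20xSSC."""
    nacl_g = 3.0 * NACL_G_PER_MOL * volume_l          # 3.0 M NaCl
    citrate_g = 0.300 * CITRATE_G_PER_MOL * volume_l  # 300 mM sodium citrate
    return nacl_g, citrate_g


def stock_volume_ml(stock_fold, target_fold, final_ml):
    """Volume of stock to dilute to `final_ml` at `target_fold` (C1*V1 = C2*V2)."""
    if target_fold > stock_fold:
        raise ValueError("target concentration exceeds stock concentration")
    return target_fold * final_ml / stock_fold
```

For one litre this gives roughly 175 g of NaCl and 88 g of sodium citrate, with the pH then adjusted to 7.0 as stated above.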
  • wash solutions generally comprise salt and detergent in water and are commercially available.
  • the wash solutions are applied to the array at a predetermined temperature and can be performed in a commercially available apparatus.
  • Stringency conditions of the wash solution may vary, for example from low to high stringency as herein described. Washing at higher stringency may reduce background or non-specific hybridisation. It is understood that standardisation of this step is required to produce maximum signal to noise ratio by varying the concentration of salt used, whether detergent is present (SDS), the temperature of the wash solution and the time spent in the wash solution.
  • a typical wash protocol consists of removing the slide from a slide chamber, removing the cover slip and placing the slide into 0.1%SSC (recipe provided above) and 0.1% SDS at room temperature for 5 minutes. Transfer the slide to 0.1 % SSC for 5 minutes and repeat. Dry the slide using centrifugation or a stream of air. Equipment is available to enable the handling of more than one slide at a time (for example, slide racks).
  • a scanner or "array reader" is used to determine the levels and patterns of fluorescence from hybridised probes.
  • the scanned images are examined to determine degree of hybridisation and the relative abundance of each nucleic acid on the array.
  • a test sample signal corresponds with relative abundance of an RNA transcript, or gene expression, in a biological sample.
  • an Affymetrix array is read and computer algorithms calculate the difference between hybridisation on perfect match and mismatch probes for each of the 11 probe pairs for each gene. The software then calculates a presence or absence call, an absolute value for each gene and a p value for the absolute call.
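The perfect match versus mismatch comparison can be illustrated with a deliberately simplified sketch. It is not the MAS 5.0 algorithm, which uses more robust estimators and a statistical test to produce the p value; here the signal is just the mean PM minus MM difference, and the presence call uses a hypothetical threshold (`present_fraction`) on the share of probe pairs where PM exceeds MM.

```python
def summarise_probe_set(pm, mm, present_fraction=0.6):
    """Summarise one gene's probe pairs from perfect-match and mismatch intensities.

    Returns (average difference, fraction of pairs with PM > MM, call).
    The 0.6 threshold is an illustrative assumption, not an Affymetrix value.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    diffs = [p - m for p, m in zip(pm, mm)]
    avg_diff = sum(diffs) / len(diffs)
    frac_positive = sum(1 for d in diffs if d > 0) / len(diffs)
    call = "present" if frac_positive >= present_fraction else "absent"
    return avg_diff, frac_positive, call
```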
  • Array readers are available commercially from companies such as Axon, Molecular Dynamics and Affymetrix. These machines typically use lasers, and may use lasers at different frequencies to scan the array and to differentiate, for example, between a test sample (labelled with one dye) and the control or reference sample (labelled with a different dye). For example, an array reader may generate spectral lines at 532 nm for excitation of Cy3, and 635 nm for excitation of Cy5.
  • a relative quantity of RNA may be calculated by the array reader and computer for respective nucleic acids on the array for respective samples based on an amount of dye detected, average of duplicate samples for respective genes and subtraction of background noise using controls.
  • the reader is pre-programmed to perform such calculations (using proprietary software supplied with the array reader, such as MAS 5.0 for the Affymetrix system and Genepix for the Axon Instruments reader) and with information on the location of each nucleic acid on the array such that each nucleic acid is given a readout value.
  • Controls or reference samples providing a readout for particular nucleic acids that falls within standard ranges confirm the integrity of the array and hybridisation procedures.
  • Programs typically generate digital data and format it for transmission.
  • Generated data is transmitted via a communications network to a remote central database.
  • a user having access to the gene expression data enters information in relation to a test sample into a standard diagnostic form such that it can be digitalised.
  • the information will include clinical appraisal and blood profile results.
  • the format of such information is standard globally such that details on clinical conditions may be based on numerical input and each field of entry can be digitalised. For example, the body temperature field could be number 0001; a recorded temperature within normal range would receive the number 0, 0.5°C above what is considered to be the normal range for that species would receive a number 5, and 1°C above normal range would receive 10.
  • Integument: eyes, sores, abscesses, wounds, insects/parasites, allergy, infection.
  • Cardio/Respiratory: eyes, nasal discharge, rales, viral/bacterial infection, allergy, chronic obstructive pulmonary disease, cough, wheeze, crepitous sounds in the thorax, epistaxis, auscultation sounds, heart sounds, capillary refill, mucous membrane colour.
  • Gastrointestinal: diarrhoea, colic/stasis, parasites, appetite level, drenching time and dose.
  • Reproductive: stage of pregnancy, abortion, inflammation, discharges.
  • Musculoskeletal: lameness, laminitis, bone or shin soreness, muscle soreness or tying up, tendon or ligament affected, level of pain, X-ray data, scintigraphy data, CAT scan data, bursitis, bruising, cramping or "tying up".
  • Blood test results: biochemistry, immunology, serology (viral, bacteriological, hormone levels), cell counts, cell morphology, pathologist interpretation.
  • Other diagnostic test results: X-ray, biopsy, histopathology, CAT scan, MRI, bacteriology, virology.
  • Other data: season (date), location, male or female, vaccination history, body score (fitness and fat), fitness level.
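The numerical coding of the body temperature field described above (0 within normal range, 5 for 0.5°C above, 10 for 1°C above) amounts to ten points per degree above the upper normal limit. A minimal sketch, assuming that reading, is given below; the field code 0001 comes from the text, while the function names and the 38.5°C upper limit used in the test are hypothetical.

```python
TEMPERATURE_FIELD = "0001"  # field number used in the example in the text


def temperature_score(temp_c, normal_high_c):
    """Score a temperature as tenths of a degree above the normal upper limit.

    Readings at or below the upper limit score 0. The text does not say how
    readings below the normal range are coded, so they also score 0 here.
    """
    excess = temp_c - normal_high_c
    if excess <= 0:
        return 0
    return round(excess * 10)


def encode_record(temp_c, normal_high_c):
    """Build a digitised record holding only the temperature field."""
    return {TEMPERATURE_FIELD: temperature_score(temp_c, normal_high_c)}
```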
  • the entire system could be based on the aforementioned SNOMED system with appropriate modifications to encompass descriptions of exercise physiology and the normal animal.
  • the entire system could rely on text or categorical data that can be appraised and scored by software such as OmniViz. Whatever system is used, it would be appreciated that the aim is to adequately, systematically and in a standard manner describe the current condition of the animal to the best of currently available technologies and could include results from machinery such as X-ray, ultrasound, scintigraphy and blood analysis.
  • the user also ensures that array results (that may for example be automatically collected from a reader), array specifications, data mining specifications, level of interpretation required and the clinical information are entered and correspond to the same animal and the same sample.
  • the form is transmitted electronically to a central database and recognised as an individual accession or request by the database.
  • the central database recognises the user (using for example digital certificates), the user recognises the central database, the array batch code and gene array order are verified, and the user is allowed access (which may be automatic) and automatic processing of the request is performed if security and billing information are adequate.
  • the processing involves specific mining of central data and specific user requested information is retrieved and resent automatically.
  • gene expression data from an array reader may be transmitted via a communications network directly to a server which is connected to a central database. Additional information could be input by the user at a processor which is also linked to the array reader.
  • a central database interprets the array specifications (e.g., nucleic acid order on a microarray), decodes the information transmitted, determines nucleic acid expression level in a biological sample and compares the expression level and patterns of expression with known standards or reference range.
  • Various levels of database interpretation may be applied to the data transmitted, depending on the user requirements.
  • Clusters of genes may be up-regulated or down-regulated in certain conditions and the database makes automated correlations to specific conditions by accessing various levels of database information.
  • Mining software such as Metamine (Silicon Genetics) or ArraySCOUT (Lion Bioscience) can be used in this instance, and more advanced data mining technologies could be used to identify patterns and nearest neighbour information in data (such as products from AnVil Informatics Inc and OmniViz Inc).
  • software capable of taking rule-based instructions, such as that described by Pacific Knowledge Systems (Sydney, Australia) in their "ripple down" technology, or having the ability to self learn, such as that described in Khan et al. Nature Medicine 7 (6) 673, incorporated herein by reference, could be used at this stage to limit the level of human interaction in determining a diagnosis.
  • an artificial neural network is used, and samples are divided into training and validation sets to create trained calibrated models. The calibrated models are then used to rank genes in diagnostic importance.
  • Levels of database may include:
  • Unique gene sequences (eg 3' and 5' EST sequence of genes)
  • Primer sequences used to generate amplification products eg two primer sequences used to uniquely amplify the gene for gamma interferon in a particular species
  • Microarray construction and format eg coded information on array manufacture batch and identification of genes and position on the array
  • Blood profile and clinical data associated with particular conditions eg standard clinical information and IDEXX-machine generated blood profile data
  • the database may be built over time and more intensive searching of the database may incur a greater cost.
  • changes may be made to the above methodology to increase the sensitivity of the detection of variation in expression of condition-specific genes - this could include the use of condition-specific arrays or condition-specific primers.
  • Condition-specific arrays can be manufactured by a company such as Affymetrix (under instructions) that would allow for increased sensitivity and specificity, much reduced size of arrays, decreased cost of production, and the ability to process multiple samples at once.
  • the process of building the database is iterative, such that specific genes are correlated to specific conditions, and the detection of variations in these genes becomes more sensitive and specific through the use of various modifying processes through the procedure (e.g., the use of gene-specific primers for the amplification and labelling of cDNA from RNA, and the selection of limited numbers of genes on a disease- or condition-specific array, detection of splice variants and single nucleotide polymorphisms).
  • the database reports back electronically to a remote user, either automatically or with a level of human intervention.
  • the electronic report may be converted to a printed document.
  • the report provides details of an animal's condition that is determined by correlation of gene expression data with information stored in a remote database, and optionally expert analysis.
  • genes up-regulated or down-regulated (for example, with laminitis or joint capsule inflammation or bursitis, a report on the up-regulation of genes such as interleukin-3, manganese superoxide dismutase, Gro ⁇, metalloproteinase matrix-metallo-elastase, ferritin light chain may have some correlation to tissue inflammation, and down-regulation of genes such as insulin-like growth factor and its receptor may be correlated to recovery from such a condition).
  • the identity of these genes cannot be predicted to be associated with any condition unless the above described methodology is used and databases on relative expression of genes for particular conditions have been compiled. Therefore a screening test covering all genes may need to be performed first and a second, more specific test then applied.
  • the demonstration study involved 108 blood samples. Twenty were from horses with induced osteoarthritis, 11 from horses with Equine Herpes Virus (EHV), 14 from horses with gastric ulcer syndrome and 63 from normal healthy horses.
  • Blood samples were collected in Paxgene tubes and mRNA extracted from each sample, using methods described above.
  • RNA extracted from each sample was checked for quality and quantity prior to running on a GeneChip® using an Agilent "Lab-on-a-Chip" system. Examples of the results from such a chip confirming the quality of sample RNA are shown in Figure 18, including a description of the metrics used to determine the quality and quantity of total RNA. By contrast, the trace shown in Figure 19 represents poor quality RNA that was failed by quality control.
  • RNA was used as a template to generate double stranded cDNA.
  • cRNA was generated and labeled using biotinylated Uracil (dUTP).
  • biotin-labeled cRNA was cleaned and the quantity determined using a spectrophotometer and MOPS gel analysis.
  • labelled cRNA was fragmented to ⁇ 300bp in size.
  • a hybridisation cocktail is prepared containing 0.05 µg/µl of labelled and fragmented cRNA, spike-in positive hybridisation controls, and the Affymetrix oligonucleotides B2, bioB, bioC, bioD and cre.
  • the final volume (80 µl) of the hybridisation cocktail is added to a GeneChip® cartridge.
  • the cartridge is placed in a hybridisation oven at constant rotation for 16 hours.
  • the fluid is removed from the GeneChip® and stored.
  • the GeneChip® is placed in an Affymetrix fluidics station.
  • the GeneChip® is washed, stained with streptavidin-phycoerythrin dye and then washed again using low salt solutions.
  • the scanner and MAS 5 software generated an image file from a single GeneChip® called a .DAT file (see Figures 20 and 21).
  • the .DAT file was then pre-processed prior to any statistical analysis.
  • the .DAT file is an image (see Figure 20 and 21).
  • the image was inspected manually for artefacts (e.g. high/low intensity spots, scratches, high regional or overall background).
  • the B2 oligonucleotide hybridisation performance is easily identified by an alternating pattern of intensities creating a border and array name.
  • the MAS 5 software used the B2 oligonucleotide border to align a grid over the image so that each square of oligonucleotide was centred and identified.
  • the other spiked hybridisation controls (bioB, bioC, bioD and cre) were used to evaluate sample hybridisation efficiency by reading "present" gene detection calls with increasing signal values, reflecting their relative concentrations. (If the .DAT file is of suitable quality it is converted to an intensity data file (.CEL file) by Affymetrix MAS 5 software.)
  • the .CEL files generated by the MAS 5 software from .DAT files contain calculated raw intensities for the probe sets. Gene expression data was obtained by subtracting a calculated background from each cell value. To eliminate negative intensity values, a noise correction fraction, based on a local noise value taken from the standard deviation of the lowest 2% of the background, was applied. All .CEL files generated from the GeneChips® were subjected to specific quality metrics developed by Gene Logic. GeneChips® that failed these metrics were not included in the study.
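The background subtraction and noise floor described above can be sketched roughly as follows. This is a single-region simplification under stated assumptions: the background is taken as the mean of the lowest 2% of cell intensities, the noise as their standard deviation, and each adjusted value is kept at or above `noise_frac` times the noise. The default of 0.5 for `noise_frac` and the function name are assumptions, and the actual software works region by region across the chip.

```python
from statistics import mean, pstdev


def background_adjust(intensities, low_fraction=0.02, noise_frac=0.5):
    """Subtract an estimated background and floor the result at a noise level."""
    if not intensities:
        raise ValueError("no intensities supplied")
    ordered = sorted(intensities)
    # Use at least two cells so the standard deviation is meaningful.
    n_low = min(len(ordered), max(2, int(len(ordered) * low_fraction)))
    lowest = ordered[:n_low]
    background = mean(lowest)
    noise = pstdev(lowest)
    floor = noise_frac * noise
    # The floor is never negative, so no adjusted value can fall below zero.
    return [max(value - background, floor) for value in intensities]
```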
  • Some metrics are routinely recommended by Affymetrix and can be determined from Affymetrix internal controls provided as part of the GeneChip®. These quality metrics are used to ensure that data are not unduly influenced by failures in hybridisation, inadequate plate washing or contamination or flaws in the Affymetrix chips.
  • RMA (Robust Multi-chip Analysis)
  • the RMA algorithm does not use mis-match probes.
  • normalisation occurs at the level of the probe pair. It is based on quantile-quantile normalisation, in which all chips are constrained to have the same quantiles of probe intensity.
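Quantile-quantile normalisation, as described above, forces every chip to share the same distribution of probe intensities. A minimal sketch is shown below, assuming each chip is a list of intensities for the same probes in the same order; the function name is hypothetical and ties are broken arbitrarily by position rather than averaged.

```python
def quantile_normalise(chips):
    """Give every chip the same intensity quantiles.

    `chips` is a list of equal-length lists, one list of probe intensities per chip.
    """
    n_probes = len(chips[0])
    if any(len(chip) != n_probes for chip in chips):
        raise ValueError("all chips must have the same number of probes")
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean intensity across chips at each rank.
    reference = [
        sum(sc[rank] for sc in sorted_chips) / len(chips)
        for rank in range(n_probes)
    ]
    normalised = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalised.append(out)
    return normalised
```

After normalisation, sorting any chip yields the same reference values, while the rank order of probes within each chip is preserved.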
  • kernel density plots were used to display the distribution of gene expression values for each chip. These kernel density estimates were plotted on the same axes - to identify any genes with atypical responses.
  • the centred gene expression values are defined as $Y_{ij} = X_{ij} - \bar{X}_{j}$, where $Y_{ij}$ is the centred gene expression value for the jth gene in the ith sample, $X_{ij}$ is the uncentred gene expression value for the jth gene in the ith sample, and $\bar{X}_{j}$ is the mean expression for the jth gene over all samples.
  • Multivariate summaries of gene expression are generated using principal components analysis.
  • the components are calculated using the left singular vectors from a singular value decomposition of the centred data matrix Y.
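The centring and singular value decomposition steps can be sketched with NumPy, which is an assumption here since the text names no software for this step. Rows are taken to be samples and columns genes; the principal component scores are the left singular vectors scaled by their singular values. The function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Principal component scores from an SVD of the gene-centred data matrix.

    X has one row per sample and one column per gene.
    Returns (scores, loadings) for the leading `n_components` components.
    """
    X = np.asarray(X, dtype=float)
    Y = X - X.mean(axis=0)  # subtract each gene's mean over all samples
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings
```

With all components retained, `scores @ loadings.T` reproduces the centred matrix, which is a convenient check that nothing was lost in the decomposition.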
  • the eigenvectors x are then used as the coefficients of linear functions of the principal component scores, to define new linear combinations - the discriminant functions.
  • the Euclidean Distances are calculated between the test observation and each disease mean, in the space of the linear discriminant functions.
  • the test observation is then allocated to the disease group for which it has the smallest distance. This gives a predicted value for the test observation.
  • Steps from 4 to 7 are repeated with a varying number of principal components.
  • test observation is re-instated, and the next observation dropped and regarded as a test observation. Steps 2 to 10 are repeated until each observation has been used as a test observation.
  • the predicted disease groups for each observation are tabulated against the true disease groups. The number of principal components is chosen to maximise the accumulated prediction success.
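The leave-one-out allocation and tabulation described above can be sketched as follows. This simplified version assumes each observation is already expressed as coordinates in the space of the discriminant functions; in the procedure described, the principal components and discriminant functions would be re-estimated with the test observation left out, and the loop repeated for different numbers of components. All function names are hypothetical.

```python
import math


def group_means(points, labels):
    """Mean coordinate vector for each disease group."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for k, value in enumerate(point):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def nearest_group(point, means):
    """Allocate a point to the group whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(point, means[label]))


def leave_one_out_predictions(points, labels):
    """Predict each observation's group from means computed without it."""
    predictions = []
    for i in range(len(points)):
        train_points = points[:i] + points[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        means = group_means(train_points, train_labels)
        predictions.append(nearest_group(points[i], means))
    return predictions


def confusion_table(true_labels, predicted_labels):
    """Count (true group, predicted group) pairs."""
    table = {}
    for pair in zip(true_labels, predicted_labels):
        table[pair] = table.get(pair, 0) + 1
    return table
```

The number of principal components would then be chosen as the one whose table has the most observations on the diagonal.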
  • Figure 22 shows a scatter plot of the four conditions (osteoarthritis, EHV, gastric ulcer syndrome and normal) with respect to the first two linear discriminant functions in the demonstration study. There are clear separations between each of the groups - masked to some extent by the restrictions of plotting in two dimensions.
  • the technique may also be applied to any subjects, including humans.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Epidemiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Zoology (AREA)
  • Primary Health Care (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention relates to a method of determining the status of a subject. In particular, the method involves obtaining subject data including respective values for each of a set of parameters, the parameter values being indicative of the current biological status of the subject. The subject data are compared to predetermined data that include values for at least some of the parameters and an indication of the condition. The status of the subject, and in particular the presence and/or absence of one or more conditions, can then be determined in accordance with the results of the comparison.
PCT/AU2003/001517 2002-11-14 2003-11-14 Status determination WO2004044236A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
AU2003275800A AU2003275800B2 (en) 2002-11-14 2003-11-14 Status determination
CA2505151A CA2505151C (en) 2002-11-14 2003-11-14 Status determination
NZ539578A NZ539578A (en) 2002-11-14 2003-11-14 Health and performance status determination of a biological subject using level, abundance or functional activity of a gene expression product in sample cells
EP03810913A EP1581658A4 (en) 2002-11-14 2003-11-14 Status determination

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2002952696A AU2002952696A0 (en) 2002-11-14 2002-11-14 Status determination
AU2002952696 2002-11-14

Publications (1)

Publication Number Publication Date
WO2004044236A1 true WO2004044236A1 (fr) 2004-05-27

Family

ID=28796082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2003/001517 WO2004044236A1 (en) 2002-11-14 2003-11-14 Status determination

Country Status (5)

Country Link
EP (1) EP1581658A4 (fr)
AU (1) AU2002952696A0 (fr)
CA (1) CA2505151C (fr)
NZ (1) NZ539578A (fr)
WO (1) WO2004044236A1 (fr)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1976426A2 (en) * 2006-01-19 2008-10-08 Samuel R. Valenti Medical diagnostic system and methods
WO2008129458A1 (en) * 2007-04-18 2008-10-30 Koninklijke Philips Electronics N.V. Method applied to DNA frequency spectra for data mining
EP2087451A2 (en) * 2006-12-01 2009-08-12 Ameritox, Ltd. Method and apparatus for generating toxicology reports
US8137912B2 (en) 2006-06-14 2012-03-20 The General Hospital Corporation Methods for the diagnosis of fetal abnormalities
US8168389B2 (en) 2006-06-14 2012-05-01 The General Hospital Corporation Fetal cell analysis using sample splitting
US8195415B2 (en) 2008-09-20 2012-06-05 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
WO2012059839A3 (en) * 2010-11-01 2012-09-07 Koninklijke Philips Electronics N.V. In vitro diagnostic testing including automated brokering of royalty payments for proprietary tests
US8585971B2 (en) 2005-04-05 2013-11-19 The General Hospital Corporation Devices and method for enrichment and alteration of cells and other particles
US8895298B2 (en) 2002-09-27 2014-11-25 The General Hospital Corporation Microfluidic device for cell separation and uses thereof
WO2014201516A2 (en) 2013-06-20 2014-12-24 Immunexpress Pty Ltd Biomarker identification
US8921102B2 (en) 2005-07-29 2014-12-30 Gpb Scientific, Llc Devices and methods for enrichment and alteration of circulating tumor cells and other particles
EP2071339A3 (en) * 2007-12-12 2015-05-20 Sysmex Corporation System for providing animal test information and method for providing animal test information
ES2542277A1 (en) * 2014-12-23 2015-08-03 Soler Gabinete De Ingeniería S.L. System for detecting and warning of colic in equines
CN110851414A (en) * 2019-11-06 2020-02-28 云南艾拓信息技术有限公司 Method and system for boundary data analysis using a clustering method
US10591391B2 (en) 2006-06-14 2020-03-17 Verinata Health, Inc. Diagnosis of fetal abnormalities using polymorphisms including short tandem repeats
US10704090B2 (en) 2006-06-14 2020-07-07 Verinata Health, Inc. Fetal aneuploidy detection by sequencing
CN114838796A (en) * 2022-04-29 2022-08-02 合肥市正茂科技有限公司 Vision-assisted dynamic vehicle weighing method and weighing system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11450121B2 (en) * 2017-06-27 2022-09-20 The Regents Of The University Of California Label-free digital brightfield analysis of nucleic acid amplification
US11902129B1 (en) 2023-03-24 2024-02-13 T-Mobile Usa, Inc. Vendor-agnostic real-time monitoring of telecommunications networks

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000047996A2 (en) * 1999-02-09 2000-08-17 Illumina, Inc. Automated information processing in randomly ordered arrays
US6647341B1 (en) * 1999-04-09 2003-11-11 Whitehead Institute For Biomedical Research Methods for classifying samples and ascertaining previously unknown classes
US6180351B1 (en) * 1999-07-22 2001-01-30 Agilent Technologies Inc. Chemical array fabrication with identifier
AU2002213134A1 (en) * 2000-10-13 2002-04-22 The Brigham And Women's Hospital, Inc. Method and display for multivariate classification
WO2002073504A1 (en) * 2001-03-14 2002-09-19 Gene Logic, Inc. System and method for retrieving and using gene expression data from multiple sources

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5445934A (en) 1989-06-07 1995-08-29 Affymax Technologies N.V. Array of oligonucleotides on a solid substrate
WO1992010588A1 (en) 1990-12-06 1992-06-25 Affymax Technologies N.V. Sequencing by hybridization of a target nucleic acid to a matrix of defined oligonucleotides
US5677195A (en) 1991-11-22 1997-10-14 Affymax Technologies N.V. Combinatorial strategies for polymer synthesis
US5807522A (en) 1994-06-17 1998-09-15 The Board Of Trustees Of The Leland Stanford Junior University Methods for fabricating microarrays of biological samples
US5965366A (en) * 1994-07-22 1999-10-12 The United States Of America As Represented By The Secretary, Department Of Health And Human Services Methods of identifying patients having an altered immune status
US5831070A (en) 1995-02-27 1998-11-03 Affymetrix, Inc. Printing oligonucleotide arrays using deprotection agents solely in the vapor phase
WO1997010365A1 (en) 1995-09-15 1997-03-20 Affymax Technologies N.V. Expression monitoring by hybridization to high density oligonucleotide arrays
US20010018182A1 (en) * 1998-06-19 2001-08-30 Rosetta Inpharmatics, Inc. Methods of monitoring disease states and therapies using gene expression profiles
WO2001025473A1 (en) * 1999-06-28 2001-04-12 Source Precision Medicine, Inc. Systems and methods for characterizing a biological condition or agent using calibrated gene expression
WO2001008188A1 (en) * 1999-07-26 2001-02-01 Moeller Gmbh Method for electronic drive control
WO2001016859A2 (en) * 1999-08-27 2001-03-08 Pljvita Corporation System and method for genomic and proteomic assessment of animal diseases by comparison of expression profiles
WO2001020533A2 (en) * 1999-09-15 2001-03-22 Luminex Corporation Creation of a database of biochemical data and methods of use
WO2001028415A1 (en) * 1999-10-15 2001-04-26 Dodds W Jean Animal health diagnosis
WO2002033520A2 (en) * 2000-10-18 2002-04-25 Genomic Health, Inc. Genomic profile information system and associated method
WO2002090579A1 (en) * 2001-05-04 2002-11-14 Genomics Research Partners Pty Ltd Bioinformatics-based system for assessing the condition of a performance animal by analysing nucleic acid expression

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BIOINFO., vol. 18, no. 2, February 2002 (2002-02-01), pages 327 - 328 *
DATABASE MEDLINE [online] BUMM K.: "Utilizing and integrating gene expression microarray data in clinical research and data management", XP008093112, accession no. NCBI Database accession no. 11847084 *
DATABASE MEDLINE [online] RINGNER M. ET AL.: "Analyzing array data using supervised methods", XP003019100, Database accession no. 12052147 *
PHARMA., vol. 3, no. 3, May 2002 (2002-05-01), pages 403 - 415, XP008067366 *
See also references of EP1581658A4

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8986966B2 (en) 2002-09-27 2015-03-24 The General Hospital Corporation Microfluidic device for cell separation and uses thereof
US10081014B2 (en) 2002-09-27 2018-09-25 The General Hospital Corporation Microfluidic device for cell separation and uses thereof
US11052392B2 (en) 2002-09-27 2021-07-06 The General Hospital Corporation Microfluidic device for cell separation and uses thereof
US8895298B2 (en) 2002-09-27 2014-11-25 The General Hospital Corporation Microfluidic device for cell separation and uses thereof
US8585971B2 (en) 2005-04-05 2013-11-19 The General Hospital Corporation Devices and method for enrichment and alteration of cells and other particles
US9174222B2 (en) 2005-04-05 2015-11-03 The General Hospital Corporation Devices and method for enrichment and alteration of cells and other particles
US9956562B2 (en) 2005-04-05 2018-05-01 The General Hospital Corporation Devices and method for enrichment and alteration of cells and other particles
US10786817B2 (en) 2005-04-05 2020-09-29 The General Hospital Corporation Devices and method for enrichment and alteration of cells and other particles
US8921102B2 (en) 2005-07-29 2014-12-30 Gpb Scientific, Llc Devices and methods for enrichment and alteration of circulating tumor cells and other particles
EP1976426A2 (en) * 2006-01-19 2008-10-08 Samuel R. Valenti Medical diagnostic system and methods
EP1976426A4 (en) * 2006-01-19 2011-01-19 Samuel R Valenti Medical diagnostic system and methods
US9273355B2 (en) 2006-06-14 2016-03-01 The General Hospital Corporation Rare cell analysis using sample splitting and DNA tags
US10591391B2 (en) 2006-06-14 2020-03-17 Verinata Health, Inc. Diagnosis of fetal abnormalities using polymorphisms including short tandem repeats
US8372584B2 (en) 2006-06-14 2013-02-12 The General Hospital Corporation Rare cell analysis using sample splitting and DNA tags
US10155984B2 (en) 2006-06-14 2018-12-18 The General Hospital Corporation Rare cell analysis using sample splitting and DNA tags
US10704090B2 (en) 2006-06-14 2020-07-07 Verinata Health, Inc. Fetal aneuploidy detection by sequencing
US8168389B2 (en) 2006-06-14 2012-05-01 The General Hospital Corporation Fetal cell analysis using sample splitting
US8137912B2 (en) 2006-06-14 2012-03-20 The General Hospital Corporation Methods for the diagnosis of fetal abnormalities
US9017942B2 (en) 2006-06-14 2015-04-28 The General Hospital Corporation Rare cell analysis using sample splitting and DNA tags
US11674176B2 (en) 2006-06-14 2023-06-13 Verinata Health, Inc Fetal aneuploidy detection by sequencing
US9347100B2 (en) 2006-06-14 2016-05-24 Gpb Scientific, Llc Rare cell analysis using sample splitting and DNA tags
US11781187B2 (en) 2006-06-14 2023-10-10 The General Hospital Corporation Rare cell analysis using sample splitting and DNA tags
EP2087451A4 (en) * 2006-12-01 2013-09-11 Ameritox Ltd Method and apparatus for generating toxicology reports
EP2087451A2 (en) * 2006-12-01 2009-08-12 Ameritox, Ltd. Method and apparatus for generating toxicology reports
WO2008129458A1 (en) * 2007-04-18 2008-10-30 Koninklijke Philips Electronics N.V. Method applied to DNA frequency spectra for data mining
EP2071339A3 (en) * 2007-12-12 2015-05-20 Sysmex Corporation System for providing animal test information and method for providing animal test information
US10669585B2 (en) 2008-09-20 2020-06-02 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
US9353414B2 (en) 2008-09-20 2016-05-31 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
US9404157B2 (en) 2008-09-20 2016-08-02 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
US8195415B2 (en) 2008-09-20 2012-06-05 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
US8682594B2 (en) 2008-09-20 2014-03-25 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuploidy by sequencing
US8296076B2 (en) 2008-09-20 2012-10-23 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnosis of fetal aneuoploidy by sequencing
CN103314383A (en) * 2010-11-01 2013-09-18 Koninklijke Philips Electronics N.V. In vitro diagnostic testing including automated brokering of royalty payments for proprietary tests
CN110838348A (en) * 2010-11-01 2020-02-25 Koninklijke Philips Electronics N.V. In vitro diagnostic testing including automated brokering of royalty payments for proprietary tests
WO2012059839A3 (en) * 2010-11-01 2012-09-07 Koninklijke Philips Electronics N.V. In vitro diagnostic testing including automated brokering of royalty payments for proprietary tests
CN105956398A (en) * 2010-11-01 2016-09-21 Koninklijke Philips Electronics N.V. In vitro diagnostic testing including automated brokering of royalty payments for proprietary tests
WO2014201516A2 (en) 2013-06-20 2014-12-24 Immunexpress Pty Ltd Biomarker identification
ES2542277A1 (en) * 2014-12-23 2015-08-03 Soler Gabinete De Ingeniería S.L. System for detecting and warning of colic in equines
WO2016102730A1 (en) * 2014-12-23 2016-06-30 Soler Gabinete De Ingenieria, S.L. System for detecting and warning of colic in equines
CN110851414A (en) * 2019-11-06 2020-02-28 云南艾拓信息技术有限公司 Method and system for analyzing boundary data by clustering
CN110851414B (en) * 2019-11-06 2023-05-05 云南艾拓信息技术有限公司 Method and system for analyzing boundary data by clustering
CN114838796A (en) * 2022-04-29 2022-08-02 合肥市正茂科技有限公司 Vision-assisted dynamic vehicle weighing method and weighing system
CN114838796B (en) * 2022-04-29 2023-06-09 合肥市正茂科技有限公司 Vision-assisted dynamic vehicle weighing method and weighing system

Also Published As

Publication number Publication date
NZ539578A (en) 2007-12-21
AU2002952696A0 (en) 2002-11-28
CA2505151A1 (fr) 2004-05-27
EP1581658A1 (fr) 2005-10-05
CA2505151C (fr) 2013-07-30
EP1581658A4 (fr) 2007-12-26

Similar Documents

Publication Publication Date Title
US20110231106A1 (en) Status determination
US20040236516A1 (en) Bioinformatics based system for assessing a condition of a performance animal by analysing nucleic acid expression
CA2505151C (en) Status determination
Der et al. Single cell RNA sequencing to dissect the molecular heterogeneity in lupus nephritis
Quackenbush Microarray analysis and tumor classification
US7729864B2 (en) Computer systems and methods for identifying surrogate markers
US20230114581A1 (en) Systems and methods for predicting homologous recombination deficiency status of a specimen
US20200347444A1 (en) Gene-expression profiling with reduced numbers of transcript measurements
EP2371969B1 (en) Identification of tumors
AU2016341281A1 (en) Methods and systems for assessing infertility as a result of declining ovarian reserve and function
US20110312520A1 (en) Methods and compositions for diagnosing conditions
EP2556185B1 (en) Gene-expression profiling with reduced numbers of transcript measurements
EP1425412A2 (en) Blood assessment of ailments
CN106460045B (en) Common copy number variations of the human genome for cancer susceptibility risk assessment
CN110770839A (en) Method for accurate computational decomposition of DNA mixtures from contributors of unknown genotype
Cobb et al. Injury in the era of genomics
Pool et al. Recovery of missing single-cell RNA-sequencing data with optimized transcriptomic references
König et al. Reliability of gene expression ratios for cDNA microarrays in multiconditional experiments with a reference design
AU2003275800B2 (en) Status determination
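JP3440270B2 (en) Apparatus for providing gene information related to physiological phenomena, method for providing the same, and recording medium storing a program for providing the same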
Gurgul et al. Application of Nanopore Sequencing for High Throughput Genotyping in Horses
AU2002252820A1 (en) Bioinformatics based system for assessing a condition of a performance animal by analysing nucleic acid expression
Knudtson et al. The ABRF MARG microarray survey 2005: taking the pulse of the microarray field
Mansfield et al. Arrays Amaze. Unraveling the transcriptisome in transplantation
Jain et al. Molecular diagnostics as basis of personalized medicine

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2003275800

Country of ref document: AU

Ref document number: 539578

Country of ref document: NZ

WWE WIPO information: entry into national phase

Ref document number: 2505151

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: 2003810913

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 2003810913

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP